
Relay Updates for Sync v1.1


We have an updated relay implementation which incorporates the Sync v1.1 protocol changes, and we are preparing to switch over the bsky.network relay operated by Bluesky. This post describes new infrastructure developers can start using today and outlines our plan for completing the migration.

Updated Relay Implementation

The new version of the relay is available now in the indigo Go git repository, named simply relay. It is a fork and refactor of the bigsky relay implementation, and many components carry over. The biggest change is that the relay no longer mirrors full repository data for all accounts in the network: Sync v1.1 relays are now "non-archival". We have actually been operating bsky.network with a patched non-archival version of bigsky for some time, but this update includes a number of other changes:

  • the new #sync message type is supported
  • the deprecated #handle, #migration, and #tombstone message types are fully removed (they were replaced with #identity and #account)
  • the sync v1.1 changes to the #commit message schema are supported
  • message validation and data limits updated to align with written specification
  • account hosting status and lifecycle updated to match written specification
  • #commit messages validated with MST inversion (controlled by "lenient mode" flag)
  • new com.atproto.sync.listHosts endpoint to enumerate subscribed PDS instances
  • com.atproto.sync.getRepo endpoint implemented as HTTP redirect to PDS instance (see the sketch after this list)
  • simplified configuration, operation, and SQL database schemas
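
To illustrate the getRepo redirect behavior, here is a minimal Go sketch which asks one of the new relay instances for a repo and prints the redirect target instead of following it. The DID is a placeholder, and the exact response status may differ:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Disable automatic redirect following so we can inspect the relay's redirect target.
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	// The DID below is a placeholder; substitute a real account DID.
	url := "https://relay1.us-west.bsky.network/xrpc/com.atproto.sync.getRepo?did=did:plc:example"
	resp, err := client.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// Expect a redirect status pointing at the account's PDS instance.
	fmt.Println("status:", resp.Status)
	fmt.Println("redirect to:", resp.Header.Get("Location"))
}
```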

The relay can aggregate firehose messages for the full atproto network on a relatively small and inexpensive server, even with signature validation and MST inversion of every message on the firehose. The relay can handle thousands of messages per second using on the order of 2 vCPU cores, 12 GByte of RAM, and 30 Mbps of inbound and outbound bandwidth. Disk storage depends on the length of the configurable "backfill window." (A 24-hour backfill currently requires a couple hundred GByte of disk.)

Staged Rollout

The end goal is to upgrade the bsky.network relay instance and to have all active PDS hosts in the network emitting valid Sync v1.1 messages. We are rolling this out in stages to ensure that no active services in the network break.

We have two new production relay instances:

  • relay1.us-west.bsky.network
  • relay1.us-east.bsky.network

These both run the new relay implementation and attempt to crawl the entire atproto network. The two instances are operationally isolated and subscribe to PDS instances separately, though configuration will be synchronized between them periodically. Their sequence numbers started at 20000000000 (20 billion) to distinguish them from the current bsky.network relay, which has a sequence around 8400000000 (8.4 billion). Note that the two new relays are independent from each other, have different sequence numbers, and will have different ordering of messages. Clients subscribing to these public hostnames will technically connect to rainbow daemons which fan out the firehose.
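
As a rough sketch of what subscribing looks like, the following Go snippet connects to one of the new hostnames with an explicit cursor and decodes only the CBOR frame header of each message. The gorilla/websocket and fxamacker/cbor libraries are simply convenient choices for illustration, and the cursor value is illustrative; remember that cursors from one relay are not valid on the other:

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/fxamacker/cbor/v2"
	"github.com/gorilla/websocket"
)

// Frame headers on the firehose are small CBOR maps like {"op": 1, "t": "#commit"}.
type frameHeader struct {
	Op int64  `cbor:"op"`
	T  string `cbor:"t"`
}

func main() {
	// Cursors are per-relay: a cursor from relay1.us-west is not interchangeable with relay1.us-east.
	url := "wss://relay1.us-west.bsky.network/xrpc/com.atproto.sync.subscribeRepos?cursor=20000000000"

	conn, _, err := websocket.DefaultDialer.Dial(url, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	for {
		_, msg, err := conn.ReadMessage()
		if err != nil {
			log.Fatal(err)
		}
		var hdr frameHeader
		dec := cbor.NewDecoder(bytes.NewReader(msg))
		if err := dec.Decode(&hdr); err != nil {
			log.Printf("bad frame header: %v", err)
			continue
		}
		// The message body (a second CBOR object) follows the header in the same frame.
		fmt.Printf("received %s frame (op=%d)\n", hdr.T, hdr.Op)
	}
}
```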

We encourage service operators and protocol developers to experiment with these new instances. They are running in "lenient" MST validation mode and should work with existing PDS instances and implementations. PDS hosts can direct requestCrawl API requests to them. If hosts don't show up in listHosts, try making some repo updates to emit new #commit messages. Bluesky intends to operate both of these relays, at these hostnames, for the foreseeable future.
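
For example, a PDS operator could request a crawl and then inspect the host list with something like the sketch below. The PDS hostname is a placeholder, and the raw JSON response is printed rather than parsed into a specific shape:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	relay := "https://relay1.us-east.bsky.network"

	// Ask the relay to start crawling a PDS host (hostname is a placeholder).
	body, _ := json.Marshal(map[string]string{"hostname": "pds.example.com"})
	resp, err := http.Post(relay+"/xrpc/com.atproto.sync.requestCrawl", "application/json", bytes.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	resp.Body.Close()
	fmt.Println("requestCrawl status:", resp.Status)

	// List the PDS instances the relay is currently subscribed to.
	resp, err = http.Get(relay + "/xrpc/com.atproto.sync.listHosts")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out))
}
```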

For the second stage of the rollout (in the near future), we will update the bsky.network domain name to point at one of these new relay instances. If all goes smoothly, firehose consumers will simply start receiving events with a higher sequence number.
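
Consumers that persist cursors may want to handle this jump explicitly. One possible approach, which is a suggestion rather than anything prescribed by the protocol, is to detect when the sequence crosses into the new 20-billion range and treat it as a cursor epoch change:

```go
// handleSeqJump sketches how a consumer might react when bsky.network switches
// to the new relay: sequence numbers jump from roughly 8.4 billion into the
// 20-billion range. Treating a large forward jump as a cursor epoch change is
// safer than assuming the intervening sequence numbers were missed events.
const newRelayEpoch = int64(20_000_000_000) // starting sequence of the new relays

func handleSeqJump(lastSeq, incomingSeq int64) (resetCursor bool) {
	if lastSeq < newRelayEpoch && incomingSeq >= newRelayEpoch {
		// The relay behind bsky.network changed; persist the new cursor and
		// decide separately whether any backfill is needed.
		return true
	}
	return false
}
```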

As a final stage, we will re-configure both instances to enforce MST inversion more strictly and drop #commit messages which fail validation. To ensure this does not break active hosts and accounts in the network, the relays currently attempt strict validation on every message and log failures (even if they are passed through on the firehose in "lenient" mode). We will monitor the logs and work with PDS operators who have not upgraded. If necessary, we can "ratchet" validation by host, meaning that new hosts joining the network are required to validate but specific existing hosts are given more time to update.

Other Sync v1.1 Progress

We have a few resources for PDS developers implementing Sync v1.1:

  • the goat CLI tool has several --verify flags on the firehose consumer command, which can be pointed at a local PDS instance for debugging
  • likewise, the relay implementation can be run locally (eg, using SQLite) and pointed at a local PDS instance
  • interoperability test data in the bluesky-social/atproto-interop-tests git repository, covering both MST implementation details and commit proofs
  • the written specifications have not been updated yet, but the proposal doc is thorough and will be the basis for updated specs

While not strictly required by the protocol, both of the new relay hostnames support the com.atproto.sync.listReposByCollection endpoint (technically via the collectiondir microservice, not part of the relay itself). This endpoint makes it much more efficient to discover which accounts in the network have records of a given type and is particularly helpful for developers building new Lexicons and working with data independent of the app.bsky.* namespace.
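
Here is a quick sketch of querying that endpoint from Go. The collection NSID is just an example, and the response (a paginated JSON listing of repo DIDs) is printed raw:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"
)

func main() {
	// Enumerate accounts that have records of a given collection (NSID).
	base := "https://relay1.us-west.bsky.network/xrpc/com.atproto.sync.listReposByCollection"
	params := url.Values{}
	params.Set("collection", "app.bsky.feed.post")

	resp, err := http.Get(base + "?" + params.Encode())
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // JSON listing of repo DIDs, with a cursor for pagination
}
```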

The need for ordered repository CAR file exports has become clearer, and an early implementation was completed for the PDS reference implementation. That implementation is not yet performant enough to merge, and it may be some time before ordered CAR files are the norm in the network. The exact ordering also needs to be described more formally to ensure interoperation. Work has not yet started on the "partial synchronization" variant of getRepo, which will allow fetching a subset of a repository.