By Ariel Shaqed (Scolnicov), Idan Novogroder, and Guy Hardonag

Last updated on March 17, 2024

What is mirroring?

We are pleased to announce a preview of a long-awaited lakeFS feature: transactional mirroring across regions. Mirroring builds on top of S3 Replication to provide a consistent view of your versioned data in other regions. Once configured, it lets you create mirrors in any of your regions. Each mirror of a source repository is a read-only replica of its committed data.



Use case: read data locally

Your data is produced by pipelines running in us-east-1, but you have additional compute resources available in eu-central-1. Configure mirroring from us-east-1 into eu-central-1. You can now analyze data using compute resources available in either region.

Use case: disaster recovery

Your data is produced by pipelines running in us-west-2. Configure replication from us-west-2 to us-east-1. If us-west-2 becomes unavailable for any reason, your data remains available.

Use case: consistent snapshots

S3 Replication offers very high bandwidth, but it cannot guarantee that objects are copied in the order in which they were produced. If multi-object data is continually updated, it might never be readable at the destination. For example, suppose an Iceberg table is updated every minute. Because metadata objects in Iceberg are considerably smaller than data objects, they will typically be replicated first. Metadata objects at the destination can therefore refer to data objects that have not yet arrived. Applications reading Iceberg at the destination would need complex logic to find a version that has already been fully replicated.

By configuring mirroring from the source to the destination region and committing new versions of data, you can always read a consistent updated version.

This blog takes a deep dive into the architecture and operations of lakeFS transactional mirroring. If you would just like to get started, feel free to skip ahead to Transactional mirroring: What you’ll see.

lakeFS Transactional Mirroring Architecture

Components

lakeFS

Under the hood, lakeFS stores your data by using two AWS services:

  • On S3 in the storage namespace of a repository:
    • the metaranges and ranges, listing the exact object versions of any commit;
    • the data objects themselves.
  • In DynamoDB:
    • the head commit of each branch;
    • a record of each commit;
    • a listing of the uncommitted objects of each branch.
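
To make this layout concrete, here's a minimal sketch that peeks at both layers. The bucket name, the "_lakefs/" and "data/" prefixes, and the DynamoDB table name are illustrative assumptions, not a documented contract:

```python
# Sketch: peek at the two storage layers backing a lakeFS repository.
# ASSUMPTIONS: the storage namespace is s3://example-bucket/example-repo/,
# committed metadata (metaranges and ranges) lives under "_lakefs/", data
# objects live under "data/", and the lakeFS KV table is named "kvstore".
import boto3

s3 = boto3.client("s3")

def list_prefix(bucket: str, prefix: str, limit: int = 5) -> None:
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    for obj in resp.get("Contents", []):
        print(obj["Key"], obj["Size"])

# Metaranges and ranges: the exact object versions of every commit.
list_prefix("example-bucket", "example-repo/_lakefs/")
# The data objects themselves.
list_prefix("example-bucket", "example-repo/data/")

# Branch heads, commit records, and uncommitted-object listings live in
# DynamoDB, not in the bucket.
dynamodb = boto3.client("dynamodb")
table = dynamodb.describe_table(TableName="kvstore")["Table"]
print(table["TableName"], table["ItemCount"])
```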

Figure: Transactional mirroring components

Mirroring service

In lakeFS mirroring, one active repository is mirrored to multiple passive repositories. To support mirroring, lakeFS Cloud runs a replicator service in each region. Each replicator is responsible for mirroring into its own region: from its point of view, it performs mirroring from remote source repositories to local destination repositories.

Mirroring involves three steps:

  • Copy the data objects and the metadata objects in S3 from the source bucket to the destination bucket. You will configure S3 Replication to do this.
    This ensures that data will eventually become available. Replicated objects are created in a very different order from the original objects, so the destination bucket might never be in a consistent state.
  • Copy commits from the remote source lakeFS to the local destination lakeFS. lakeFS mirroring actively does this.
  • Promote the branch head. When the branch head at the source is different from the current branch head at the destination, wait for S3 Replication to copy all data and metadata objects, then promote the branch head at the destination to point at that commit. Since lakeFS never reuses a pathname for data or for metadata objects, mirroring can safely cache the status of all objects that have arrived.

Figure: Transactional mirroring service

The mirroring service coordinates these asynchronous operations. It ensures all data and metadata are available for the commit of a new branch head before advancing it on the destination repository. The branch at the destination lakeFS will only ever show a consistent commit.
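
In sketch form, the coordination looks roughly like this. All the helpers on src and dst (get_branch_head, copy_commit_records, objects_of_commit, set_branch_head, all_objects_arrived) are hypothetical stand-ins for lakeFS internals, not public API:

```python
# Illustrative coordination loop for one mirrored branch; the numbered
# comments match the three steps above. src and dst are hypothetical
# handles to the source and destination lakeFS installations.
import time

POLL_SECONDS = 5

def mirror_branch(src, dst, branch: str) -> None:
    while True:
        head = src.get_branch_head(branch)
        if head != dst.get_branch_head(branch):
            # Step 2: copy commit records into the destination lakeFS.
            dst.copy_commit_records(src, head)
            # Step 1 runs continuously in the background: S3 Replication
            # copies data and metadata objects bucket to bucket.
            # Step 3: wait until every object of `head` has arrived...
            while not dst.all_objects_arrived(src.objects_of_commit(head)):
                time.sleep(POLL_SECONDS)
            # ...then promote. The destination branch only ever moves
            # from one consistent commit to another.
            dst.set_branch_head(branch, head)
        time.sleep(POLL_SECONDS)
```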

With this architecture, lakeFS Cloud leverages S3 Replication to copy data and some commit metadata, while lakeFS remains responsible for the consistency of both metadata and data: whenever lakeFS advances a branch to point at a new commit, all data for that commit is already available. The out-of-order effects of asynchronous S3 Replication are never visible. lakeFS mirrors branches of a source repository to a destination repository asynchronously, with two guarantees:

  • Metadata: the first-parent history of a mirrored branch equals that of the source branch at some point in time.
  • Data: all data objects of the commit at the head of a mirrored branch are available.

Freshness and progress

What happens when commits on the source repository arrive continually? Replication might not be able to keep up. And when objects refer to one another and are updated, Replication might never provide a consistent view! This is always a worry with S3 Replication when using a versioned file format such as Iceberg or Delta Lake: the latest snapshot metadata object can always arrive before the data objects it references have been replicated, so the multi-object table might never be readable on the destination.

Consistent commits mean that lakeFS will always give a consistent view. But mirroring additionally guarantees freshness: the head commit of a branch always advances. Once mirroring discovers that the source branch head has changed, it validates that all objects for that commit have been replicated; once they have all arrived, the destination branch head is changed to this validated commit. Only then will mirroring attempt to advance the destination branch head further. So the branch head will advance.

What if S3 Replication cannot keep up with an ongoing stream of changes and falls behind before scaling and catching up?  lakeFS mirroring still proceeds: it finds a current head commit for the branch and waits for all of its objects to arrive. Eventually S3 Replication replicates all objects of that commit, and lakeFS mirroring will advance the branch head. Now lakeFS mirroring can discover a new head commit – and start advancing the head again.

How it works

If you’re curious how it all works, this section dives into the flow of data and control during mirroring. We will look at mirroring configured from a repository “origin” in treeverse.eu-central-1.lakefscloud.io to a repository “mirror” in treeverse.us-east-1.lakefscloud.io.

Basic scenario

Branch main is at commit aa1111 in origin, and is fully synchronized with branch main at mirror.

Figure: Basic scenario: branch main

User adds objects

A user adds 100 objects to lakefs://origin/main/ in eu-central-1. Every added data object is backed by an object on the storage namespace of origin (lakeFS stores data and committed metadata on S3).  So as soon as each object is added, S3 Replication will be able to copy it over.

Branch main at origin now has 100 uncommitted objects. Some of the underlying data objects may already have been copied. But none of these are visible on branch main at mirror, which shows only commits.

The user commits main in origin. Now branch main in origin is at commit bbbb22, which has parent aa1111. The commit creates additional metadata objects called “ranges” and “metaranges”. These are also in the storage namespace of origin, and S3 Replication is able to copy each one.

Figure: User commits main to origin
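
As a sketch, here's what the user's side of this scenario might look like with the high-level lakefs Python SDK (credential and endpoint configuration through the environment or lakectl settings is assumed):

```python
# Sketch: add 100 objects to lakefs://origin/main/ and commit them.
import lakefs

branch = lakefs.repository("origin").branch("main")

# Each upload is immediately backed by an object in the storage
# namespace of origin, so S3 Replication can start copying it at once.
for i in range(100):
    branch.object(f"batch/part-{i:03d}.json").upload(data="{}")

# The commit writes new range and metarange objects to S3 and moves the
# branch head to bbbb22 (parent aa1111); the mirror will only expose
# this commit once all of its objects have been replicated.
commit = branch.commit(message="Add 100 objects")
print(commit.get_commit().id)
```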

But branch main at mirror is still at aa1111. The replicator at mirror discovers that branch main in origin has advanced. It copies the new commit bbbb22 and records its parent aa1111. It does not yet advance main to point at this commit: it still needs to validate that all data and metadata have arrived.

Figure: Replicator at mirror discovers branch main in origin has advanced

The replicator starts looking for the metarange of commit bbbb22. Once that arrives it knows which ranges that metarange uses, and it can start waiting for them. Every range uses various data objects, and the replicator waits for those to arrive as well. lakeFS commits are immutable, meaning that the replicator can always safely cache the existence of all objects that it has previously seen. Successive commits are relatively similar, so this caching is very effective.
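
Here's a sketch of that validation walk, including the existence cache. The ranges_of and keys_of helpers stand in for lakeFS's metarange and range readers and are hypothetical:

```python
# Sketch: validate that every object reachable from a commit's
# metarange has been replicated to the destination bucket. Because
# lakeFS never reuses a pathname, a key seen once is cached forever.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
seen: set[str] = set()  # keys already observed in the destination

def arrived(bucket: str, key: str) -> bool:
    if key in seen:
        return True
    try:
        s3.head_object(Bucket=bucket, Key=key)
    except ClientError:
        return False  # not replicated yet; retry on the next pass
    seen.add(key)
    return True

def commit_fully_replicated(bucket: str, metarange_key: str) -> bool:
    if not arrived(bucket, metarange_key):
        return False  # cannot even enumerate the ranges yet
    # ranges_of and keys_of are hypothetical readers of lakeFS metadata.
    return all(
        arrived(bucket, range_key)
        and all(arrived(bucket, key) for key in keys_of(bucket, range_key))
        for range_key in ranges_of(bucket, metarange_key)
    )
```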

S3 Replication copies over missing objects

Eventually S3 Replication copies over all the missing objects, the replicator validates that every one of them has arrived, and it advances branch main on mirror to commit bbbb22.

Skipped commits

A user adds 1000 large objects and commits with digest ccc333 (parent bbbb22), then immediately deletes 999 of them and commits again with digest d44444 (parent ccc333). lakeFS preserves history, so all 1000 data objects remain on S3. S3 Replication will start copying all these data objects, as well as the metaranges and ranges of the two commits. Of course, typically only one or two ranges will differ between the two commits.

The replicator discovers that main has advanced to digest d44444. It copies the new commits d44444 with parent ccc333, and ccc333 with parent bbbb22. However, it does not advance main yet.

In order to guarantee timeliness, the replicator skips ccc333 and validates only that all metadata and data of commit d44444 have arrived. When it discovers that all are present, it advances main to commit d44444. Consistency of the commit at which main points is guaranteed. Typically ccc333 will also arrive before or shortly after this happens, but the replicator does not guarantee it.

Figure: Mirroring: skipped commits

As a result, timeliness of the mirroring of commits does not depend on the number of commits performed.
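
In sketch form (with the same hypothetical helpers as earlier), skipping looks like this:

```python
# Sketch: copy every commit record along the first-parent chain, but
# validate, and wait for, the newest commit only.
def advance_skipping(src, dst, branch: str) -> None:
    new_head = src.get_branch_head(branch)  # e.g. d44444
    old_head = dst.get_branch_head(branch)  # e.g. bbbb22
    if new_head == old_head:
        return
    # History is preserved: both d44444 and its parent ccc333 are copied.
    for commit in src.first_parent_chain(new_head, until=old_head):
        dst.put_commit_record(commit)
    # Only the newest commit is validated; ccc333 is skipped, so the
    # time to advance does not grow with the number of commits made.
    dst.wait_for_all_objects(new_head)
    dst.set_branch_head(branch, new_head)
```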

Catching up

A user adds an object and commits every 5 seconds. This can generate at most an additional 2 metadata objects on S3 every 5 seconds. S3 Replication can easily sustain this average rate, but every individual commit will take much longer than 5 seconds to replicate.

The replicator always tries to advance by a single commit. So if advancing to the next commit takes 30 seconds, the source will have produced 6 new commits in the meantime! The replicator copies all 6 commit records, but only attempts to validate the last one. S3 Replication is massively scalable and will keep up on average, though with some delay. So even when mirroring falls behind, the replicator skips validating intermediate commits in order to advance the branch. This effectively bounds the delay in most cases.

Transactional mirroring: What you’ll see

Here is a session showing replication from a source repository src-repo-replication to a mirror repository mirror-repo. src-repo-replication has a branch dev, which mirroring has already synchronized. Initially dev points at the same commit in both repositories.

Now let’s upload a file myfile.json to the source branch, without committing it.

No matter how long we wait, we will never see myfile.json replicated – we need to commit our change.

Let’s commit our changes to branch dev on the source repository.

After a while, the commit arrives at the destination repository.

And we can see our file replicated to branch dev on that repository.
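
Since the original session was shown as screenshots, here's a rough reconstruction using the Python SDK. The hosts are borrowed from the earlier example; the credential setup, the Client and Repository usage, and the exists() polling are assumptions, not a transcript:

```python
# Sketch: upload on the source, commit, and watch the mirror catch up.
import time
import lakefs
from lakefs.client import Client

src_client = Client(host="https://treeverse.eu-central-1.lakefscloud.io")
dst_client = Client(host="https://treeverse.us-east-1.lakefscloud.io")

src_dev = lakefs.Repository("src-repo-replication", client=src_client).branch("dev")
mirror_dev = lakefs.Repository("mirror-repo", client=dst_client).branch("dev")

# Upload without committing: this object will never appear on the mirror.
src_dev.object("myfile.json").upload(data='{"hello": "lake"}')

# Commit on the source; only now can mirroring pick up the change.
commit = src_dev.commit(message="Add myfile.json")
print("source head:", commit.get_commit().id)

# Poll the mirror until the commit arrives and the file is visible.
while not mirror_dev.object("myfile.json").exists():
    time.sleep(10)
print("mirror head:", mirror_dev.get_commit().id)
```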

Configuring mirroring

The instructions to set up mirroring detail the following steps (a hedged sketch of the configuration in code follows this list):

  1. Configure bucket replication on S3. The objects within the repository are copied using your cloud provider’s object store replication mechanism; for AWS S3 follow the AWS S3 Replication documentation to replicate the storage namespace of the source lakeFS repository to a bucket on the destination region.
    The replication rule replicates all new objects to the destination bucket. If the repository already exists, use an S3 Batch Operations job to replicate the existing objects.
  2. Create a lakeFS user with a “replicator” policy. This user will be used for mirroring, so you will attach a specific RBAC ReplicationPolicy to this user. Create an access key and secret to be used by the mirroring process.
  3. Authorize the lakeFS Mirror process to use the mirroring user. Mirroring is now in public preview. To connect the newly created user with the mirroring process, contact lakeFS Support.
  4. Configure repository mirroring. Mirroring has a stand-alone HTTP API, and you will need to use cURL or a similar HTTP client to configure it.
  5. Mirroring and Garbage Collection. Garbage collection won’t run on mirrored repositories. Deletions from garbage collection should be replicated from the source:
    1. Enable delete marker replication on the source bucket.
    2. Create a lifecycle policy on the destination bucket that expires objects once their replicated delete marker arrives.
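
Here is a hedged sketch of steps 1, 4, and 5. The bucket names and role ARN are placeholders; the boto3 calls (put_bucket_replication, put_bucket_lifecycle_configuration) are the real AWS APIs, but the mirroring endpoint path and payload at the end are purely illustrative; the actual API shape is in the lakeFS mirroring documentation.

```python
# Sketch of mirroring setup; placeholder names throughout.
import boto3
import requests

s3 = boto3.client("s3")

# Step 1 (and step 5a): replicate all new objects, including delete
# markers so garbage-collection deletions propagate, to the destination.
s3.put_bucket_replication(
    Bucket="source-namespace-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/replication-role",
        "Rules": [{
            "ID": "mirror-storage-namespace",
            "Status": "Enabled",
            "Priority": 1,
            "Filter": {"Prefix": ""},
            "DeleteMarkerReplication": {"Status": "Enabled"},
            "Destination": {"Bucket": "arn:aws:s3:::destination-namespace-bucket"},
        }],
    },
)

# Step 5b: on the destination, expire objects made noncurrent by a
# replicated delete marker, then clean up the markers themselves.
s3.put_bucket_lifecycle_configuration(
    Bucket="destination-namespace-bucket",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "expire-deleted-objects",
            "Status": "Enabled",
            "Filter": {"Prefix": ""},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 1},
            "Expiration": {"ExpiredObjectDeleteMarker": True},
        }],
    },
)

# Step 4: configure repository mirroring through the stand-alone HTTP
# API, authenticating as the replicator user. The URL path and body
# here are illustrative only; see the lakeFS docs for the real contract.
resp = requests.post(
    "https://treeverse.us-east-1.lakefscloud.io/api/v1/mirrors",
    json={"source": "src-repo-replication", "destination": "mirror-repo"},
    auth=("REPLICATOR_ACCESS_KEY_ID", "REPLICATOR_SECRET_ACCESS_KEY"),
)
resp.raise_for_status()
```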

Getting started

To access Transactional Mirroring, you’ll need to either log into your Cloud account or create a new account if you don’t already have one (no credit card required, no strings attached!). You can then follow the detailed instructions available on our documentation to set everything up. And, as always, we’ll be happy to hear from you on our Slack Community!
