We’re excited to introduce a powerful new capability in lakeFS Enterprise: the lakeFS Iceberg REST Catalog – a fully standards-compliant implementation of the Apache Iceberg REST Catalog specification.

With this release, lakeFS now enables seamless version control for both structured and unstructured data at any scale.

Think Git-style workflows, now for your Iceberg tables. No lock-in, no extra tools, just powerful data & AI engineering done right.

Open Standards, Zero Lock-In

The lakeFS Iceberg REST Catalog implements the official Apache Iceberg REST Catalog spec. That means:

Works out-of-the-box with Apache Spark, Trino, Flink, and other engines that support REST Catalogs

No proprietary formats or vendor lock-in
No extra libraries or plugins required
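
As a rough illustration of what connecting a REST-catalog engine involves, the sketch below builds the Spark properties that register an Iceberg REST catalog. The `spark.sql.catalog.<name>` keys and the `type`, `uri`, `oauth2-server-uri` and `credential` properties are standard Iceberg settings. The helper name `rest_catalog_spark_conf`, the catalog name `lakefs`, the endpoint and the placeholder credentials are assumptions made for this example, so check the lakeFS documentation for the exact values your deployment needs.

```python
def rest_catalog_spark_conf(name, uri, oauth_uri, credential):
    """Return Spark properties that register an Iceberg REST catalog called `name`.

    Hypothetical helper for illustration; it only assembles a dictionary.
    """
    prefix = f"spark.sql.catalog.{name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",  # Iceberg's Spark catalog class
        f"{prefix}.type": "rest",                         # use the REST catalog protocol
        f"{prefix}.uri": uri,                             # catalog endpoint
        f"{prefix}.oauth2-server-uri": oauth_uri,         # token endpoint
        f"{prefix}.credential": credential,               # "<key id>:<secret>"
    }


conf = rest_catalog_spark_conf(
    name="lakefs",                                   # assumed catalog name
    uri="https://lakefs.example.com/iceberg/api",    # placeholder endpoint
    oauth_uri="<token-endpoint>",                    # placeholder
    credential="<access-key-id>:<secret-access-key>",
)

# Print the properties as spark-submit style flags.
for key, value in conf.items():
    print(f"--conf {key}={value}")
```

Spark still needs the Iceberg runtime on its classpath, as it does for any Iceberg catalog; nothing lakeFS-specific is added on top of that.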

Use Cases: Isolated Development, Automation & Collaboration

With lakeFS Iceberg REST Catalog, you’ll be able to achieve:

Version-Controlled Data Development

Create feature branches for table schema changes or data migrations
Test modifications in isolation, across multiple tables
Merge changes safely with conflict detection

Multi-Environment Management

Use zero-copy branches to represent different environments (dev, staging, prod)
Promote changes between environments through merges, with automated testing
Maintain consistent table schemas and data across environments

Collaborative Data Development

Multiple teams can work on different table features simultaneously without stepping on each other’s toes
Maintain data quality through pre-merge validations
Collaborate using pull requests on changes to data and schema

Manage and Govern Access to Data

Use the detailed built-in commit log to see who changed the data, what changed, and how
Manage access for users and groups with fine-grained RBAC policies
Roll back changes atomically and safely to reduce time-to-recover and increase system stability

Reproducibility at Scale

Managing thousands of Iceberg tables across petabytes of data? lakeFS has you covered.

Repositories version all namespaces and tables atomically
Easily go back to any state of your catalog from any point in time
Roll back mistakes instantly, across all affected tables
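
To make the idea of atomic, catalog-wide rollback concrete, here is a toy in-memory model. It is not how lakeFS is implemented; the class `ToyCatalogRepo`, its methods and the metadata file paths are all invented for this illustration. Each commit is an immutable snapshot that maps table names to metadata files, and a branch is just a pointer to a commit, so moving that pointer reverts every table together.

```python
class ToyCatalogRepo:
    """Toy model: each commit is a snapshot of {table name: metadata file}."""

    def __init__(self):
        self.commits = [{}]            # commit 0 is the empty catalog
        self.branches = {"main": 0}    # branch name -> index of its head commit

    def commit(self, branch, changes):
        """Record one commit that updates any number of tables at once."""
        snapshot = dict(self.commits[self.branches[branch]])  # copy pointers only
        snapshot.update(changes)
        self.commits.append(snapshot)
        self.branches[branch] = len(self.commits) - 1
        return self.branches[branch]

    def tables(self, branch):
        """Return the table -> metadata mapping at the branch head."""
        return self.commits[self.branches[branch]]

    def roll_back(self, branch, commit_id):
        """Move the branch head to an earlier commit; all tables revert together."""
        self.branches[branch] = commit_id


repo = ToyCatalogRepo()
good = repo.commit("main", {
    "inventory.books": "books/v1.metadata.json",
    "inventory.authors": "authors/v1.metadata.json",
})
# A bad change touches both tables in a single commit...
repo.commit("main", {
    "inventory.books": "books/v2.metadata.json",
    "inventory.authors": "authors/v2.metadata.json",
})
# ...and a single rollback restores both.
repo.roll_back("main", good)
print(repo.tables("main"))
```

Because only pointers move, the cost of the rollback in this model does not depend on how many tables or how much data the commit touched.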

How does it work?

lakeFS Enterprise exposes an implementation of the REST catalog interface as published by the Apache Iceberg project.

Behind the scenes, a request to the catalog does the following:

Given the table’s namespace, extract the repository and reference. For example, for the table my-repo.main.inventory.books, the repository is my-repo and the lakeFS reference is the branch main. The reference can also be a tag name or a commit hash; any lakeFS reference should work.
Use lakeFS’ versioned metadata to store or retrieve the current Iceberg metadata file for the requested repository and reference.
On modification to a table, create a new metadata file and replace its pointer for the requested branch.
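
The first step above can be sketched in a few lines of Python. This is an illustration only, not the catalog's actual code: the names `LakeFSTableRef` and `parse_table_identifier` are hypothetical. It assumes identifiers of the form `<repository>.<reference>.<namespace...>.<table>` with at least one namespace level, and it does not handle references that themselves contain dots.

```python
from typing import NamedTuple, Tuple


class LakeFSTableRef(NamedTuple):
    """Hypothetical container for the parts of a table identifier."""
    repository: str
    reference: str              # branch, tag, or commit hash
    namespace: Tuple[str, ...]  # one or more namespace levels
    table: str


def parse_table_identifier(identifier: str) -> LakeFSTableRef:
    """Split 'repo.ref.ns1[.ns2...].table' into its components."""
    parts = identifier.split(".")
    if len(parts) < 4 or not all(parts):
        raise ValueError(
            f"expected <repository>.<reference>.<namespace>.<table>, got {identifier!r}"
        )
    repository, reference, *namespace, table = parts
    return LakeFSTableRef(repository, reference, tuple(namespace), table)


ref = parse_table_identifier("my-repo.main.inventory.books")
print(ref.repository, ref.reference, ref.namespace, ref.table)
```

The same function works unchanged for a tag or commit hash in the second position, since it treats the reference as an opaque string.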

[Figure: How the lakeFS Iceberg REST Catalog works]

This approach has several benefits:

It moves the versioning capabilities (branching, merging, committing, etc.) out of the critical path: clients read and write table data directly against the underlying object store, without any data going through lakeFS itself.
It leverages existing lakeFS primitives, building on a solid, proven foundation to atomically branch, commit, and merge changes at large scale.
It allows versioning both structured and unstructured data together, in the same repository, ensuring reproducibility regardless of data type.

To learn more about the architecture and design of the Iceberg REST Catalog, see the official integration page on the lakeFS documentation.

Example: Using the lakeFS Iceberg REST Catalog with PyIceberg

Using the PyIceberg client, you can interact with the lakeFS REST Catalog just like any Iceberg-native catalog:

import lakefs
from pyiceberg.catalog.rest import RestCatalog

# Initialize the catalog
catalog = RestCatalog(name="lakefs-catalog", **{
    'prefix': 'lakefs',
    'uri': 'https://lakefs.example.com/iceberg/api',
    'oauth2-server-uri': 'https://lakefs.example.com/iceberg/api/v1/oauth/tokens',
    'credential': 'AKIAlakefs12345EXAMPLE:abc/lakefs/1234567bPxRfiCYEXAMPLEKEY',
})

# List namespaces in a branch
catalog.list_namespaces(('repo', 'main'))

# Query a table
catalog.list_tables('repo.main.inventory')
table = catalog.load_table('repo.main.inventory.books')
arrow_df = table.scan().to_arrow()

You can also create a lakeFS branch and load the same table from it:

branch = lakefs.repository('repo').branch('dev').create(source_reference='main')

# The table is now accessible in the new branch
dev_table = catalog.load_table(f'repo.{branch.id}.inventory.books')

The lakeFS Iceberg REST Catalog works with any standard Iceberg client. See the official documentation for more examples and detailed usage instructions.

Get Started Today

The lakeFS Iceberg REST Catalog is available now as part of lakeFS Enterprise.

If you’re using Iceberg and need data versioning, reproducibility, production safety, and compliance, this is the way to do it.
Contact us for a free trial and see how lakeFS can power your data platform, structured or unstructured.

lakeFS Iceberg REST Catalog: Version Control for Structured Data, at Scale
