Amit Kesarwani

Last updated on January 27, 2026

lakeFS Enterprise offers a fully standards-compliant implementation of the Apache Iceberg REST Catalog, enabling Git-style version control for structured data at scale. This integration allows teams to use Iceberg-compatible tools like Spark, Trino, and PyIceberg without any vendor lock-in or proprietary formats.

By treating Iceberg tables as versioned entities within lakeFS repositories and branches, users can create isolated environments to test schema changes, ingest new data, or experiment safely without impacting production. Branches are metadata-only and zero-copy, making them fast and storage-efficient. Once validated, changes can be merged across branches using conflict-aware operations.

The REST Catalog ensures efficient performance by routing table access directly between compute engines and object stores, bypassing lakeFS in the data path. Governance is also enhanced – every change is recorded as a commit, allowing full traceability and rollback. Role-Based Access Control (RBAC) and audit logs support compliance and security needs.

This tutorial includes a PyIceberg example, showing how to register and access Iceberg tables via lakeFS using standard APIs.

In short, lakeFS’s REST Catalog for Iceberg brings version control, reproducibility, and safe collaboration to modern data lake architectures built on Apache Iceberg.

What you’ll learn in this tutorial

  • How to run lakeFS Enterprise locally
  • How to create and manage an Iceberg table backed by lakeFS
  • How to read and query the table using PyIceberg
  • How to version changes using lakeFS branches and commits

Prerequisites

Make sure you have:

  • Docker & Docker Compose
  • Git

Step 1: Clone the lakeFS Samples

git clone https://github.com/treeverse/lakeFS-samples.git
cd lakeFS-samples/02_lakefs_enterprise

Step 2: Start lakeFS Enterprise

The lakeFS Enterprise Sample is the quickest way to experience lakeFS Enterprise features, including the lakeFS Iceberg REST Catalog, in a containerized environment. This Docker-based setup is ideal if you want to interact with lakeFS without the hassle of integration and experiment with it without writing code.

By running the lakeFS Enterprise Sample, you get a ready-to-use environment that includes the following containers:

  • lakeFS Enterprise (includes additional features like lakeFS Iceberg REST Catalog)
  • Postgres: used by lakeFS as a KV (Key-Value) store
  • MinIO container: used as S3-compatible object storage connected to lakeFS
  • Jupyter notebooks setup: pre-populated with notebooks that demonstrate lakeFS Enterprise’s capabilities
  • Apache Spark: a Spark client that can be used instead of PyIceberg to interact with the Iceberg tables you’ll manage with lakeFS

Contact lakeFS to get a token for lakeFS Enterprise, then log in to the Treeverse Docker Hub with the granted token so that the proprietary lakeFS Enterprise image can be pulled:

docker login -u externallakefs

Run the following command to provision the lakeFS Enterprise server along with MinIO for your object store and the Jupyter notebook server.
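A typical invocation for a Docker Compose based sample like this one is shown below; the exact command may differ, so check the README in the 02_lakefs_enterprise directory for the authoritative invocation and any required environment variables, such as the lakeFS Enterprise token:

docker compose up -d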

Step 3: Log in to lakeFS

Go to the lakeFS UI (http://localhost:8084) and log in with the following credentials:

  • Access Key ID: AKIAIOSFOLKFSSAMPLES
  • Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
login to lakeFS

Step 4: Create a lakeFS Repository

Go to the Jupyter UI (http://localhost:8894) and open the “iceberg-books” notebook from the File Browser panel on the side:

Iceberg notebook

In the notebook, run the cells until you reach the one that creates the lakeFS repository named “lakefs-py-iceberg”:

repo = lakefs.Repository(repo_name).create(storage_namespace=f"{storageNamespace}/{repo_name}", default_branch=mainBranch, exist_ok=True)

branchMain = repo.branch(mainBranch)
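This snippet, and the steps that follow, rely on configuration variables defined earlier in the notebook. A minimal sketch of those definitions (the endpoint and storage namespace values here are assumptions that may differ in your environment; the credentials are the ones from Step 3):

import lakefs

lakefsEndPoint = "http://lakefs:8000"   # assumed in-network address of the lakeFS server
lakefsAccessKey = "AKIAIOSFOLKFSSAMPLES"
lakefsSecretKey = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

storageNamespace = "s3://example"       # assumed MinIO bucket backing the repository
repo_name = "lakefs-py-iceberg"
mainBranch = "main"
devBranch = "dev"
icebergNamespace = "lakefs_demo"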

Refresh the lakeFS UI and you will see that a repository named “lakefs-py-iceberg” has been created:

lakefs-py-iceberg

Step 5: Configure the Iceberg REST Catalog

Continue running the cells in the notebook to configure the lakeFS Iceberg REST Catalog:

  • The Iceberg REST catalog API is exposed at “/iceberg/api” in the lakeFS server
  • Use lakeFS access key and secret for authentication
  • Use MinIO endpoint and credentials so PyIceberg client can access S3-compatible object storage
from pyiceberg.catalog.rest import RestCatalog

catalog = RestCatalog(
    name="my_catalog",
    **{
        'prefix': 'lakefs',
        'uri': f'{lakefsEndPoint}/iceberg/api',
        'oauth2-server-uri': f'{lakefsEndPoint}/iceberg/api/v1/oauth/tokens',
        'credential': f'{lakefsAccessKey}:{lakefsSecretKey}',
        's3.endpoint': 'http://minio:9000',
        's3.access-key-id': 'minioadmin',
        's3.secret-access-key': 'minioadmin',
        's3.region': 'us-east-1',
        's3.force-virtual-addressing': False,
    }
)

Step 6: Create the Iceberg Namespace and Tables Using PyIceberg

Create “lakefs_demo” Iceberg namespace in the lakeFS repository’s “main” branch:

lakefs_demo_ns = (repo_name, mainBranch, icebergNamespace)
catalog.create_namespace(lakefs_demo_ns)
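As a quick check (not part of the original notebook excerpt), you can list the namespaces visible under the repository and branch:

# List namespaces under the (repository, branch) prefix
print(catalog.list_namespaces((repo_name, mainBranch)))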

Go to the lakeFS UI and click on the “lakefs-py-iceberg” repository to open it. Next, click on “_lakefs_tables” > “iceberg” > “namespaces” and you will see that the “lakefs_demo” namespace was created in the repository’s “main” branch.

Create tables in the “lakefs_demo” namespace on the lakeFS repository’s “main” branch. The notebook creates three tables; the “authors” table is shown below:

from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, IntegerType, StringType

# create authors table
authors_schema = Schema(
    NestedField(
        field_id=1,
        name="id",
        field_type=IntegerType(),
        required=True
    ),
    NestedField(
        field_id=2,
        name="name",
        field_type=StringType(),
        required=True
    ),
)
table_authors = (repo_name, mainBranch, icebergNamespace, 'authors')

catalog.create_table(
    identifier=table_authors,
    schema=authors_schema
)
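As an optional verification (not in the original post), you can also list the tables registered under the namespace directly from PyIceberg:

# List tables under the lakefs_demo namespace on the main branch
print(catalog.list_tables((repo_name, mainBranch, icebergNamespace)))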

Go back to the lakeFS UI and click on “lakefs_demo” > “tables” and you will see that the tables were created in the “lakefs_demo” namespace:

lakeFS demo namespace

Step 7: Insert Sample Data

Insert sample data into all three tables. The “authors” table is shown here:

import pyarrow as pa

# Insert data into the authors table
authors_data = [
    {"id": 1, "name": "J.R.R. Tolkien"},
    {"id": 2, "name": "George R.R. Martin"},
    {"id": 3, "name": "Agatha Christie"},
    {"id": 4, "name": "Isaac Asimov"},
    {"id": 5, "name": "Stephen King"},
]

authors_arrow_schema = pa.schema([
    pa.field("id", pa.int8(), nullable=False),
    pa.field("name", pa.string(), nullable=False),
])
authors_arrow_table = pa.Table.from_pylist(authors_data, schema=authors_arrow_schema)
authors_table = catalog.load_table(table_authors)
authors_table.append(authors_arrow_table)
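To read the data back with PyIceberg, you can scan the table into an Arrow table (a quick sanity check, not shown in the original excerpt; .to_pandas() works as well):

# Scan the authors table on the main branch and print its contents
print(authors_table.scan().to_arrow())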

Step 8: Create a Branch in lakeFS

Create a “dev” branch sourced from the “main” branch in the lakeFS repository:

branchDev = repo.branch(devBranch).create(source_reference=mainBranch)

Click on the “Branches” tab in the lakeFS UI and you will see that the “dev” branch was created:

branches in main default

Step 9: Change Data in the New Branch

Delete a few records from the “book_sales” table in the “dev” branch:

table_book_sales = (repo_name, devBranch, icebergNamespace, 'book_sales')
book_sales_table = catalog.load_table(table_book_sales)
book_sales_table.delete(delete_filter="id IN (1, 2, 6, 10, 15)")
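You can also compare the two branches from PyIceberg itself; this is an illustrative check, and the deleted rows should only be missing on “dev”:

# Compare row counts between the main and dev branches after the delete
main_sales = catalog.load_table((repo_name, mainBranch, icebergNamespace, 'book_sales'))
dev_sales = catalog.load_table((repo_name, devBranch, icebergNamespace, 'book_sales'))
print(main_sales.scan().to_arrow().num_rows, dev_sales.scan().to_arrow().num_rows)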

Now compare data between “main” and “dev” branches:

compare dev and main branches

Step 10: Merge and Revert the Changes

Merge the data changes from the “dev” branch into the “main” branch:

res = branchDev.merge_into(branchMain)

You also have the option to revert/roll back the changes on the “main” branch:

branchMain.revert(parent_number=1, reference=mainBranch)
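After the revert, reloading the table from the “main” branch should show the original rows again (an illustrative check):

# Reload book_sales from main and confirm the pre-merge row count is back
restored = catalog.load_table((repo_name, mainBranch, icebergNamespace, 'book_sales'))
print(restored.scan().to_arrow().num_rows)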

Step 11: Try Other Samples

You can also try the “iceberg-books-spark” and “iceberg-books-trino” notebooks, which use the Spark and Trino clients, respectively, instead of PyIceberg.

Step 12: Shut Everything Down

Once you’re finished, you can run the following to remove the Docker containers created in Step 2 above:
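With the Docker Compose setup from Step 2, shutting down and removing the containers is typically:

docker compose down

Add --volumes if you also want to remove the data volumes used by MinIO and Postgres.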

Summary

With PyIceberg and lakeFS you can now:

  • Read and manage Iceberg tables entirely in Python
  • Use lakeFS to isolate, test and version Iceberg tables
  • Merge validated changes just like in Git workflows
  • Revert the changes (if needed)
