lakeFS Enterprise offers a fully standards-compliant implementation of the Apache Iceberg REST Catalog, enabling Git-style version control for structured data at scale. This integration allows teams to use Iceberg-compatible tools like Spark, Trino, and PyIceberg without any vendor lock-in or proprietary formats.
By treating Iceberg tables as versioned entities within lakeFS repositories and branches, users can create isolated environments to test schema changes, ingest new data, or experiment safely without impacting production. Branches are metadata-only and zero-copy, making them fast and storage-efficient. Once validated, changes can be merged across branches using conflict-aware operations.
The REST Catalog ensures efficient performance by routing table data access directly between compute engines and the object store, keeping lakeFS out of the data path. Governance is also strengthened: every change is recorded as a commit, enabling full traceability and rollback. Role-Based Access Control (RBAC) and audit logs support compliance and security needs.
This tutorial includes a PyIceberg example, showing how to register and access Iceberg tables via lakeFS using standard APIs.
In short, lakeFS’s REST Catalog for Iceberg brings version control, reproducibility, and safe collaboration to modern data lake architectures built on Apache Iceberg.
What you’ll learn in this tutorial
- How to run lakeFS Enterprise locally
- How to create and manage an Iceberg table backed by lakeFS
- How to read and query the table using PyIceberg
- How to version changes using lakeFS branches and commits
Prerequisites
Make sure you have:
- Docker & Docker Compose
- Git
Step 1: Clone the lakeFS Samples
```bash
git clone https://github.com/treeverse/lakeFS-samples.git
cd lakeFS-samples/02_lakefs_enterprise
```
Step 2: Start lakeFS Enterprise
The lakeFS Enterprise Sample is the quickest way to experience lakeFS Enterprise features, including the lakeFS Iceberg REST Catalog, in a containerized environment. This Docker-based setup is ideal if you want to easily interact with lakeFS without the hassle of integration and experiment without writing code.
By running the lakeFS Enterprise Sample, you get a ready-to-use environment with the following containers:
- lakeFS Enterprise (includes additional features like lakeFS Iceberg REST Catalog)
- Postgres: used by lakeFS as a KV (Key-Value) store
- MinIO container: used as S3-compatible object storage connected to lakeFS
- Jupyter notebooks setup: pre-populated with notebooks that demonstrate lakeFS Enterprise’s capabilities
- Apache Spark: a Spark client that can be used instead of PyIceberg to interact with the Iceberg tables you’ll manage with lakeFS
Contact lakeFS to get a token for lakeFS Enterprise, then log in to the Treeverse Docker Hub with the granted token so the proprietary lakeFS Enterprise image can be pulled:
```bash
docker login -u externallakefs
```
Run the following command to provision a lakeFS Enterprise server, along with MinIO for your object store and Jupyter:
```bash
docker compose up
```
This starts:
- lakeFS UI: http://localhost:8084
- Jupyter UI: http://localhost:8894/
- MinIO UI: http://localhost:9005
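As an optional sanity check before continuing, you can confirm the containers are up. This is not part of the original sample; the /_health path is lakeFS’s health endpoint, included here as an assumption about this setup:
```bash
# List the containers started by the sample's compose file
docker compose ps
# Probe the lakeFS server's health endpoint
curl -s http://localhost:8084/_health
```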
Step 3: Login to lakeFS
Go to the lakeFS UI (http://localhost:8084) and log in with the sample credentials:
- Access Key ID: AKIAIOSFOLKFSSAMPLES
- Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

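These are the same credentials the notebook uses to talk to lakeFS from Python. A minimal sketch, assuming the high-level lakefs Python SDK that the sample notebooks use (the notebook itself typically reads these values from variables defined in an earlier cell):
```python
from lakefs.client import Client

# Explicit client configuration matching the sample credentials above
clt = Client(
    host="http://localhost:8084",
    username="AKIAIOSFOLKFSSAMPLES",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
```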
Step 4: Create a lakeFS Repository
Go to the Jupyter UI (http://localhost:8894) and open the “iceberg-books” notebook from the File Browser side panel:

In the notebook, run the cells until you create the lakeFS repository named “lakefs-py-iceberg”:
```python
repo = lakefs.Repository(repo_name).create(
    storage_namespace=f"{storageNamespace}/{repo_name}",
    default_branch=mainBranch,
    exist_ok=True,
)
branchMain = repo.branch(mainBranch)
```
Refresh the lakeFS UI and you will see that a repository named “lakefs-py-iceberg” has been created:

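You can also confirm the repository from Python; a small hedged sketch using the lakefs SDK’s log() helper (at this point it should show only the repository’s initial commit):
```python
# Print the recent commit history of the main branch
for commit in branchMain.log(max_amount=5):
    print(commit.id[:8], commit.message)
```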
Step 5: Configure the Iceberg REST Catalog
Continue running the cells in the notebook to configure the lakeFS Iceberg REST Catalog:
- The Iceberg REST Catalog API is exposed at “/iceberg/api” on the lakeFS server
- Use the lakeFS access key and secret for authentication
- Use the MinIO endpoint and credentials so the PyIceberg client can access the S3-compatible object storage
```python
catalog = RestCatalog(
    name="my_catalog",
    **{
        'prefix': 'lakefs',
        'uri': f'{lakefsEndPoint}/iceberg/api',
        'oauth2-server-uri': f'{lakefsEndPoint}/iceberg/api/v1/oauth/tokens',
        'credential': f'{lakefsAccessKey}:{lakefsSecretKey}',
        's3.endpoint': 'http://minio:9000',
        's3.access-key-id': 'minioadmin',
        's3.secret-access-key': 'minioadmin',
        's3.region': 'us-east-1',
        's3.force-virtual-addressing': False,
    },
)
```
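Equivalently, the same catalog can be built through PyIceberg’s generic load_catalog helper; a sketch of the alternative spelling with identical properties:
```python
from pyiceberg.catalog import load_catalog

# "type": "rest" selects the REST catalog implementation
catalog = load_catalog(
    "my_catalog",
    **{
        'type': 'rest',
        'prefix': 'lakefs',
        'uri': f'{lakefsEndPoint}/iceberg/api',
        'oauth2-server-uri': f'{lakefsEndPoint}/iceberg/api/v1/oauth/tokens',
        'credential': f'{lakefsAccessKey}:{lakefsSecretKey}',
        's3.endpoint': 'http://minio:9000',
        's3.access-key-id': 'minioadmin',
        's3.secret-access-key': 'minioadmin',
        's3.region': 'us-east-1',
        's3.force-virtual-addressing': False,
    },
)
```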
Step 6: Create the Iceberg Namespace and Tables Using PyIceberg
Create the “lakefs_demo” Iceberg namespace in the lakeFS repository’s “main” branch:
```python
lakefs_demo_ns = (repo_name, mainBranch, icebergNamespace)
catalog.create_namespace(lakefs_demo_ns)
```
Go to the lakeFS UI and click on the “lakefs-py-iceberg” repository to open it. Next, click on “_lakefs_tables” > “iceberg” > “namespaces” and you will see that the “lakefs_demo” namespace was created in the repository’s “main” branch:

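As an optional check (not in the original notebook), list the namespaces under the branch from Python:
```python
# Expect the newly created "lakefs_demo" namespace to appear
print(catalog.list_namespaces((repo_name, mainBranch)))
```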
Create tables in the “lakefs_demo” namespace on the repository’s “main” branch. The “authors” table is shown below; the notebook creates the remaining tables the same way:
```python
# create authors table
authors_schema = Schema(
    NestedField(
        field_id=1,
        name="id",
        field_type=IntegerType(),
        required=True,
    ),
    NestedField(
        field_id=2,
        name="name",
        field_type=StringType(),
        required=True,
    ),
)
table_authors = (repo_name, mainBranch, icebergNamespace, 'authors')
catalog.create_table(
    identifier=table_authors,
    schema=authors_schema,
)
```
Go back to the lakeFS UI and click on “lakefs_demo” > “tables” and you will see that the tables were created in the “lakefs_demo” namespace:

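The notebook also creates the “book_sales” table used later in Step 9. As a hedged sketch following the same pattern (only the “id” column is implied by the tutorial; the other fields are assumptions):
```python
# Hedged sketch of the book_sales table; field names other than "id"
# are illustrative assumptions, not the notebook's actual schema
book_sales_schema = Schema(
    NestedField(field_id=1, name="id", field_type=IntegerType(), required=True),
    NestedField(field_id=2, name="book_id", field_type=IntegerType(), required=False),
    NestedField(field_id=3, name="quantity", field_type=IntegerType(), required=False),
)
table_book_sales = (repo_name, mainBranch, icebergNamespace, 'book_sales')
catalog.create_table(identifier=table_book_sales, schema=book_sales_schema)
```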
Step 7: Insert Sample Data
Insert sample data into all three tables:
```python
# Insert data into the authors table
authors_data = [
    {"id": 1, "name": "J.R.R. Tolkien"},
    {"id": 2, "name": "George R.R. Martin"},
    {"id": 3, "name": "Agatha Christie"},
    {"id": 4, "name": "Isaac Asimov"},
    {"id": 5, "name": "Stephen King"},
]
# int32 matches the Iceberg IntegerType declared in the table schema
authors_arrow_schema = pa.schema([
    pa.field("id", pa.int32(), nullable=False),
    pa.field("name", pa.string(), nullable=False),
])
authors_arrow_table = pa.Table.from_pylist(authors_data, schema=authors_arrow_schema)
authors_table = catalog.load_table(table_authors)
authors_table.append(authors_arrow_table)
```
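To verify the write, you can read the table straight back through the catalog using PyIceberg’s scan API:
```python
# Read the freshly appended rows back as a PyArrow table
result = authors_table.scan().to_arrow()
print(result.num_rows, "rows:")
print(result.to_pydict())
```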
Step 8: Create a Branch in lakeFS
Create a “dev” branch sourced from the “main” branch in the lakeFS repository:
```python
branchDev = repo.branch(devBranch).create(source_reference=mainBranch)
```
Click on the “Branches” tab in the lakeFS UI and you will see that the “dev” branch was created:

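Because branches are metadata-only and zero-copy, the new branch immediately exposes the same tables through the catalog. A hedged sketch reading “authors” from “dev”:
```python
# Load the authors table from the dev branch; no data was copied, since
# the branch shares the same underlying files as main at this point
authors_dev = catalog.load_table((repo_name, devBranch, icebergNamespace, 'authors'))
print(authors_dev.scan().to_arrow().num_rows, "rows on dev")
```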
Step 9: Change Data in the New Branch
Delete a few records from a table in the “dev” branch:
```python
table_book_sales = (repo_name, devBranch, icebergNamespace, 'book_sales')
book_sales_table = catalog.load_table(table_book_sales)
book_sales_table.delete(delete_filter="id IN (1, 2, 6, 10, 15)")
```
Now compare the data between the “main” and “dev” branches:

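You can run the same comparison from Python; a minimal sketch counting rows on each branch:
```python
# The dev branch should now have five fewer rows than main
main_sales = catalog.load_table((repo_name, mainBranch, icebergNamespace, 'book_sales'))
dev_sales = catalog.load_table((repo_name, devBranch, icebergNamespace, 'book_sales'))
print("main:", main_sales.scan().to_arrow().num_rows)
print("dev:", dev_sales.scan().to_arrow().num_rows)
```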
Step 10: Merge and Revert the Changes
Merge the data changes from the “dev” branch into the “main” branch:
```python
res = branchDev.merge_into(branchMain)
```
You also have the option to revert/roll back the changes from the “main” branch:
```python
branchMain.revert(parent_number=1, reference=mainBranch)
```
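If you do revert, a quick hedged check confirms that “main” is back to its pre-merge state:
```python
# Re-load book_sales from main after the revert; the original row
# count should be restored
reverted = catalog.load_table((repo_name, mainBranch, icebergNamespace, 'book_sales'))
print("main after revert:", reverted.scan().to_arrow().num_rows)
```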
Step 11: Try Other Samples
You can also try the “iceberg-books-spark” and “iceberg-books-trino” notebooks, which use a Spark and a Trino client respectively instead of PyIceberg.
Step 12: Shut Everything Down
Once you’re finished, you can run the following to remove the Docker containers created in Step 2 above:
```bash
docker compose down
```
Summary
With PyIceberg and lakeFS you can now:
- Read and manage Iceberg tables entirely in Python
- Use lakeFS to isolate, test and version Iceberg tables
- Merge validated changes just like in Git workflows
- Revert the changes (if needed)
Resources
- lakeFS Docs: https://docs.lakefs.io/latest/integrations/iceberg/
- PyIceberg Docs: https://py.iceberg.apache.org
- lakeFS Sample Notebook: https://github.com/treeverse/lakeFS-samples/blob/main/00_notebooks/iceberg-books.ipynb