Itai Gilo

Last updated on January 27, 2026

A behind-the-scenes look at the design decisions, architecture, and lessons learned while bringing the Apache Iceberg REST Catalog to lakeFS.


When we first announced our native lakeFS Iceberg REST Catalog, we focused on what it means for data teams: seamless, Git-like version control for structured and unstructured data, at any scale. But how did we build it? What were the trade-offs, the “aha!” moments, and the hard problems we had to solve?

For the builders among you, we’re pulling back the curtain to share the engineering story behind the feature.

Introduction: Why a Native Iceberg Catalog?

Apache Iceberg has emerged as the leading open table format for large-scale analytic datasets. Its powerful features depend on a central component: the catalog. The catalog is the source of truth, tracking a table’s current state.

While Iceberg supports various catalog types, our users, who already leverage lakeFS for versioning their data lake, asked for a more integrated experience. They wanted to manage their Iceberg tables with the same atomic, branch-based workflows they use for the rest of their data assets. The goal was clear: build a fully compliant Iceberg REST Catalog that speaks lakeFS fluently.

Our primary goals for this implementation were:

  • Full Spec Compliance: Work out-of-the-box with any Iceberg-compatible engine like Spark, Trino, or Flink.
  • Zero-Copy Branching: Creating a new branch of your entire data warehouse should be a metadata-only operation, taking milliseconds.
  • Atomic Multi-Table Transactions: Commits involving multiple table changes must be truly atomic, ensuring consistency.
  • Leverage Existing Primitives: Build upon the core strengths of lakeFS – its transactional guarantees and versioning engine – without reinventing the wheel.

Iceberg Internals 101

To understand our design, it helps to know a little about Iceberg’s structure. An Iceberg table is a tree of metadata files that ultimately point to data files (e.g., Parquet, ORC).

  • metadata.json: The root of the tree. Every change to the table produces a new metadata.json, capturing the table’s current schema, partitioning, and snapshots.
  • Manifest Lists: Each snapshot references a manifest list, which points to one or more manifest files.
  • Manifest Files: Each manifest tracks a list of data files, along with statistics about them.
  • Data Files: The actual data.

The catalog’s role is simple but critical: it’s the definitive source for locating a table’s current metadata. When a query engine like Spark or Trino writes to a table, it doesn’t modify files in place. Instead, it writes new data and metadata files and then asks the catalog to perform an atomic “swap” of the pointer from the old root metadata file to the new one. This atomic operation is the lynchpin of Iceberg’s consistency guarantees. It ensures that consumers of the data will never see a partially written update, preventing data corruption and guaranteeing that all users have a consistent view of the table at any given moment.
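
To make that contract concrete, here is a minimal Go sketch of what a catalog commit boils down to: a compare-and-swap on the table’s current metadata location. The `InMemoryCatalog` type and `ErrCommitConflict` error are illustrative names for this post, not part of lakeFS or the Iceberg libraries.

```go
package catalog

import (
	"errors"
	"sync"
)

// ErrCommitConflict is returned when another writer updated the table first.
var ErrCommitConflict = errors.New("metadata location changed since it was read")

// InMemoryCatalog is an illustrative, in-memory stand-in for a real catalog:
// it maps a table name to the location of its current metadata.json.
type InMemoryCatalog struct {
	mu     sync.Mutex
	tables map[string]string // table name -> metadata.json location
}

// CommitTable performs the atomic "swap": the new metadata location is only
// installed if the caller's view of the current location is still accurate.
func (c *InMemoryCatalog) CommitTable(name, expectedLocation, newLocation string) error {
	c.mu.Lock()
	defer c.mu.Unlock()

	if c.tables[name] != expectedLocation {
		// Someone else committed in the meantime; the writer must re-read
		// the table state, rebase its changes, and retry.
		return ErrCommitConflict
	}
	c.tables[name] = newLocation
	return nil
}
```

The rest of this post is essentially about implementing that swap on top of lakeFS primitives instead of a dedicated database.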

Architectural Overview: Pointers in a Versioned World

Our core architectural decision was to create a two-layer storage model. This approach separates the Iceberg metadata from the lakeFS object pointers, allowing us to version the state of the catalog without copying the metadata itself.

1. The Two-Layer Storage Model

Here’s how it works:

  1. Pointer Object in lakeFS: For each Iceberg table, we create an object within a lakeFS repository (stored in a dedicated path). This object’s physical address is a pointer to the actual `metadata.json` file.
  2. Actual Iceberg Metadata: The `metadata.json` files, manifest lists, and manifest files are stored in a location in the underlying object store (e.g., S3, GCS). That location is not directly versioned by lakeFS.

When a client loads a table, the lakeFS Iceberg Catalog first reads the pointer object from the specified lakeFS branch to find the physical location of the current `metadata.json`, and then fetches it from the underlying storage. We version only the pointer because Iceberg’s metadata files are immutable: every change produces new ones. This avoids redundant versioning and keeps lakeFS operations extremely efficient.
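
As a rough illustration of that two-step read, the sketch below loads a table by first resolving the pointer object on a branch and then fetching the referenced `metadata.json`. The `ObjectReader` interface, `TablePointer` struct, and path layout are hypothetical simplifications, not the actual lakeFS SDK or storage scheme.

```go
package catalog

import (
	"context"
	"encoding/json"
	"fmt"
)

// ObjectReader is a minimal abstraction over both lakeFS and the object store.
type ObjectReader interface {
	Get(ctx context.Context, path string) ([]byte, error)
}

// TablePointer is an illustrative shape for the pointer object stored in
// lakeFS: it records where the current metadata.json lives in object storage.
type TablePointer struct {
	MetadataLocation string `json:"metadata-location"`
}

// LoadTable resolves a table on a specific lakeFS branch. The lakefs and
// store parameters are assumed interfaces, not real SDK clients.
func LoadTable(ctx context.Context, lakefs, store ObjectReader, repo, branch, pointerPath string) ([]byte, error) {
	// Step 1: read the versioned pointer object from the lakeFS branch
	// (illustrative addressing, not the real lakeFS path scheme).
	raw, err := lakefs.Get(ctx, fmt.Sprintf("%s/%s/%s", repo, branch, pointerPath))
	if err != nil {
		return nil, fmt.Errorf("read pointer: %w", err)
	}
	var ptr TablePointer
	if err := json.Unmarshal(raw, &ptr); err != nil {
		return nil, fmt.Errorf("decode pointer: %w", err)
	}

	// Step 2: fetch the actual (immutable) metadata.json from the object store.
	return store.Get(ctx, ptr.MetadataLocation)
}
```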

2. Git-Like Operations for Tables

This pointer-based design is what unlocks lakeFS’s super-powers for Iceberg.

  • Branching: When you create a branch in lakeFS, you get a zero-copy, isolated snapshot of all your table pointers. You can make changes to tables on this branch without affecting `main`.
  • Committing: A `Commit` operation to an Iceberg table is translated into a lakeFS commit. It atomically updates the pointer object to point to the new `metadata.json` file.
  • Merging: Merging a branch back to `main` is as simple as merging the pointer objects. If two branches modified the same table, lakeFS’s merge process will detect a conflict on the pointer object, preventing divergent changes.
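
The following sketch shows how such an isolated table update maps onto these operations. `VersionedStore` and its methods are placeholder names standing in for lakeFS operations; they are not the real lakeFS API.

```go
package catalog

import "context"

// VersionedStore is a placeholder for the subset of lakeFS operations the
// catalog relies on; the method names here are illustrative only.
type VersionedStore interface {
	CreateBranch(ctx context.Context, repo, newBranch, sourceBranch string) error // zero-copy snapshot
	PutObject(ctx context.Context, repo, branch, path string, body []byte) error  // stage a change
	Commit(ctx context.Context, repo, branch, message string) error               // atomic commit
	Merge(ctx context.Context, repo, fromBranch, toBranch string) error           // conflict-checked merge
}

// UpdateTableOnBranch shows the shape of an isolated table update:
// branch, rewrite the pointer, commit, and finally merge back to main.
func UpdateTableOnBranch(ctx context.Context, vs VersionedStore, repo, pointerPath string, newPointer []byte) error {
	// 1. Zero-copy branch: only metadata is created, no table data is copied.
	if err := vs.CreateBranch(ctx, repo, "feature-x", "main"); err != nil {
		return err
	}
	// 2. Point the table at its new metadata.json on the feature branch.
	if err := vs.PutObject(ctx, repo, "feature-x", pointerPath, newPointer); err != nil {
		return err
	}
	// 3. The Iceberg commit becomes a lakeFS commit of the pointer object.
	if err := vs.Commit(ctx, repo, "feature-x", "update table metadata pointer"); err != nil {
		return err
	}
	// 4. Merging back to main succeeds only if the pointer did not change
	//    there in the meantime; otherwise lakeFS reports a conflict.
	return vs.Merge(ctx, repo, "feature-x", "main")
}
```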

3. Transactional Guarantees

A core responsibility of any catalog is to provide concurrency control – a way to ensure that multiple writers don’t corrupt the data. Instead of building a new locking mechanism from scratch, we leaned on lakeFS’s built-in transactional guarantees, which provide optimistic concurrency control out of the box.

lakeFS offers two primary ways to achieve this:

  1. Conditional Writes: This is a low-level primitive, similar to a compare-and-swap operation. A write will only succeed if the object being modified hasn’t changed since it was last read. It’s fast and efficient for single-object updates.
  2. Branching and Merging: For more complex operations involving multiple objects, lakeFS uses its Git-like workflow. A user creates a branch, makes any number of changes in isolation, and then atomically merges them back.

lakeFS therefore gives us two ways to implement optimistic concurrency, and both check for conflicts only at the very end of an operation. If two operations conflict, one will fail and need to be retried.
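
In practice this means the client side of a commit runs an optimistic retry loop: attempt, detect a conflict at the end, re-read, and try again. The sketch below shows that pattern with made-up names (`commitFunc`, `errConflict`); it is not the actual catalog code.

```go
package catalog

import (
	"context"
	"errors"
)

// errConflict stands in for whatever error a conditional write or merge
// returns when the pointer changed underneath us (a lost optimistic race).
var errConflict = errors.New("pointer changed since it was read")

// commitFunc represents a single attempt to commit a table change, e.g. a
// conditional write of the pointer object or a merge of an ephemeral branch.
// It is assumed to re-read the current table state on each invocation.
type commitFunc func(ctx context.Context) error

// CommitWithRetry is the classic optimistic-concurrency loop: conflicts are
// only detected at the end, and the losing writer simply retries.
func CommitWithRetry(ctx context.Context, attempt commitFunc, maxRetries int) error {
	var err error
	for i := 0; i <= maxRetries; i++ {
		err = attempt(ctx)
		if err == nil {
			return nil
		}
		if !errors.Is(err, errConflict) {
			return err // a real failure, not a lost race
		}
		// Conflict: loop around and let attempt re-read the latest state.
	}
	return err
}
```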

Hard Problems & Trade-Offs

The journey wasn’t without its challenges. Here are a few of the interesting problems we navigated.

Go vs. Java: Choosing Our Tools

The official Iceberg project is predominantly developed in Java, offering the most mature set of libraries. However, the core of lakeFS is built in Go. This presented us with a fundamental choice: either implement the catalog in Java, creating a separate service that would need to integrate with our existing Go infrastructure, or embrace the nascent but promising iceberg-go library.

We ultimately chose Go. This allowed us to reuse a vast amount of existing lakeFS infrastructure for configuration, block storage access, authentication, and testing, which dramatically accelerated our time-to-market and got the feature into users’ hands faster.

However, this choice wasn’t without its challenges. The iceberg-go library, while functional, was (and still is) less mature than its Java counterpart. We anticipated and sometimes encountered a narrower scope of supported Iceberg API features, requiring us to carefully manage our implementation roadmap around its capabilities. There was also the inherent risk of a less battle-tested codebase, potentially leading to more subtle bugs or unexpected behaviors.

Our decision reflected a deliberate trade-off: prioritize rapid iteration and deep integration with our existing stack, acknowledging that we might need to contribute more actively to the iceberg-go community or strategically evolve our approach as more advanced Iceberg features become critical for our users.

Designing for the Long Haul

One of the most critical design challenges was anticipating how the data model would evolve once deployed in production. Once users start storing their Iceberg tables with our catalog, any modification to the underlying storage layout or pointer structure becomes incredibly complex. Migrations in such a scenario would need to handle live data across multiple branches, potentially impacting availability and data integrity.

We invested significant time upfront designing the pointer structure and storage paths to be flexible enough for future enhancements while remaining simple and performant within lakeFS. Our two-layer approach, where versioned pointers live within lakeFS and the actual Iceberg metadata resides in object storage, provides us this crucial isolation. It means we can iterate on the catalog’s internal capabilities and even the pointer format without forcing a potentially complex, live migration of all existing table data. This gives us room to evolve the catalog’s capabilities without breaking existing deployments.
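
One small example of designing for evolution is carrying an explicit format version inside the pointer object, so newer catalog code can keep reading older pointers without a mass migration. The field names and `decodePointer` helper below are hypothetical, shown only to illustrate the idea; they are not the actual lakeFS pointer format.

```go
package catalog

import (
	"encoding/json"
	"fmt"
)

// tablePointerV1 is a hypothetical on-disk shape for the pointer object.
// The explicit FormatVersion lets the catalog recognize and read older
// pointers even after the layout evolves, instead of migrating every
// table on every branch at once.
type tablePointerV1 struct {
	FormatVersion    int    `json:"format-version"`
	MetadataLocation string `json:"metadata-location"`
}

// decodePointer dispatches on the embedded version so newer catalog code
// can keep reading pointers written by older versions.
func decodePointer(raw []byte) (tablePointerV1, error) {
	var probe struct {
		FormatVersion int `json:"format-version"`
	}
	if err := json.Unmarshal(raw, &probe); err != nil {
		return tablePointerV1{}, err
	}
	switch probe.FormatVersion {
	case 0, 1:
		// Only one layout exists in this sketch; future versions would be
		// handled by new cases that upgrade older layouts in memory.
		var ptr tablePointerV1
		err := json.Unmarshal(raw, &ptr)
		return ptr, err
	default:
		return tablePointerV1{}, fmt.Errorf("unknown pointer format version %d", probe.FormatVersion)
	}
}
```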

Garbage Collection in a Multi-Branch World

Garbage collection (GC) remains a challenge in our multi-branch environment. Standard Iceberg GC tools operate under the assumption of a single, linear history, expiring old snapshots and deleting orphaned data files that are no longer referenced. In lakeFS, however, data that appears “expired” or unreferenced on one branch might still be actively used on another branch. Running traditional GC could accidentally delete data that’s still live elsewhere.

This inter-branch dependency is the core of the problem. Safely identifying truly orphaned data requires a global view across all existing branches and their respective table pointers. To address this, we have currently disabled client-side GC operations within the lakeFS Iceberg Catalog and will soon implement a lakeFS-aware GC service that scans all branches to safely determine which files can be deleted.
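
Conceptually, a branch-aware GC must take the union of live files across every branch before deleting anything. The sketch below captures that idea only; it is not the upcoming lakeFS GC service, and `BranchLister` and `FileResolver` are placeholder interfaces invented for this post.

```go
package gc

import "context"

// BranchLister and FileResolver are placeholder interfaces for walking
// branches and resolving which files a branch's table pointers still reach.
type BranchLister interface {
	ListBranches(ctx context.Context, repo string) ([]string, error)
}

type FileResolver interface {
	// LiveFiles returns every file reachable from the table pointers on a branch.
	LiveFiles(ctx context.Context, repo, branch string) ([]string, error)
}

// OrphanCandidates returns files that are not referenced by any branch.
// Deleting based on a single branch's view would be unsafe, which is why
// client-side GC is disabled in the catalog today.
func OrphanCandidates(ctx context.Context, repo string, branches BranchLister, resolver FileResolver, allFiles []string) ([]string, error) {
	live := make(map[string]struct{})

	names, err := branches.ListBranches(ctx, repo)
	if err != nil {
		return nil, err
	}
	for _, branch := range names {
		files, err := resolver.LiveFiles(ctx, repo, branch)
		if err != nil {
			return nil, err
		}
		for _, f := range files {
			live[f] = struct{}{} // union of live files across all branches
		}
	}

	var orphans []string
	for _, f := range allFiles {
		if _, ok := live[f]; !ok {
			orphans = append(orphans, f)
		}
	}
	return orphans, nil
}
```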

Roadmap: What’s Next?

This is just the beginning. Our roadmap includes:

  • Managed GC and Compaction: A fully-managed, branch-aware service to safely clean up and optimize Iceberg tables.
  • Advanced Merge Strategies: Moving beyond simple conflict detection to enable intelligent, table-level merging of non-conflicting schema and data changes.
  • WebUI: Adding a catalog management section for convenient handling of structured data.
  • Catalog Sync: Tools to easily import tables from other catalogs (like AWS Glue) into lakeFS, and export them back out.
  • Views, Maintenance Procedures, and Rich RBAC: Continuing to build out full support for the entire Iceberg feature set.

Lessons Learned

Building the lakeFS Iceberg REST Catalog was a journey filled with challenges and “aha” moments. Beyond the technical solutions, we distilled key lessons that shaped our approach and can apply to other complex engineering endeavors:

1. Leverage Your Primitives: 

Our most powerful realization was the effectiveness of leveraging lakeFS’s core primitives – its versioning engine and transactional guarantees – as foundational building blocks for the Iceberg catalog. Instead of attempting to re-implement Iceberg’s catalog logic and its versioning within the catalog itself, we designed a two-layer pointer architecture.

This approach:

  • Reduced Complexity: we avoided reinventing the wheel for version control. lakeFS already handles atomic commits, branching, merging, and conflict detection for objects. By representing Iceberg tables as versioned “pointer objects” in lakeFS, we inherited all these capabilities.
  • Accelerated Development: this strategic reuse significantly reduced the amount of new code we had to write and test. Our focus shifted from building a new versioning system to elegantly mapping Iceberg’s metadata lifecycle onto lakeFS’s existing primitives. This directly contributed to our faster time-to-market.
  • Increased Robustness: By relying on battle-tested lakeFS internals, the Iceberg Catalog immediately benefited from the stability and performance characteristics of our core product. This pattern of identifying and leveraging robust existing primitives within your own ecosystem is a powerful strategy for building complex systems quickly – and reliably.

2. Balance Spec Compliance with Real-World Needs: 

While strict adherence to the Apache Iceberg REST Catalog spec was a non-negotiable requirement for interoperability with engines like Spark and Trino, we quickly learned that spec compliance alone wasn’t sufficient. We also had to consider how users would actually use the system. 

This dual focus led us to prioritize features that went beyond the basic REST API, specifically:

  • Transactional Guarantees: users coming to lakeFS expect atomic operations. Our decision to wrap every Iceberg modification in a lakeFS transaction, utilizing ephemeral branches and atomic merges, was a direct response to this need for strong consistency, even when the Iceberg spec itself might allow for looser eventual consistency models in some catalog implementations.
  • Branching Experience: The core value proposition of lakeFS is Git-like branching for data. We ensured that creating a branch of your Iceberg tables felt as natural and zero-copy as branching any other data in lakeFS. This meant making the pointer-based architecture work flawlessly with lakeFS’s branch mechanics.

This lesson underscored the importance of stepping back from purely technical specifications to understand the broader user experience and operational expectations. A technically compliant solution that doesn’t fit naturally into existing user workflows will struggle to gain adoption.

3. The Value of a Proper Design:

The intensive upfront design process, before a single line of production code was written for the catalog, proved to be invaluable. While it felt like a significant time investment at the start, it ultimately saved us far more time and effort during implementation and debugging.

The design phase allowed us to:

  • Identify Hard Problems Early: by meticulously mapping out the architecture and data flows, we were able to anticipate and confront complex challenges like multi-branch garbage collection, data model evolution, and cross-language integration before they became costly impediments in the development cycle.
  • Align the Team: a well-documented and thoroughly discussed design served as a clear blueprint. It ensured that every engineer on the team shared a common understanding of the system’s goals, its components, and how they would interact. This alignment minimized misunderstandings, reduced rework, and fostered more efficient collaboration.
  • Validate Assumptions: the design phase provided a structured way to test our assumptions about Iceberg’s internals, lakeFS capabilities, and the interactions between them. We could iterate on ideas on paper (or in design documents) quickly and cheaply, rather than discovering flaws much later in code.

Investing in a robust design process is not just about drawing diagrams; it’s about rigorous critical thinking and problem anticipation. The design process forced us to think through the hard problems and the data model. But it doesn’t end when coding begins – our design document is a living blueprint. As we implemented specific components, we uncovered new nuances and opportunities for improvement, and frequently returned to the design for refinements and updates. This iterative loop was invaluable, allowing fast-paced development while minimizing surprises, ensuring our final architecture was not just well-planned, but also battle-tested against the realities of the code.

Conclusion

Building the lakeFS Iceberg REST Catalog was a fascinating engineering challenge. By combining the power of Iceberg’s table format with lakeFS’s Git-like versioning, we’ve created a unique solution for managing the entire data lifecycle. The architecture, centered around versioned pointers and atomic transactions, provides a robust foundation for the exciting features yet to come.

We’re proud of what we’ve built, and we’re even more excited about what you’ll build with it.
