Product

Best Practices Product

Commit Graph – A Data Version Control Visualization

Oz Katz

In the world of data management and data version control, understanding the relationships between different versions of your data is crucial. Just as version control systems like Git help software developers track changes in their codebase, data versioning tools such as lakeFS are indispensable for tracking changes in data lakes and object …

Product

Data Garbage Collection: How lakeFS Keeps a Clean House (Lake)

Nir Ozeri

In today’s data-driven world, managing vast amounts of data efficiently is crucial for organizations of all sizes. As data lakes and object storage systems grow in popularity, so does the need for robust data versioning and governance solutions. Data retention and storage optimization have become increasingly complex tasks, even more so when …

Data Engineering Machine Learning Product

Scalable Data Version Control – Getting the Best of Both Worlds with lakeFS

Oz Katz

There are several tools in the data version control space, all looking to solve similar problems. Two of the leaders are lakeFS and DVC. In this post, I am going to give an overview of how each has been designed so as to provide a basis for understanding their relative abilities to scale. Being a …

Product

Mixing Metadata, Air and Water: Use the lakeFS Airflow Provider to Link Airflow Execution to lakeFS Data

Ariel Shaqed (Scolnicov)

“How do I integrate X with lakeFS?” is an evergreen question on the lakeFS Slack. lakeFS takes a “tooling-first” strategy to data management: it slots into your existing lineup of tools. So a significant part of our work on lakeFS goes into improving these integrations. Our latest …

Best Practices Product Tutorials

Dagster + lakeFS: How to Troubleshoot and Reproduce Data

Amit Kesarwani

Dagster is a cloud-native data pipeline orchestration tool for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. It is designed for developing and maintaining data assets. With Dagster, you declare the data assets that you want to build as Python functions. Dagster then helps you run your functions at …

Data Engineering Machine Learning Product

lakeFS + Unity Catalog Integration

Oz Katz

Metadata Management for the Modern Data Lakehouse
Efficient data management is a critical component of any modern organization. As data volumes grow and data sources become more diverse, the need for robust data catalog solutions becomes increasingly evident. Recognizing this need, lakeFS, an open-source data lake management platform, has recently integrated with Unity Catalog, …

Data Engineering Product

lakeFS ♥️ Apache Iceberg

Robin Moffatt

lakeFS now directly supports Apache Iceberg tables. Using straightforward table identifiers, you can switch between branches when reading and writing data. lakeFS itself remains format-agnostic, happily providing version control for data whether binary, tabular, or just a lump of text. The benefit of this new functionality is that the same table that you create …

Data Engineering Machine Learning Product

lakeFS Cloud is Now Self-Service on Microsoft Azure

Einat Orr, PhD

We are pleased to announce that lakeFS Cloud is now available as a self-service offering on Azure. lakeFS Cloud is a fully managed lakeFS platform, providing version control for your data lake. As well as being secure and scalable, it includes enterprise features such as Single Sign-On (SSO), managed garbage collection, and role-based access control …
