lakeFS Community

Data Engineering

Data Engineering, Machine Learning, Product

Scalable Data Version Control – Getting the Best of Both Worlds with lakeFS

Oz Katz

There are several tools in the data version control space, all looking to solve similar problems. Two of the leaders are lakeFS and DVC. In this post, I am going to give an overview of how each has been designed so as to provide a basis for understanding their relative abilities to scale. Being a …

Product

Mixing Metadata, Air and Water: Use the lakeFS Airflow Provider to Link Airflow Execution to lakeFS Data

Ariel Shaqed (Scolnicov)

“How do I integrate X with lakeFS?” is an evergreen question on the lakeFS Slack. lakeFS takes a “tooling-first” approach to data management: it slots into your existing lineup of tools. So a significant part of our work on lakeFS is devoted to improving the integrations between lakeFS and these other tools. Our latest …

Best Practices, Product, Tutorials

Dagster + lakeFS: How to Troubleshoot and Reproduce Data

Amit Kesarwani

Dagster is a cloud-native data pipeline orchestration tool for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. It is designed for developing and maintaining data assets. With Dagster, you declare—as Python functions—the data assets that you want to build. Dagster then helps you run your functions at …

Best Practices, Data Engineering, Machine Learning

Data Mesh Architecture: Guide to Enterprise Data Architecture

The lakeFS team

In the traditional setup, organizations had a centralized infrastructure team responsible for managing data ownership across domains. Product-led companies started to approach this differently: they distribute data ownership directly among producers (subject matter experts) using a data mesh architecture. This is a concept originally presented by Zhamak Dehghani in …

Data Engineering

Analytical Data: Guide to Enterprise Data Architecture

The lakeFS team

Organizations can accomplish more with their data than ever before thanks to advances in analytical data processing and data democratization initiatives led by the spread of visualization tools, low-code and no-code solutions, and innovations like data mesh. Advances in compute power, innovative data processing methods, and broader cloud adoption have accelerated these trends, placing data …
