Best Practices

Best Practices · Tutorials

The Power of Databricks SQL: A Practical Guide to Unified Data Analytics

The lakeFS team

Within the Databricks Lakehouse ecosystem, Databricks SQL serves as a handy tool for querying and analyzing data. It lets SQL-savvy data analysts, data engineers, and other data practitioners extract insights without forcing them to write application code. This broadens access to data analytics, simplifying and speeding up the analysis process. But that’s not everything …
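
To make this concrete, here is a minimal sketch of running a SQL query against a Databricks SQL warehouse from Python using the databricks-sql-connector package. The hostname, HTTP path, access token, and the samples.demo.orders table are all placeholders, not values from the article.

```python
# Minimal sketch using databricks-sql-connector
# (pip install databricks-sql-connector). All connection values and the
# queried table below are placeholders for your own workspace.
from databricks import sql

with sql.connect(
    server_hostname="dbc-example.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",              # placeholder
    access_token="dapi-...",                             # placeholder
) as connection:
    with connection.cursor() as cursor:
        # An ordinary ANSI SQL query against a hypothetical table.
        cursor.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM samples.demo.orders GROUP BY region"
        )
        for region, total in cursor.fetchall():
            print(region, total)
```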


Best Practices · Machine Learning · Tutorials

ML Data Version Control and Reproducibility at Scale

Amit Kesarwani

In the ever-evolving landscape of machine learning (ML), data is the cornerstone upon which successful models are built. However, as ML projects expand to encompass larger and more complex datasets, the challenge of efficiently managing and controlling data at scale becomes more pronounced. These are the common conventional approaches used by the data …


Best Practices · Data Engineering · Tutorials

Databricks Unity Catalog: A Comprehensive Guide to Streamlining Your Data Assets

The lakeFS team

As data quantities increase and data sources diversify, teams are under pressure to implement comprehensive data catalog solutions. Databricks Unity Catalog is a unified governance solution for all data and AI assets in your lakehouse on any cloud, including files, tables, machine learning models, and dashboards. It provides a consolidated interface for categorizing, organizing, …
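
As a rough illustration of what that organization looks like in practice, the sketch below uses Unity Catalog's three-level namespace (catalog.schema.table) from a notebook in a Unity Catalog-enabled workspace. Every object and group name here is made up, and spark is the SparkSession the Databricks runtime provides.

```python
# Illustrative only: "analytics", "sales", "orders", and "data-analysts"
# are made-up names. `spark` is the SparkSession a Databricks notebook
# provides; Unity Catalog must be enabled on the workspace.
spark.sql("CREATE CATALOG IF NOT EXISTS analytics")
spark.sql("CREATE SCHEMA IF NOT EXISTS analytics.sales")
spark.sql("""
    CREATE TABLE IF NOT EXISTS analytics.sales.orders (
        order_id BIGINT,
        region   STRING,
        amount   DOUBLE
    )
""")

# Governance is declarative: grant read access on the table to a group.
spark.sql("GRANT SELECT ON TABLE analytics.sales.orders TO `data-analysts`")
```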


Best Practices · Data Engineering · Tutorials

How Data Version Control Provides Data Lineage for Data Lakes

Iddo Avneri

One of the reasons behind the rising adoption of data lakes is their ability to handle massive amounts of data coming from diverse sources, transform it at scale, and provide valuable insights. However, this capability comes at the price of complexity. This is where data lineage helps. In this article, we review some basic …


Best Practices · Product

Commit Graph – A Data Version Control Visualization

Oz Katz

In the world of data management and data version control, understanding the relationships between different versions of your data is crucial. Just like in software development, where version control systems like Git help developers track changes in their codebase, data versioning tools such as lakeFS are indispensable for tracking changes in data lakes and object …
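
As a rough sketch of the kind of history behind such a graph, the following walks a branch's commit log with the high-level lakeFS Python SDK (pip install lakefs). The repository and branch names are placeholders, and it assumes credentials are already configured (for example via lakectl config).

```python
# Rough sketch, assuming the high-level lakeFS Python SDK (pip install lakefs)
# with credentials already configured (e.g. via `lakectl config`).
# "example-repo" and "main" are placeholder names.
from itertools import islice

import lakefs

repo = lakefs.repository("example-repo")
branch = repo.branch("main")

# Each commit is a node in the commit graph that the lakeFS UI visualizes;
# print the ten most recent commits on this branch.
for commit in islice(branch.log(), 10):
    print(commit.id[:8], commit.message)
```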


Best Practices · Product · Tutorials

Dagster + lakeFS: How to Troubleshoot and Reproduce Data

Amit Kesarwani

Dagster is a cloud-native data pipeline orchestration tool for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. It is designed for developing and maintaining data assets. With Dagster, you declare the data assets that you want to build as plain Python functions, and Dagster then helps you run your functions at …
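
As a minimal sketch of that declarative model, two assets defined as Python functions look like this; the asset names and data are invented for illustration, not taken from the article.

```python
# Minimal Dagster sketch: data assets declared as Python functions.
# Asset names and data below are made up for illustration.
from dagster import asset, materialize

@asset
def raw_orders():
    # Pretend these records were fetched from an upstream source.
    return [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 25.5}]

@asset
def order_totals(raw_orders):
    # Dagster passes raw_orders in automatically, matched by parameter
    # name, and records the lineage between the two assets.
    return sum(order["amount"] for order in raw_orders)

if __name__ == "__main__":
    # Materialize both assets in-process; `dagster dev` offers a full UI.
    materialize([raw_orders, order_totals])
```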


Best Practices · Data Engineering · Machine Learning

Data Mesh Architecture: Guide to Enterprise Data Architecture

The lakeFS team

In the traditional setup, organizations had a centralized infrastructure team responsible for managing data ownership across domains. But product-led companies have started to approach this matter differently: they distribute data ownership directly among producers (subject matter experts) using a data mesh architecture. This is a concept originally presented by Zhamak Dehghani in …


Best Practices · Tutorials

How to Migrate or Clone a lakeFS Repository: Step-by-Step Tutorial

Amit Kesarwani

If you want to migrate or clone repositories from a source lakeFS environment to a target lakeFS environment, this tutorial walks you through it. Your source and target lakeFS environments can run locally or in the cloud. You can also follow this tutorial if you want to migrate or clone a source repository to a target repository …

