Data Versioning

Data Engineering Machine Learning

Jupyter Notebook & 10 Alternatives: Data Notebook Review [2023]

The lakeFS team

The tech industry responded to the needs of data practitioners with various IDE solutions for developing code and presenting findings in a data science and machine learning context. One of the go-to solutions today is Jupyter Notebook, an open-source tool that has gained a lot of traction among data science folks and beyond. Although Jupyter …

Data Engineering Machine Learning Product

Scalable Data Version Control – Getting the Best of Both Worlds with lakeFS

Oz Katz

There are several tools in the data version control space, all looking to solve similar problems. Two of the leaders are lakeFS and DVC. In this post, I am going to give an overview of how each has been designed so as to provide a basis for understanding their relative abilities to scale. Being a …

Best Practices Product Tutorials

Dagster + lakeFS: How to Troubleshoot and Reproduce Data

Amit Kesarwani

Dagster is a cloud-native data pipeline orchestration tool for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. It is designed for developing and maintaining data assets. With Dagster, you declare—as Python functions—the data assets that you want to build. Dagster then helps you run your functions at …

Data Engineering Machine Learning

How To Improve ML Pipeline Development With Reproducibility

Itai Admi

The MLOps domain is spreading at an accelerating pace. In recent years, we’ve seen more ML products and MLOps tools than we probably need. Today, there are hundreds of tools trying to solve a bunch of problems in different ways, with some of them promising end-to-end solutions. This usually makes data practitioners confused when they …

Best Practices Tutorials

How to Migrate or Clone a lakeFS Repository: Step-by-Step Tutorial

Amit Kesarwani

If you want to migrate or clone repositories from a source lakeFS environment to a target lakeFS environment, follow this tutorial. Your source and target lakeFS environments can be running locally or in the cloud. You can also follow this tutorial if you want to migrate/clone a source repository to a target repository …

Data Engineering

Analytical Data: Guide to Enterprise Data Architecture

The lakeFS team

Organizations can accomplish more with their data than ever before thanks to advances in analytical data processing and data democratization initiatives led by the spread of visualization tools, low-code and no-code solutions, and innovations like data mesh. Advances in compute power, innovative data processing methods, and broader cloud adoption have accelerated these trends, placing data …

Data Engineering Machine Learning Tutorials

Backfilling Data: A Foolproof Guide to Managing Historical Data

Iddo Avneri

If you work with a smaller dataset or do one-off jobs, the way you manage backfills isn’t that crucial. But what if you face constantly growing datasets with billions to trillions of records? Your backfilling data strategy will have a much bigger impact. When dealing with modern data pipelines on such a scale, it’s key …

Best Practices Tutorials

Version Control Data Pipelines Using the Medallion Architecture

Iddo Avneri

A step-by-step guide to running pipelines on Bronze, Silver and Gold layers with lakeFS. The Medallion Architecture is a software design pattern that organizes a data pipeline into three distinct tiers based on functionality: bronze, silver, and gold. The bronze tier represents the core functionality of the system, while the silver and …