Machine Learning

Best Practices Machine Learning Tutorials

ML Data Version Control and Reproducibility at Scale

Amit Kesarwani

Introduction

In the ever-evolving landscape of machine learning (ML), data stands as the cornerstone upon which triumphant models are built. However, as ML projects expand and encompass larger and more complex datasets, the challenge of efficiently managing and controlling data at scale becomes more pronounced. These are the common conventional approaches used by the data …

Data Engineering Machine Learning

Jupyter Notebook & 10 Alternatives: Data Notebook Review [2023]

The lakeFS team

The tech industry responded to the needs of data practitioners with various IDE solutions for developing code and presenting findings in a data science and machine learning context. One of the go-to solutions today is Jupyter Notebook, an open-source tool that has gained a lot of traction among data science folks and beyond. Although Jupyter …

Data Engineering Machine Learning Product

Scalable Data Version Control – Getting the Best of Both Worlds with lakeFS

Oz Katz

There are several tools in the data version control space, all looking to solve similar problems. Two of the leaders are lakeFS and DVC. In this post, I am going to give an overview of how each has been designed, to provide a basis for understanding their relative abilities to scale. Being a …

Data Engineering Machine Learning

12 Vector Databases For 2023: A Review

The lakeFS team

Vector databases first emerged a few years ago to power a new generation of search engines based on neural networks. Today, they play a new role: helping organizations deploy applications based on large language models like GPT-4. Vector databases differ from standard relational databases, such as PostgreSQL, which were built to store tabular data in …

Data Engineering Machine Learning Thought Leadership

Delta-rs, Apache Arrow, Polars, WASM: Is Rust the Future of Analytics?

Oz Katz

This post is a recap of a talk I gave at this year's Data + AI Summit about why I believe the Rust programming language and related novel technologies such as WebAssembly will play a large part in the data ecosystem in the coming years. The talk covers: The "present" of analytics (from my perspective, …

Data Engineering Machine Learning

Data Governance: Guide to Enterprise Data Architecture

The lakeFS team, Einat Orr, PhD

Organizations need data governance for many reasons, not just to comply with a growing number of data privacy and protection rules, such as the European Union's General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA). A lack of it can cause more pain than a fine. One of the most impactful areas of data …
