Tutorials

Machine Learning Tutorials

How to Build Data Pipelines in Databricks with Examples

Tal Sofer

Building a data pipeline is a smart move for data engineers in any organization. A strong data pipeline guarantees that the information is clean, consistent, and dependable. It automates discovering and fixing issues, ensuring high data quality and integrity and preventing your company from making poor decisions based on inaccurate data. This article dives into […]

Machine Learning Product Tutorials

lakectl local: How to work with lakeFS locally using Git

Oz Katz

The massive increase in generated data presents a serious challenge to organizations looking to unlock value from their data sets. Data practitioners have to deal with many consequences of the huge data volume, including manageability and collaboration. This is where data versioning can help. Data version control is crucial because it allows data teams to

Best Practices Data Engineering Tutorials

ETL Testing Tutorial with lakeFS: Step-by-Step Guide

Iddo Avneri

ETL testing is critical in integrating and migrating your data to a new system. It acts as a safety net for your data, assuring completeness, accuracy, and dependability to improve your decision-making abilities. ETL testing may be complex owing to the volume of data involved. Furthermore, the data is almost always varied, adding an extra

Data Engineering Machine Learning Tutorials

Building A Data Lake For The GenAI And ML Era

Einat Orr, PhD

Despite data technology advancements, many organizations still struggle to access outdated mainframe data. Most of the time, they are dealing with a siloed data architecture that just doesn't align with their strategic goals. At the same time, organizations are under pressure from their competitors. A good data strategy enables companies to go beyond function-specific and interdepartmental analytics

Machine Learning Tutorials

How to Toggle OpenAI Model Determinism

Amit Kesarwani

TL;DR: In the previous blog, Introducing the LangChain lakeFS Loader, and sample notebook, we explained and demonstrated the integration of lakeFS with LangChain and LLMs (specifically OpenAI models). In this blog, we will explore a new beta feature from OpenAI that enables reproducible responses from a model. Introduction: Language models are stochastic models (stochastic refers

Product Tutorials

lakeFS + Unity Catalog Integration: Step-by-Step Tutorial

Amit Kesarwani, Jonathan Rosenberg

Efficient data management is a critical component of any modern organization. As data volumes grow and data sources become more diverse, the need for robust data catalog solutions becomes increasingly evident. Recognizing this need, lakeFS, an open-source data lake management platform, has integrated with Unity Catalog, a comprehensive data catalog solution by Databricks. In this

Best Practices Product Tutorials

Introducing lakeFS Transactional Mirroring (Cross-Region Mirroring)

Ariel Shaqed (Scolnicov), Idan Novogroder, Guy Hardonag

What is mirroring? We are pleased to announce a preview of a long-awaited lakeFS feature: transactional mirroring across regions. Mirroring builds on top of S3 Replication to provide a consistent view of your versioned data in other regions. Once configured, it allows creating mirrors in all of your regions. Each mirror of a source repository

Machine Learning Tutorials

lakeFS-spec: An Easy Way To Work With lakeFS From Python

Jan Willem Kleinrouweler (appliedAI), Max Mynter (appliedAI)

TL;DR: In this blog post, we will explore how to add data versioning to an ML project: a simple end-to-end rain prediction project for the Munich area. The data assets will be stored in lakeFS, and we will use the lakeFS-spec Python package for easy interaction with lakeFS. Following model training with initial data, we

Data Engineering Machine Learning Product Tutorials

Introducing The New lakeFS Python Experience

Oz Katz, Nir Ozeri

Since its inception, lakeFS has shipped with a full-featured Python SDK. For each new version of lakeFS, this SDK is automatically generated, relying on the OpenAPI specification published by the given version. While this always ensured the Python SDK shipped with all possible features, the automatically generated code wasn't always the nicest (or most Pythonic)

Data Engineering Machine Learning Tutorials

Unlocking Data Insights with Databricks Notebooks

Idan Novogroder

Databricks Notebooks are a popular tool for interacting with data using code and presenting findings across disciplines like data science, machine learning, and data engineering. Notebooks are, in fact, a key offering from Databricks for generating processes and collaborating with team members thanks to real-time multilingual coauthoring, automated versioning, and built-in data visualizations. How exactly
