Version your ML training data for easy reproducibility

Most data science and machine learning workflows are not linear. ML experimentation is iterative: you move back and forth between the different components of your pipeline. During model training, most of us try different data labeling methods, data cleaning and preprocessing techniques, and feature selection methods before arriving at an accurate model.

Being able to reproduce a specific iteration of an ML experiment is therefore essential for building scalable, high-quality ML models. That requires capturing the version of the training data, the ML code, and the model artifacts at every iteration. To version these experiments efficiently, without duplicating your code, data, and models, you should opt for a data versioning tool such as lakeFS. lakeFS lets you version every component of an ML experiment without keeping multiple copies of it, which also reduces your storage costs.
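To make the idea of versioning without multiple copies concrete, here is a toy Python sketch. It is not the lakeFS API and not lakeFS's actual storage design; the class `ToyVersionStore` and its `commit`/`checkout` methods are hypothetical names invented for illustration. It assumes that each unique file content is stored once under its content hash and that a commit is only a mapping from paths to hashes, so committing a new iteration stores only what changed, and any earlier iteration can be restored from its commit ID.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ToyVersionStore:
    """Hypothetical, in-memory illustration of deduplicated versioning.

    NOT the lakeFS API: each unique file content is stored once under its
    SHA-256 hash, and a commit is just a mapping of path -> content hash.
    """

    objects: dict = field(default_factory=dict)  # content hash -> bytes
    commits: dict = field(default_factory=dict)  # commit id -> {path: hash}

    def commit(self, files: dict, message: str) -> str:
        """Snapshot data, code and model files; return a short commit id."""
        snapshot = {}
        for path, content in files.items():
            digest = hashlib.sha256(content).hexdigest()
            # Store the bytes only if this exact content has not been seen.
            self.objects.setdefault(digest, content)
            snapshot[path] = digest
        # Derive the commit id from the message plus the sorted snapshot.
        payload = message + "".join(f"{p}:{h};" for p, h in sorted(snapshot.items()))
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = snapshot
        return commit_id

    def checkout(self, commit_id: str) -> dict:
        """Reconstruct every file exactly as it was at the given commit."""
        return {p: self.objects[h] for p, h in self.commits[commit_id].items()}


# Two hypothetical experiment iterations: same training data, new code and model.
store = ToyVersionStore()
v1 = store.commit(
    {"data/train.csv": b"x,y\n1,2\n", "train.py": b"lr = 0.1", "model.bin": b"weights-v1"},
    "iteration 1",
)
v2 = store.commit(
    {"data/train.csv": b"x,y\n1,2\n", "train.py": b"lr = 0.01", "model.bin": b"weights-v2"},
    "iteration 2: lower learning rate",
)

print(len(store.objects))              # 5 unique blobs stored for 6 file versions
print(store.checkout(v1)["train.py"])  # b'lr = 0.1' (iteration 1 reproduced)
```

Here the unchanged `data/train.csv` is stored once even though both iterations reference it, while the training code and model artifacts that did change get new entries. This is only the general principle; consult the lakeFS documentation for how lakeFS itself manages branches, commits, and objects.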

In this webinar, we show you how to use lakeFS to version your ML experiments easily and reproduce any specific iteration of an experiment as needed.

We will cover:

  1. Creating a basic ML experimentation framework with lakeFS (in a Jupyter notebook)
  2. Reproducing ML components from a specific iteration of an experiment
  3. Building an intuitive, zero-maintenance experiments infrastructure with lakeFS


Iddo Avneri

VP Customer Success, lakeFS

Vino SD

Developer Advocate

