Best Practices Machine Learning Tutorials

Building an ML Experimentation Platform for Easy Reproducibility Using lakeFS

Vino SD

April 21, 2023

MLOps is mostly data engineering. As organizations ride past the hype cycle of MLOps, we realize there is significant overlap between MLOps and data engineering. As ML engineers, we spend most of our time collecting, verifying, pre-processing, and engineering features from data before we can even begin training models. Only 5% of developing and deploying […]

Machine Learning Product

Troubleshoot and Reproduce Data with Apache Airflow

Iddo Avneri

December 6, 2022

Apache Airflow enables you to build multistep workflows across multiple technologies. Its programmatic approach to scheduling and monitoring workflows helps users build complicated ETLs on their data that would otherwise be difficult to achieve automatically. This enabled the evolution of ETLs from simple single steps to complicated, parallelized, multistep advanced transformations: The challenge

Machine Learning

lakeFS Community: Leonard Aukea nominated for Machine Learning Professional of the year!

Adi Polak

November 16, 2022

Our community is full of people with incredible skills and know-how. And this nomination proves us right! Our community member @Leonard Aukea has been nominated for Machine Learning Professional of the year as part of the Nordic DAIR Awards. Congratulations, Leonard! Who is Leonard? Leonard Aukea has been heading Machine Learning Engineering and Operations at

Data Engineering Machine Learning Product

How to Develop Spark ETL Pipelines in Isolation

Amit Kesarwani, Vino SD, Iddo Avneri

October 26, 2022

You’re bound to ask yourself this question at some point: Do I need to test the Spark ETLs I’m developing? The answer is yes; you certainly should – and not just with unit testing but also integration, performance, load, and regression testing. Naturally, the scale and complexity of your data set matter a lot, so

Data Engineering Machine Learning

Data+AI Summit 2022 Recap: Top 6 Industry trends and 9 major announcements!

Vino SD

July 18, 2022

It was 27th June 2022. San Francisco was bustling with 5000+ data folks from around the world who came to attend the Data+AI Summit live after two years. Four days were packed with tons of information from keynotes, speakers, panels, expo booths, and Databricks trainings. A flurry of new product announcements followed. lakeFS Cloud launch, Delta Lake

Data Engineering Machine Learning Product

Proudly announcing lakeFS Cloud

Einat Orr, PhD, Oz Katz

June 27, 2022

What is lakeFS? As data practitioners, we use many different terms to talk about what we do – we call it business intelligence, analytics, data pipelines, or insights. But there’s one term that captures what we do really well: delivering products. When we were leading a large R&D organization, we couldn’t help but wonder about

Data Engineering Machine Learning

lakeFS – Data Versioning at Scale

Paul Singman

October 12, 2021

If you think about it, lakeFS is about two things — version control and big data. We see ourselves as bringing version control to big data. This bridges a workflow gap that currently exists when working with data and working with code. This gap is purely artificial — there’s no conceptual reason why different workflows should be required for

Machine Learning Product

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman

June 30, 2021

Introducing Kubeflow and lakeFS Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow will consist of isolated components for each stage of the ML lifecycle. And each component of a Kubeflow pipeline is packaged as a Docker image and executed in a

Machine Learning

Pick up the Slack with lakeFS