Machine Learning

Best Practices Machine Learning Tutorials

Building an ML Experimentation Platform for Easy Reproducibility Using lakeFS

Vino SD

MLOps is mostly data engineering. As organizations ride past the hype cycle of MLOps, we realize there is significant overlap between MLOps and data engineering. As ML engineers, we spend most of our time collecting, verifying, pre-processing, and engineering features from data before we can even begin training models.  Only 5% of developing and deploying […]

Machine Learning Product

Troubleshoot and Reproduce Data with Apache Airflow

Iddo Avneri

Apache Airflow enables you to build multistep workflows across multiple technologies. Its programmatic approach to scheduling and monitoring workflows helps users build complicated ETLs on their data that would otherwise be difficult to automate. This enabled the evolution of ETLs from simple single steps to complicated, parallelized, multistep advanced transformations: The challenge

Data Engineering Machine Learning Product

How to Develop Spark ETL Pipelines in Isolation

Amit Kesarwani, Vino SD, Iddo Avneri

You’re bound to ask yourself this question at some point: Do I need to test the Spark ETLs I’m developing? The answer is yes; you certainly should – and not just with unit testing but also integration, performance, load, and regression testing. Naturally, the scale and complexity of your data set matter a lot, so

Data Engineering Machine Learning Product

Proudly announcing lakeFS Cloud

Einat Orr, PhD, Oz Katz

What is lakeFS? As data practitioners, we use many different terms to talk about what we do – we call it business intelligence, analytics, data pipelines, or insights. But there’s one term that captures what we do really well: delivering products.  When we were leading a large R&D organization, we couldn’t help but wonder about

Data Engineering Machine Learning

lakeFS – Data Versioning at Scale

Paul Singman

If you think about it, lakeFS is about two things: version control and big data. We see ourselves as bringing version control to big data. This bridges a workflow gap that currently exists between working with data and working with code. This gap is purely artificial; there’s no conceptual reason why different workflows should be required for

Machine Learning Product

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman

Introducing Kubeflow and lakeFS: Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow consists of isolated components for each stage of the ML lifecycle. Each component of a Kubeflow pipeline is packaged as a Docker image and executed in a
