lakeFS Blog

Project

New in lakeFS: Data Retention Policies

Yoni Augarten, Guy Hardonag
July 26, 2021

“I can remember everything. That’s my curse, young man. It’s the greatest curse that’s ever been inflicted on the human race: memory.” (Jedediah Leland, Citizen Kane, 1941)

lakeFS makes data corruptions easy to avoid and fix by allowing you to travel back in time to any state of your data. This new capability has …

Data Engineering

Making Sure Your Data Lifecycle Management Makes Sense

Paul Singman, Einat Orr, PhD.
July 15, 2021

What is Data Lifecycle Management

Datasets are the foundational output of a data team. They do not appear out of thin air. No one has ever snapped their fingers and created an orders_history table. Instead, useful sets of data are created and maintained through a process that involves several predictable steps. Managing this process is …

Integrations, Machine Learning

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman
July 1, 2021

Introducing Kubeflow and lakeFS

Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow consists of isolated components, one for each stage of the ML lifecycle. Each component of a Kubeflow pipeline is packaged as a Docker image and executed in a …

Data Engineering, Project

Advancing lakeFS: Version Data At Scale With Spark

Tal Sofer
June 23, 2021

Combining lakeFS and Spark provides a new standard for scale and elasticity to distributed data pipelines. When integrating two technologies, the aim should be to expose the strengths of each as much as possible. With this philosophy in mind, we are excited to announce the beta release of the lakeFS FileSystem! This native Hadoop FileSystem …

Data Engineering, Integrations

Air & Water: The Airflow and lakeFS Integration

Itai Admi
May 27, 2021

Today we are excited to announce the official release of the lakeFS Airflow provider! This package lets you easily integrate lakeFS functionality into your Airflow DAGs. The library is published on PyPI, so it can be installed in your project with the command:

pip install airflow-provider-lakefs

Once installed, you are …

Data Engineering

Solving Data Reproducibility

Paul Singman
May 24, 2021

Debugging an issue is never fun, but why make it harder? In this post, we show how reproducing data is possible whether you are working with a single file, an entire table, or a whole data repository.

Introducing Data Reproducibility

There are two types of issues in the world: reproducible and unreproducible. A reproducible issue is one where the original conditions for …
