lakeFS Blog

Integrations

Seamlessly Sync Data Into Your lakeFS Repos With Airbyte

Itai Admi
August 5, 2021

New features in Airbyte and lakeFS make it easy to send data replicated by Airbyte into a lakeFS repo. See how to leverage this integration in your data pipelines! If you work in data, chances are you rely on replicating data between different systems to centralize it for analysis. Modern companies produce data from all …

Project

New in lakeFS: Data Retention Policies

Yoni Augarten, Guy Hardonag
August 9, 2021

“I can remember everything. That’s my curse, young man. It’s the greatest curse that’s ever been inflicted on the human race: memory.” — Jedediah Leland, Citizen Kane (1941)

lakeFS makes data corruptions easy to avoid and fix by allowing you to travel back in time to any state of your data. This new capability has …

Data Engineering

Making Sure Your Data Lifecycle Management Makes Sense

Paul Singman, Einat Orr, PhD
July 15, 2021

What is Data Lifecycle Management?

Datasets are the foundational output of a data team. They do not appear out of thin air. No one has ever snapped their fingers and created an orders_history table. Instead, useful sets of data are created and maintained through a process that involves several predictable steps. Managing this process is …

Integrations Machine Learning

Build Reproducible Experiments with Kubeflow and lakeFS

Tal Sofer, Paul Singman
July 1, 2021

Introducing Kubeflow and lakeFS

Kubeflow is a cloud-native ML platform that simplifies the training and deployment of machine learning pipelines on Kubernetes. An ML project using Kubeflow consists of isolated components for each stage of the ML lifecycle. Each component of a Kubeflow pipeline is packaged as a Docker image and executed in a …

Data Engineering Project

Advancing lakeFS: Version Data At Scale With Spark

Tal Sofer
June 23, 2021

Combining lakeFS and Spark sets a new standard for scale and elasticity in distributed data pipelines. When integrating two technologies, the aim should be to expose the strengths of each as much as possible. With this philosophy in mind, we are excited to announce the beta release of the lakeFS FileSystem! This native Hadoop FileSystem …

Data Engineering Integrations

Air & Water: The Airflow and lakeFS Integration

Itai Admi
May 27, 2021

Today we are excited to announce the official release of the lakeFS Airflow provider! This package lets you easily integrate lakeFS functionality into your Airflow DAGs. The library is published on PyPI, so you can install it in your project with the command:

pip install airflow-provider-lakefs

Once installed, you are …

