Project

Project

lakeFS – Data Versioning at Scale

Paul Singman
October 12, 2021

If you think about it, lakeFS is about two things: version control and big data. We see ourselves as bringing version control to big data. This bridges a workflow gap that currently exists between working with data and working with code. This gap is purely artificial; there’s no conceptual reason why different workflows should be required for …

Project

lakeFS in Search of a Role Model

Einat Orr, PhD.
September 21, 2021

Who needs a role model? When we first launched lakeFS in August of 2020, we asked ourselves a simple question: What does success look like? And how will we know we’re doing the right things to get there? Of course, with thousands of installations, a thriving user community, active developer contributions, and exponential growth on …

People Project

Building an Auto-Upgrading lakeFS Environment

Nirav Adunuthula
September 1, 2021

One look at the cheerful aquamarine axolotl resting atop the lakeFS homepage was all it took to assure me that an internship here would make this summer different from any other… This summer I had the opportunity to intern with the amazing developer team at Treeverse and work on the lakeFS project. This opportunity represents an important …

Project

New in lakeFS: Data Retention Policies

Yoni Augarten, Guy Hardonag
August 9, 2021

“I can remember everything. That’s my curse, young man. It’s the greatest curse that’s ever been inflicted on the human race: memory.” (Jedediah Leland, Citizen Kane, 1941)

lakeFS makes data corruptions easy to avoid and fix by allowing you to travel back in time to any state of your data. This new capability has …

Data Engineering Project

Advancing lakeFS: Version Data At Scale With Spark

Tal Sofer
June 23, 2021

Combining lakeFS and Spark brings a new standard of scale and elasticity to distributed data pipelines. When integrating two technologies, the aim should be to expose the strengths of each as much as possible. With this philosophy in mind, we are excited to announce the beta release of the lakeFS FileSystem! This native Hadoop FileSystem …

Data Engineering Project

The State of Data Engineering in 2021

Einat Orr, PhD.
June 1, 2021

Let’s start with the obvious: the lakeFS project doesn’t exist in isolation. It belongs to a larger ecosystem of data engineering tools and technologies adjacent and complementary to the problems we are solving. What better way to visualize our place in this ecosystem, I thought, than by creating a cross-sectional LUMAscape to depict it? What’s …

Project

Concrete Graveler: Splitting for Reuse

Ariel Shaqed (Scolnicov)
May 19, 2021

Welcome to another episode of “Concrete Graveler”, our deep dive into the implementation of Graveler, the committed object storage for lakeFS. Graveler is our versioned object store, inspired by Git. It is designed to store orders of magnitude more objects than Git does. The last episode focused on how we store a single commit, a snapshot …

Project

Power Amazon EMR Applications with Git-like Operations Using lakeFS

Itai Admi
May 19, 2021

This article provides a detailed explanation of how to use lakeFS with Amazon EMR. Today it’s common to manage a data lake using cloud object stores like AWS S3, Azure Blob Storage, or Google Cloud Storage as the underlying storage service. Each cloud provider offers a set of managed services to simplify the way …
