lakeFS Blog

Project

lakeFS in Search of a Role Model

Einat Orr, PhD.
September 21, 2021

Who needs a role model? When we first launched lakeFS in August of 2020, we asked ourselves a simple question: What does success look like? And how will we know we’re doing the right things to get there? Of course, with thousands of installations, a thriving user community, active developer contributions, and exponential growth on …

Data Engineering

Thoughts on the Future of the Databricks Ecosystem

Paul Singman
September 8, 2021

Databricks has come a long way since growing out of a Berkeley lab in 2013 with an open-source distributed computing framework called Spark. Fast forward eight years and, in addition to the core Spark product, there are a dizzying number of new features in various stages of public preview within the Databricks platform. In case …

People Project

Building an Auto-Upgrading lakeFS Environment

Nirav Adunuthula
September 1, 2021

One look at the cheerful aquamarine axolotl resting atop the lakeFS homepage was all it took to assure me that an internship here would make this summer different than any other… This summer I had the opportunity to intern with the amazing developer team at Treeverse and work on the lakeFS project. This opportunity represents an important …

Data Engineering

The Docker Everything Bagel™ – Spin Up A Local Data Stack

Paul Singman
August 25, 2021

Introduction: An important part of developing an open source project like lakeFS is assisting and advising our users. When they run into an issue and feel pain, we want to feel that pain, too. Quite literally. This means recreating the environment, running the same code, and raising the same error. In complex, modern data stacks …

Data Engineering

Hive Metastore – Why It’s Still Here and What Can Replace It?

Einat Orr, PhD.
August 19, 2021

Hive & Hadoop, A Brief History: Apache Hive burst onto the scene in 2010 as a component of the Hadoop ecosystem, when Hadoop was the novel and innovative way of doing big data analytics. What Hive did was implement a SQL interface to Hadoop. Its architecture consisted of two main services: A Query Engine …

Integrations

Seamlessly Sync Data Into Your lakeFS Repos With Airbyte

Itai Admi
August 5, 2021

New features in Airbyte and lakeFS make it easy to send data replicated by Airbyte into a lakeFS repo. See how to leverage this integration in your data pipelines! If you work in data, chances are you rely on replicating data between different systems to centralize it for analysis. Modern companies produce data from all …

Project

New in lakeFS: Data Retention Policies

Yoni Augarten, Guy Hardonag
August 9, 2021

“I can remember everything. That’s my curse, young man. It’s the greatest curse that’s ever been inflicted on the human race: memory.” — Jedediah Leland, Citizen Kane (1941) lakeFS makes data corruptions easy to avoid and fix by allowing you to travel back in time to any state of your data. This new capability has …

Data Engineering

Making Sure Your Data Lifecycle Management Makes Sense

Paul Singman, Einat Orr, PhD.
July 15, 2021

What is Data Lifecycle Management? Datasets are the foundational output of a data team. They do not appear out of thin air. No one has ever snapped their fingers and created an orders_history table. Instead, useful sets of data are created and maintained through a process that involves several predictable steps. Managing this process is …

