lakeFS Blog

Project

In the Realm of the New Open Data Stack: Joining the lakeFS adventure

Adi Polak
January 18, 2022

Open Source Is Everywhere! For the past decade, open source has made a home in the ever-evolving data stack zoo known as the Hadoop ecosystem. To keep everything in order, ZooKeeper was tasked with coordinating the zoo resources, and all the animals were happy. Until one day, the zoo got new creatures to take care of …

Data Engineering Integrations

The Everything Bagel II: Versioned Data Lake Tables with lakeFS and Trino

Paul Singman
December 20, 2021

Introduction: Dockerize Your Data Pipeline. I can remember times when my company started using a new technology (be it Redis, Kafka, or Spark) and, in order to try it out, I found myself staring at a screen like this: At the time I thought nothing of doing this. And even wore it as a badge of pride …

Data Engineering

The Guide to Data Versioning

Paul Singman
December 2, 2021

“I have never lied to you, I have always told you some version of the truth.” “The truth doesn’t have versions, okay?” (Something’s Gotta Give, 2003). Jack Nicholson and Diane Keaton discuss data versioning in Something’s Gotta Give. Introduction: A version of something is defined as “a particular form in which some details are different …

Data Engineering

Data Versioning – Does It Mean What You Think It Means?

Einat Orr, PhD.
November 24, 2021

Introduction: When we first thought about a tagline for our open source project lakeFS, we instinctively gravitated to terms like “Data versioning”, “Manage data the way you manage code”, “Git for data”, or any variation of the three that is grammatically correct. We were very pleased with ourselves for 5 minutes, or maybe 7, before …

Data Engineering Hive Metastore

Takeaways From the Future of Metadata After Hive Metastore Roundtable

Paul Singman
November 16, 2021

Overview of Hive’s Metastore: Let’s get right into it. This is not an objective recap of every topic covered at the Future of Metadata After Hive Roundtable last week. But it is a summary of what I found most interesting from the discussion between panelists Lior Ebel, Ryan Blue, and Seshu Adunuthula and host Oz Katz. …

Integrations

dbt Tests – Create Staging Environments for Flawless Data CI/CD

Guy Hardonag, Paul Singman
November 3, 2021

Recently, we’ve heard from several community members experimenting with new development workflows using lakeFS and dbt. The timing isn’t surprising given dbt’s recent support for big data compute tools like Spark and Trino, which are among the technologies most commonly used by lakeFS users managing a data lake over an object store. The combination …

Project

lakeFS Community Call Recap – Oct. 2021

Paul Singman
December 2, 2021

Last week we held another lakeFS Community Call! We believe these calls are invaluable opportunities to have direct dialogue with our users on all things lakeFS. Oz covered important new lakeFS functionality, previewed what’s coming soon from the roadmap, and also shared two exciting updates from the community. Let’s recap! 6 Important lakeFS Releases: 1. …

Data Engineering

3 Ways to Add Data to lakeFS

Paul Singman
October 26, 2021

Few people start using lakeFS without first having some data collected. Consequently, one of the first things people commonly do after getting it up and running is import their existing data into lakeFS. There isn’t a one-size-fits-all approach for doing this. Instead, there are ways that work great for a single file, …

Go

Building Rich CLI Applications with Go’s Built-in Templating

Barak Amar
October 20, 2021

Overview: The templating package text/template implements data-driven templates for generating textual output. Although we do not benefit from executing a template more than once, we found it easy to use and helpful for outputting text with colors, marshaling data, and rendering tabular information. By mapping additional functions by name, it is possible to extend …
