lakeFS Blog

Go

Managing Multiple Go Versions with Go

Barak Amar
May 16, 2021

As a user of the Go programming language, I’ve found it useful to be able to run multiple versions within a single project. If this is something you’ve tried or have considered, great! In this post I’ll present the when and the how of enabling multiple Go versions. Finally, we’ll conclude with a discussion of why …

Data Engineering Project

The State of Data Engineering in 2021

Einat Orr, PhD.
June 1, 2021

Let’s start with the obvious: the lakeFS project doesn’t exist in isolation. It belongs to a larger ecosystem of data engineering tools and technologies adjacent and complementary to the problems we are solving. What better way to visualize our place in this ecosystem, I thought, than by creating a cross-sectional LUMAscape to depict it? What’s …

Project

Concrete Graveler: Splitting for Reuse

Ariel Shaqed (Scolnicov)
May 19, 2021

Welcome to another episode of “Concrete Graveler”, our deep dive into the implementation of Graveler, the committed object storage for lakeFS. Graveler is our versioned object store, inspired by Git. It is designed to store orders of magnitude more objects than Git does. The last episode focused on how we store a single commit — a snapshot …

Data Engineering

Hudi, Iceberg and Delta Lake: Data Lake Table Formats Compared

Oz Katz
May 7, 2021

Introduction When building a data lake, there is perhaps no more consequential decision than the format data will be stored in. The outcome will have a direct effect on its performance, usability, and compatibility. It is inspiring that by simply changing the format data is stored in, we can unlock new functionality and improve the …

People

Why I’m Joining lakeFS

Paul Singman
April 6, 2021

Thoughts on a personal journey into the world of developer advocacy at an open-source data project. In March of 2021, I chose to leave the data team at Equinox Media and join lakeFS, a nascent open-source project, as its first developer advocate. In this post, I share a few reasons why I’m excited about starting this …

Data Engineering

3 Data Lake Anti-Patterns to Avoid

Paul Singman
May 19, 2021

Rid yourself of these troubling habits and start the journey towards data lake mastery! Introduction Data lakes offer tantalizing performance upside, which is a major reason for their high rate of adoption. Sometimes though, the promise of technological performance can overshadow an unpleasant developer experience. This is troublesome since I believe the developer experience is as …

Data Engineering

Data Lakes: The Definitive Guide

Paul Singman
May 27, 2021

What is a Data Lake? A data lake is a system of technologies that allows for the querying of data in file or blob objects. When employed effectively, data lakes enable the analysis of structured and unstructured data assets at tremendous scale and cost-efficiency. The number of organizations employing data lake architectures has increased exponentially since …

Project

Power Amazon EMR Applications with Git-like Operations Using lakeFS

Itai Admi
May 19, 2021

This article provides a detailed explanation of how to use lakeFS with Amazon EMR. Today it’s common to manage a data lake using cloud object stores like AWS S3, Azure Blob Storage, or Google Cloud Storage as the underlying storage service. Each cloud provider offers a set of managed services to simplify the way …

Data Engineering Project

lakeFS Hooks: Implementing CI/CD for Data using Pre-merge Hooks

Oz Katz
March 2, 2021

Continuous integration of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment of data ensures the quality of data at each step of a production pipeline. In this post, I will present lakeFS’s webhooks, and showcase a …
