
lakeFS Community

Tutorials

Introducing the lakeFS Video Tutorial Series

Paul Singman

As a developer advocate, it is my responsibility to make understanding how to use lakeFS simple for users. To this end, I'm excited to announce the initial release of the lakeFS Tutorial Series on YouTube! For the initial drop there are three videos that cover how to: install lakeFS locally, create a repository (and add […]

Best Practices, Tutorials

Go Versions: Manage Multiple Go Versions with Go

Barak Amar

Updated on April 5, 2022. As a user of the Go programming language, I've found it useful to be able to run multiple Go versions within a single project. If this is something you've tried or have considered, great! In this post I'll present the when and the how of enabling multiple Go versions. Finally, we'll conclude

Data Engineering, Tutorials

Concrete Graveler: Splitting for Reuse

Ariel Shaqed (Scolnicov)

Welcome to another episode of "Concrete Graveler", our deep dive into the implementation of Graveler, the committed object storage for lakeFS. Graveler is our versioned object store, inspired by Git. It is designed to store orders of magnitude more objects than Git does. The last episode focused on how we store a single commit: a snapshot

Product, Tutorials

Power Amazon EMR Applications with Git-like Operations Using lakeFS

Itai Admi

This article provides a detailed explanation of how to use lakeFS with Amazon EMR. Today it's common to manage a data lake using cloud object stores like AWS S3, Azure Blob Storage, or Google Cloud Storage as the underlying storage service. Each cloud provider offers a set of managed services to simplify the way

Best Practices, Data Engineering, Tutorials

Concrete Graveler: Committing Data to Pebble SSTables

Ariel Shaqed (Scolnicov)

Introduction: In our recent version of lakeFS, we switched to basing metadata storage on immutable files stored on S3 and other common object stores. Our design is inspired by Git, but for object stores rather than filesystems, and with (much) larger repositories holding machine-generated commits. The design document is informative but by nature omits

Best Practices, Tutorials

Working with Embed in Go 1.16 Version

Barak Amar

The new Go 1.16 embed directive helps us keep a single binary while bundling our static content. This post covers how to work with the embed directive by applying it to a demo application. Why embed? One of the benefits of using Go is having your application compiled into a single self-contained binary. Having a
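As a quick taste of what the post covers, here is a minimal sketch of the `//go:embed` directive. To stay self-contained it embeds the package's own `.go` source files (a pattern that always matches something at compile time); in a real application you would embed your static assets instead, e.g. an `assets` directory.

```go
package main

import (
	"embed"
	"fmt"
)

// The //go:embed directive bundles files matching the pattern into the
// compiled binary at build time. Here we embed the package's own Go
// sources so the example runs anywhere without extra asset files.
//
//go:embed *.go
var static embed.FS

func main() {
	// The embedded files are read from the binary itself, not the disk.
	entries, err := static.ReadDir(".")
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println("embedded:", e.Name())
	}
}
```

Note that the `//go:embed` line must directly precede the variable declaration it populates, and the variable must be of type `embed.FS`, `string`, or `[]byte`.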

Best Practices, Data Engineering, Tutorials

Building A Data Development Environment with lakeFS

Barak Amar

Overview: As part of our routine work with data we develop code, choose and upgrade compute infrastructure, and test new data. Usually this requires running parts of our production pipelines in parallel to production, testing the changes we wish to apply. Every data engineer knows that this convoluted process requires copying data, manually updating configuration,

Tutorials

The lakeFS Katacoda Sandbox Environment – Interactive Data Versioning Learning

Guy Hardonag

If you’re interested in playing around and exploring lakeFS, you can now easily get started using the Katacoda demo which provides a personalized sandboxed environment – all from your browser, without installing anything.  lakeFS is an open source platform that delivers resilience and manageability to object-storage based data lakes. With lakeFS you can build repeatable,

Best Practices, Tutorials

In-process Caching In Go: Scaling lakeFS to 100k Requests/Second

Barak Amar

This is the first in a series of posts describing our journey of scaling lakeFS. In this post we describe how adding an in-process cache to our Go server sped up our authorization flow. Background: lakeFS is an open-source layer that delivers resilience and manageability to object-storage based data lakes. With lakeFS you can build

Best Practices, Tutorials

From Zero to Versioned Data in Spark

Guy Hardonag

This tutorial aims to give you a fast start with lakeFS and its Git-like terminology in Spark. It covers the following: This simple flow gives a sneak peek at how seamless and easy it is to make changes to data using lakeFS. Once you get the value of a resilient data flow, you can
