Guy Hardonag
August 18, 2020

This tutorial aims to get you started quickly with lakeFS and show you how to use its Git-like terminology in Spark. It covers the following:

  1. A quick start for installing lakeFS using Docker Compose.
  2. How to create a repository, add files to it, create a branch, and make changes to the repository using Spark jobs.
  3. How to review changes before exposing them to consumers by merging to master.
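
To make the branch idea concrete before diving in, here is a minimal sketch of how a Spark job might address data on different branches. It assumes lakeFS is reached through its S3-compatible endpoint, where the repository takes the place of the bucket and the branch is the first path segment. The repository name, branch name, endpoint URL, and credentials below are hypothetical placeholders, not values from this tutorial.

```python
# Sketch only: repository, branch, endpoint, and credentials are hypothetical.

def lakefs_uri(repo: str, branch: str, path: str) -> str:
    """Build an s3a:// URI: the repository acts as the bucket,
    and the branch is the first path segment."""
    return f"s3a://{repo}/{branch}/{path.lstrip('/')}"


def s3a_settings(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Hadoop S3A options (as Spark properties) that point Spark
    at an S3-compatible endpoint instead of AWS S3."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style access keeps the repository name in the path, not the hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


# Assumed local endpoint and placeholder credentials.
settings = s3a_settings("http://localhost:8000", "<access-key-id>", "<secret-access-key>")

# The same dataset, addressed on two different branches.
master_path = lakefs_uri("example-repo", "master", "events/")
branch_path = lakefs_uri("example-repo", "example-branch", "events/")

print(master_path)
print(branch_path)
```

With PySpark, each entry in `settings` could be passed to `SparkSession.builder.config(key, value)`, after which a job would write to `branch_path` and leave `master_path` untouched until the branch is merged.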

This simple flow gives a sneak peek at how seamless and easy it is to make changes to data using lakeFS. Once you see the value of a resilient data flow, you can map it to many use cases within your data architecture, from validating writes of raw data to providing a safety net for your ETL pipelines or your ML (or other algorithmic logic) pipelines. You can pull the trigger knowing your master data lake is safe.

For more detailed information, check out our documentation.

