Guy Hardonag
October 20, 2020

If you want to explore lakeFS, you can now get started with the Katacoda playground, which provides a personal sandboxed environment that runs entirely in your browser, with nothing to install.

lakeFS is an open source platform that delivers resilience and manageability to object-storage-based data lakes. With lakeFS you can build repeatable, atomic and versioned data lake operations – from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage and Google Cloud Storage as its underlying storage service. It is API-compatible with S3 and works with modern data frameworks such as Spark, Hive, AWS Athena and Presto.

What you will learn:

In this tutorial, we will work with a sample dataset to give you a sense of how lakeFS makes it easy to work with data.

We will also cover:

  • Use the basic lakectl command-line operations
  • Read, write, list and delete objects in lakeFS using the lakectl command
  • Read from and write to lakeFS from Spark through its S3-compatible API
  • Diff, commit and merge the changes created by Spark
  • Track commit history to understand changes to your data over time
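To give a flavor of the Spark step before you open the playground, here is a minimal sketch in Python. It assumes that lakeFS, through its S3-compatible API, treats the repository as the bucket and the branch as the first segment of the object key. The helper names, the repository `example-repo`, the branch `experiment` and the file paths are hypothetical and used only for illustration. The configuration keys are the standard Hadoop S3A settings, and the values you pass depend on your own installation.

```python
def lakefs_uri(repository: str, branch: str, path: str) -> str:
    """Build an s3a:// URI for an object on a lakeFS branch.

    Assumption: the repository plays the role of the S3 bucket and the
    branch name is the first segment of the object key.
    """
    return f"s3a://{repository}/{branch}/{path.lstrip('/')}"


def s3a_options(endpoint: str, access_key: str, secret_key: str) -> dict:
    """Spark settings that point the Hadoop S3A client at a lakeFS endpoint."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Path-style access keeps the repository name in the URL path
        # rather than in the hostname.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }


# Hypothetical repository, branches and paths, for illustration only.
source = lakefs_uri("example-repo", "main", "input/events.parquet")
target = lakefs_uri("example-repo", "experiment", "output/events.parquet")
print(source)
print(target)
```

With a Spark session configured this way, reading from `source` and writing to `target` would place the new objects on the `experiment` branch as uncommitted changes, ready to be diffed, committed and merged.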

The web-based environment provides a fully working lakeFS and Spark setup, so feel free to explore it on your own.

Start playing on Katacoda!

More information about lakeFS

For more information about lakeFS or lakectl, see the lakeFS docs. If you have questions along the way, don’t hesitate to ask on the Slack channel.
