If you’re interested in playing around and exploring lakeFS, you can now easily get started using the Katacoda demo which provides a personalized sandboxed environment – all from your browser, without installing anything.
lakeFS is an open-source platform that brings resilience and manageability to object-storage-based data lakes. With lakeFS you can build repeatable, atomic, and versioned data lake operations – from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage, and Google Cloud Storage as its underlying storage service. It is API-compatible with S3 and works seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto.
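Because lakeFS exposes an S3-compatible endpoint, existing S3 tooling can talk to it directly. As a rough sketch (the endpoint address, repository name `example`, and branch name `main` here are assumptions, not part of the tutorial), the branch appears as the first path element under the repository:

```shell
# Sketch only: assumes a local lakeFS server on localhost:8000,
# a repository named "example", and a branch named "main".
aws s3 --endpoint-url http://localhost:8000 ls s3://example/main/
aws s3 --endpoint-url http://localhost:8000 cp ./data.csv s3://example/main/data.csv
```

Any S3 client can be pointed at lakeFS the same way by overriding its endpoint.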
What you will learn:
In this tutorial, we will work with a sample dataset to give you a sense of how lakeFS makes it easy to work with data.
We will also cover:
- Basic lakectl command line usage
- How to read, write, list, and delete objects in lakeFS using lakectl
- How to read from and write to lakeFS from Spark using its S3-compatible API
- How to diff, commit, and merge the changes created by Spark
- How to track commit history to understand changes to your data over time
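The steps above follow a typical lakeFS workflow. A minimal sketch of that flow with lakectl (the repository name `example`, branch names, and file paths below are placeholders for illustration, not the tutorial's actual dataset):

```shell
# Assumes lakectl is configured and a repository "example" exists.

# List, upload, read, and delete objects on a branch
lakectl fs ls lakefs://example/main/
lakectl fs upload lakefs://example/main/data/file.csv --source ./file.csv
lakectl fs cat lakefs://example/main/data/file.csv

# Work on an isolated branch, then inspect and merge the changes
lakectl branch create lakefs://example/experiment --source lakefs://example/main
lakectl diff lakefs://example/main lakefs://example/experiment
lakectl commit lakefs://example/experiment -m "write results from Spark job"
lakectl merge lakefs://example/experiment lakefs://example/main

# Review the commit history of a branch
lakectl log lakefs://example/main
```

The Katacoda environment walks through each of these commands interactively, so there is no need to set any of this up yourself.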
The web-based environment provides a fully functional lakeFS and Spark environment, so feel free to explore it on your own.
Start playing on Katacoda!
More information about lakeFS
For more information about lakeFS or lakectl, see the lakeFS docs. If you have questions along the way, don’t hesitate to ask in the lakeFS Slack channel.
Additional resources: