Guy Hardonag

Integrations

Building Reproducible Data Pipelines with Airflow and lakeFS

Guy Hardonag
February 3, 2021

In this post, we’ll see how easy it is to use lakeFS with an existing Airflow DAG to make every step in a pipeline completely reproducible in both code and data. This is done without modifying the actual code and logic of our jobs – by wrapping these operations with lakeFS commits. An example data …

Project

The lakeFS Playground – Interactive Data Versioning Learning

Guy Hardonag
March 8, 2021

If you’re interested in exploring lakeFS, you can now get started using the Katacoda playground, which provides a personalized sandboxed environment – all from your browser, without installing anything. lakeFS is an open source platform that delivers resilience and manageability to object-storage-based data lakes. With lakeFS you can build repeatable, …

Data Engineering Project

The Quick Guide for Running Presto Locally on S3

Guy Hardonag
March 8, 2021

This post covers our experience running Presto in a local environment with the ability to query Amazon S3 and other S3-compatible systems. We will: describe the components needed and how to configure them; provide a dockerized environment you can run; and show an example of running the provided environment and querying a publicly …
