Develop ETL pipelines with zero copy prod data on top of AWS EMR Serverless

Description

Delivering high-quality data products requires rigorous testing of pipelines before they are deployed to production. Today, testing against realistic data means either using a subset of the production data or creating multiple copies of the entire data set. Testing against sample data is not good enough, while full copies are costly and time-consuming to create. We will demonstrate how to test against the entire production data set without copying any of it (zero-copy).

You will learn how to:

  • Create multiple isolated testing environments without copying data
  • Automate the testing of your pipeline logic, using a local Airflow installation against lakeFS on AWS and S3
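
As a rough illustration of the first point: lakeFS exposes a repository through an S3-compatible endpoint, where the repository takes the place of the bucket and the first path segment names a branch. Pointing a job at an isolated environment can therefore come down to changing one path segment. The sketch below is not taken from the session itself. The repository name, branch name, table path, and the LakeFSLocation helper are all hypothetical, and it assumes the test branch already exists (created, for example, with lakectl branch create) and that Spark's s3a endpoint is configured to point at lakeFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeFSLocation:
    """Hypothetical helper: a path inside a lakeFS repository at a given ref."""

    repository: str  # plays the role of the S3 bucket
    ref: str         # branch name, e.g. "main" or a test branch
    path: str        # object path inside the repository

    def s3a_uri(self) -> str:
        # Layout used by the lakeFS S3-compatible gateway:
        # s3a://<repository>/<ref>/<path>
        return f"s3a://{self.repository}/{self.ref}/{self.path.lstrip('/')}"

    def on_branch(self, branch: str) -> "LakeFSLocation":
        # Same data layout, different branch. No objects are copied here;
        # only the ref in the address changes.
        return LakeFSLocation(self.repository, branch, self.path)


# Assumed names, for illustration only.
prod = LakeFSLocation("example-repo", "main", "tables/events/")
test = prod.on_branch("test-etl-run-1")

print(prod.s3a_uri())
print(test.s3a_uri())
```

An ETL job that takes its input and output locations as parameters can then run unchanged against either URI, which is what makes one branch per test run practical.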
