Webinar
Test an ETL pipeline on AWS EMR against production data without copying anything
Delivering high-quality data products requires rigorous testing of pipelines before deploying them to production. Today, to test against realistic data, teams either use a subset of the production data or are forced to create multiple copies of the entire dataset. Testing against sample data is not good enough. The alternative, however, is costly …
Promote only high-quality data to production
Engineering best practices dictate having an isolated staging environment. And yet today, data transformation is most often done directly on production data. Moreover, even if the code and infrastructure don't change, the data might, and those changes introduce potential quality issues. In this webinar, you will learn: How to create a staging environment for your data …
Develop Spark ETL pipelines with no risk against production data
Delivering high-quality data products requires rigorous testing of pipelines before deploying them to production. Today, to test against realistic data, teams either use a subset of the data or are forced to create multiple copies of the entire dataset. Testing against sample data is not good enough. The alternative, however, is costly and …
Achieve Multi-Table Transactions on Delta Tables
Data engineers typically need to implement custom logic in scripts to guarantee that two or more data assets (tables) are updated in sync. This logic often requires extensive rewrites, or periods during which data is unavailable or inconsistent. We will demonstrate a way to run data transformations in isolation across multiple tables, without ever creating a …
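The core idea behind running transformations in isolation across multiple tables can be sketched in a few lines. This is a minimal illustration only, using a hypothetical in-memory `Catalog` class rather than any real table format or API: writers change tables on an isolated branch while readers keep seeing the old state, and a single atomic pointer swap publishes every table at once, so no reader ever observes the tables out of sync.

```python
# Hypothetical in-memory sketch of branch-based multi-table transactions.
# Not a real lakeFS or Delta Lake API; names here are illustrative only.

class Catalog:
    def __init__(self):
        self.main = {}          # table name -> immutable snapshot (tuple of rows)

    def branch(self):
        # "Zero-copy" isolation: the branch shares snapshots with main
        # instead of duplicating the underlying data.
        return dict(self.main)

    def commit(self, branch):
        # One atomic pointer swap publishes all changed tables together.
        self.main = branch

catalog = Catalog()
catalog.main = {
    "orders":       (("o1", 100),),
    "order_totals": (("o1", 100),),
}

b = catalog.branch()
b["orders"] = b["orders"] + (("o2", 50),)
b["order_totals"] = (("o1", 100), ("o2", 50))

# Readers of catalog.main still see the old, mutually consistent state here.
catalog.commit(b)   # both tables become visible in the same instant
```

In a real system the "pointer swap" is a metadata commit (e.g. a branch merge), but the guarantee is the same: either both tables reflect the transformation or neither does.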