Security in lakeFS: Understanding Role-Based Access Control (RBAC)
A discussion of how Role-Based Access Control works in lakeFS with code examples for three different types of user.
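To make the RBAC model concrete before diving into the article: access in lakeFS is expressed as policies made of allow/deny statements over actions and resources, which are then attached to users or groups. The sketch below is illustrative only; the endpoint address, credentials, group name, and repository are placeholders, and the exact action names, ARN format, and API paths should be checked against the lakeFS authorization docs for your edition (RBAC is a lakeFS Cloud/Enterprise feature).

```python
import requests

# Placeholders: point these at your own installation and credentials.
LAKEFS_ENDPOINT = "http://localhost:8000/api/v1"
AUTH = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")

# An allow-list policy: read-only access to objects in a hypothetical
# "analytics" repository.
read_only_policy = {
    "id": "AnalyticsReadOnly",
    "statement": [
        {
            "action": ["fs:ReadRepository", "fs:ListObjects", "fs:ReadObject"],
            "effect": "allow",
            "resource": "arn:lakefs:fs:::repository/analytics/*",
        }
    ],
}

# Create the policy, then attach it to the group whose members should get it.
resp = requests.post(f"{LAKEFS_ENDPOINT}/auth/policies", json=read_only_policy, auth=AUTH)
resp.raise_for_status()

resp = requests.put(
    f"{LAKEFS_ENDPOINT}/auth/groups/Analysts/policies/AnalyticsReadOnly", auth=AUTH
)
resp.raise_for_status()
```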
What’s not to love about data pipeline testing? Adding acceptance tests to your data pipelines makes them less error-prone and ensures the data passes sufficient quality checks before it reaches end users. Testing data pipelines involves the two components of any data pipeline: the data and the code used …
Acceptance Testing For Data Pipelines: Expert Guide
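To give a feel for the code side of that testing, here is a minimal pytest-style sketch of acceptance checks run against a pipeline's output before it is published. The staging path, column names, and thresholds are invented for the example.

```python
import pandas as pd
import pytest


def load_pipeline_output() -> pd.DataFrame:
    # Placeholder: in practice, read from the staging location the pipeline wrote to.
    return pd.read_parquet("s3://staging-bucket/orders/")


@pytest.fixture(scope="module")
def output():
    return load_pipeline_output()


def test_no_missing_order_ids(output):
    # Every row must carry an order identifier.
    assert output["order_id"].notna().all()


def test_row_count_within_expected_range(output):
    # Guard against an empty or truncated load before publishing to consumers.
    assert 1_000 <= len(output) <= 10_000_000


def test_amounts_are_non_negative(output):
    assert (output["amount"] >= 0).all()
```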
Often, data lake platforms lack simple ways to enforce data governance. This is especially challenging since data governance requirements are complicated to begin with, even without the added complexities of managing data in a data lake. Therefore, enforcing them is an expensive, …
Apache Airflow enables you to build multi-step workflows across multiple technologies. Its programmatic approach to scheduling and monitoring workflows helps users build complicated ETLs on their data that would otherwise be difficult to automate. This enabled ETLs to evolve from simple, single-step jobs into complicated, parallelized, multi-step transformations: The challenge …
Troubleshoot and Reproduce Data with Apache Airflow
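For readers who have not written an Airflow DAG before, the programmatic approach mentioned above means the workflow itself is Python code. The sketch below shows two chained steps; the DAG id, schedule, and task callables are placeholders.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


# Placeholder callables standing in for real ETL steps.
def extract(**context):
    print("pulling raw data")


def transform(**context):
    print("applying transformations")


with DAG(
    dag_id="example_etl",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Declare the dependency: transform runs only after extract succeeds.
    extract_task >> transform_task
```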
Our routine work with data includes developing code, choosing and upgrading compute infrastructure, and testing new and changed data pipelines. Usually, this means running the pipelines under test in parallel to production in order to validate the changes we wish to apply. Every data engineer knows that this convoluted process requires copying data, manually updating …
How to Build an Isolated Testing Environment for Data with lakeFS
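The isolation technique behind that article is branching rather than copying: a test branch of the repository exposes the same production data at zero copy cost, and the pipeline under test runs against the branch. A rough sketch against the lakeFS REST API follows, with the endpoint, credentials, repository, and branch names assumed for illustration.

```python
import requests

# Placeholders: point these at your own installation and credentials.
LAKEFS_ENDPOINT = "http://localhost:8000/api/v1"
AUTH = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")

# Branch off production data without copying it; the pipeline under test then
# reads from and writes to lakefs://example-repo/test-pipeline-v2/...
resp = requests.post(
    f"{LAKEFS_ENDPOINT}/repositories/example-repo/branches",
    json={"name": "test-pipeline-v2", "source": "main"},
    auth=AUTH,
)
resp.raise_for_status()

# After validating the run, the branch can be merged back or simply deleted,
# leaving main untouched either way.
```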
You’re bound to ask yourself this question at some point: Do I need to test the Spark ETLs I’m developing? The answer is yes; you certainly should – and not just with unit testing but also integration, performance, load, and regression testing. Naturally, the scale and complexity of your data set matter a lot, so …
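As a small taste of the unit-testing end of that spectrum, a Spark transformation can be exercised locally against a tiny in-memory DataFrame; the transformation and column names below are invented for the example.

```python
from pyspark.sql import SparkSession, functions as F


def add_total_column(df):
    # The transformation under test: price * quantity per row.
    return df.withColumn("total", F.col("price") * F.col("quantity"))


def test_add_total_column():
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("etl-unit-test")
        .getOrCreate()
    )
    try:
        df = spark.createDataFrame(
            [(1, 2.0, 3), (2, 5.0, 1)], ["id", "price", "quantity"]
        )
        result = {r["id"]: r["total"] for r in add_total_column(df).collect()}
        assert result == {1: 6.0, 2: 5.0}
    finally:
        spark.stop()
```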