Best Practices

Best Practices, Data Engineering

Big Data Testing: How To Test Data Pipelines In The ETL World

The lakeFS team
January 23, 2023

When testing ETLs for big data applications, data engineers usually face a challenge that originates in the very nature of data lakes. Since we’re writing or streaming huge volumes of data to a central location, it only makes sense to carry out data testing against equally massive amounts of data. You need to test with …

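By way of illustration (not taken from the article itself), a minimal PySpark sketch of such a data test might assert row counts, required columns, and null constraints on a pipeline's output before it is published. The input path and the column names (`event_id`, `event_time`) are hypothetical.

```python
# Minimal sketch of post-ETL output tests, assuming a hypothetical
# Parquet output with `event_id` and `event_time` columns.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-output-tests").getOrCreate()

# Hypothetical location of the pipeline's output.
df = spark.read.parquet("/data/warehouse/events/")

# 1. The output should not be empty.
assert df.count() > 0, "ETL produced no rows"

# 2. The schema should contain the columns downstream consumers rely on.
expected_columns = {"event_id", "event_time"}
missing = expected_columns - set(df.columns)
assert not missing, f"missing columns: {missing}"

# 3. Key columns should not contain nulls.
null_keys = df.filter(F.col("event_id").isNull()).count()
assert null_keys == 0, f"{null_keys} rows have a null event_id"

print("all ETL output checks passed")
spark.stop()
```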

Best Practices, Data Engineering

CI/CD for data pipelines – The Shortest Path to Your Destination with lakeFS

The lakeFS team
February 7, 2023

Continuous integration (CI) of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment (CD) of data ensures the quality of data at each step of a production pipeline. These approaches are commonly used by application developers of …

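As a rough illustration of the CI idea, the sketch below shows the kind of pre-merge validation a hook could run before data is exposed on a production branch: it verifies Parquet format, a required schema, and the absence of known PII columns. The column names and the PII deny-list are assumptions for the example, not part of lakeFS itself.

```python
# Illustrative pre-merge data validation: the kind of check a CI hook
# could run before data is merged to a production branch.
# Column names and the PII deny-list below are assumptions for the example.
import sys
import pyarrow.parquet as pq

PII_COLUMNS = {"email", "ssn", "phone_number"}       # hypothetical governance rule
REQUIRED_COLUMNS = {"order_id", "order_total"}       # hypothetical schema contract


def validate(path: str) -> list[str]:
    """Return a list of violations for a single Parquet file."""
    try:
        schema = pq.read_schema(path)                # also verifies Parquet format
    except Exception as exc:
        return [f"{path}: not readable as Parquet ({exc})"]

    errors = []
    columns = set(schema.names)
    if not REQUIRED_COLUMNS <= columns:
        errors.append(f"{path}: missing columns {REQUIRED_COLUMNS - columns}")
    if columns & PII_COLUMNS:
        errors.append(f"{path}: contains PII columns {columns & PII_COLUMNS}")
    return errors


if __name__ == "__main__":
    violations = [e for p in sys.argv[1:] for e in validate(p)]
    for v in violations:
        print(v)
    sys.exit(1 if violations else 0)                 # non-zero exit blocks the merge
```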

Best Practices, Data Engineering

Data Version Control – A Data Engineering Best Practice You Must Adopt

Einat Orr, PhD.
January 3, 2023

Imagine the software engineering world before distributed version control systems like Git became widespread. This is where the data world currently stands. The explosion in the volume of generated data forced organizations to move away from relational databases and instead store data in object storage. This escalated the manageability challenges that teams need to address …

