Data Engineering

Data Engineering Machine Learning Tutorials

Backfilling Data: A Foolproof Guide to Managing Historical Data

Iddo Avneri

If you work with a smaller dataset or run one-off jobs, the way you manage backfills isn’t that crucial. But what if you face constantly growing datasets with billions to trillions of records? Your data backfilling strategy will have a much bigger impact. When dealing with modern data pipelines on such a scale, it’s key …

Best Practices Data Engineering

Applying Engineering Best Practices to Data Lakes

Einat Orr, PhD.

In the last 30 years, agile development methodology has played a significant part in the digital transformation the world is undergoing. The basis of the methodology is the ability to iterate fast on product features, using the shortest possible feedback loop from ideation to user feedback. This short feedback loop allows us to …

Data Engineering

Managing Structured and Unstructured Data – a Guide for an Effective Synergy

Michal Wosk

Many organizations and companies are rapidly moving from managing only structured data sets to managing both structured and unstructured data. This is due to the growth in the number of sources and data types, which are rooted in the new variety of use …

Best Practices Data Engineering

Big Data Testing: How To Test Data Pipelines In The ETL World

The lakeFS team

When testing ETLs for big data applications, data engineers usually face a challenge that originates in the very nature of data lakes. Since we’re writing or streaming huge volumes of data to a central location, it only makes sense to carry out data testing against equally massive amounts of data. You need to test with …

Data Engineering

ETL Testing: A Practical Guide

Iddo Avneri

What is ETL Testing? ETL testing is the process of evaluating and verifying that the ETL (Extract, Transform, Load) processes work correctly. What is ETL? An ETL process extracts data in potentially many different structured or unstructured formats from multiple sources into a centralized …
