lakeFS Blog

Best Practices Data Engineering

Data Version Control – A Data Engineering Best Practice You Must Adopt

Einat Orr, PhD.
December 1, 2022

Imagine the software engineering world before distributed version control systems like Git became widespread. This is where the data world is today. The explosion in the volume of generated data forced organizations to move away from relational databases and instead store data in object storage. This escalated manageability challenges that teams need to address …

Data Engineering

Data Reproducibility and other Data Lake Best Practices

The lakeFS team
November 20, 2022

Overview: Data changes frequently, making it difficult to keep track of its exact state over time. Oftentimes, people maintain only one state of their data: its current state. Data lake best practices require reproducibility that lets us time travel between different versions of the data, giving us a snapshot of the data at different times …

Community

lakeFS Community: Leonard Aukea nominated for Machine Learning Professional of the year!

Adi Polak
November 20, 2022

Our community is full of people with incredible skills and know-how. And this nomination proves us right! Our community member @Leonard Aukea has been nominated for Machine Learning Professional of the year as part of the Nordic DAIR Awards. Congratulations, Leonard! Who is Leonard? Leonard Aukea has been heading Machine Learning Engineering and Operations at …

Data Engineering Thought Leadership

4 Ways to Reduce Cloud Data Storage Costs

Oz Katz
November 7, 2022

In the past year, words like recession, business slowdown, and monetary cuts have been heard more and more often. These discussions take place not just in the economic press and the media but also in almost all companies: within boardrooms, in management meetings, and when engaging with potential investors and customers. As …

Tutorials Use Cases

How to Build an Isolated Testing Environment for Data with lakeFS

Barak Amar
November 7, 2022

Overview: Our routine work with data includes developing code, choosing and upgrading compute infrastructure, and testing new and changed data pipelines. Usually, this requires running our tested pipelines in parallel to production, in order to test the changes we wish to apply. Every data engineer knows that this convoluted process requires copying data, manually updating …

Data Engineering Use Cases

How to Develop Spark ETL Pipelines in Isolation

Amit Kesarwani, Vino SD, Iddo Avneri
November 7, 2022

You’re bound to ask yourself this question at some point: Do I need to test the Spark ETLs I’m developing? The answer is yes; you certainly should, and not just with unit testing but also integration, performance, load, and regression testing. Naturally, the scale and complexity of your data set matter a lot, so …

Data Engineering Go

lakeFS with DynamoDB – How Key Value Store is Used by lakeFS

Itai David
October 26, 2022

This blog discusses advanced topics within lakeFS. If you are new to lakeFS, or would like to expand your knowledge of how lakeFS works, make sure to check out our documents section. In the Beginning There Was Postgres: Up until recently, lakeFS was using a strongly consistent SQL DB, namely PostgreSQL, where all metadata was …

Case Studies Data Engineering

How Epcor Built CI/CD for Data Pipelines

Stephen Seewald, Raghvendra Verma, Cory Matheson
September 14, 2022

It is no secret that modern businesses run on big data. If your business were a car, big data would be the engine that powers it. All businesses want to leverage their data to the hilt to make better-informed decisions that accelerate their success. But with the volume, velocity, and variety of data growing exponentially, …
