
Best Practices

Best Practices Tutorials

How to Migrate or Clone a lakeFS Repository: Step-by-Step Tutorial

Amit Kesarwani

Introduction If you want to migrate or clone repositories from a source lakeFS environment to a target one, follow this tutorial. Your source and target lakeFS environments can run locally or in the cloud. You can also follow this tutorial to migrate or clone a source repository to a target repository […]

Best Practices Data Engineering

15 Data Engineering Best Practices to Follow in 2026

Einat Orr, PhD

Key Takeaways The software engineering world has been profoundly transformed over the past decades. This was possible thanks to the emergence of methodologies and tools that helped establish and apply new engineering best practices. The leading example is the move from a waterfall software development process to DevOps: At each moment, there

Best Practices Tutorials

Version Control Data Pipelines Using the Medallion Architecture

Iddo Avneri

A step-by-step guide to running pipelines on Bronze, Silver, and Gold layers with lakeFS Introduction The Medallion Architecture is a data design pattern that organizes a data pipeline into three distinct tiers based on progressive data quality: bronze, silver, and gold. The bronze tier holds raw ingested data, while the silver and
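The three tiers can be sketched as successive transformation stages over the same records. A minimal illustration in plain Python — the field names and cleaning rules here are invented for the example, not taken from the tutorial:

```python
# Minimal medallion-style pipeline: each tier is a pure function over records.
# Field names and rules are illustrative only.

def bronze(raw_rows):
    """Bronze: land raw records as-is, tagging their source."""
    return [dict(row, _source="ingest") for row in raw_rows]

def silver(bronze_rows):
    """Silver: clean and conform - drop rows missing an id, normalize names."""
    return [
        {"id": r["id"], "name": r["name"].strip().lower()}
        for r in bronze_rows
        if r.get("id") is not None
    ]

def gold(silver_rows):
    """Gold: aggregate into a consumption-ready metric."""
    return {"row_count": len(silver_rows)}

raw = [{"id": 1, "name": " Ada "}, {"id": None, "name": "ghost"}]
print(gold(silver(bronze(raw))))  # {'row_count': 1}
```

Each layer being a pure function is what makes the pattern pair well with branch-per-layer version control: any tier can be recomputed from the one below it.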

Best Practices Data Engineering

Applying Engineering Best Practices to Data Lakes

Einat Orr, PhD

Over the last 30 years, the agile development methodology has played a significant part in the digital transformation the world is undergoing. The basis of the methodology is the ability to iterate fast on product features, using the shortest possible feedback loop from ideation to user feedback. This short feedback loop allows us to

Best Practices Machine Learning Tutorials

Building an ML Experimentation Platform for Easy Reproducibility Using lakeFS

Vino SD

MLOps is mostly data engineering. As organizations ride past the hype cycle of MLOps, we realize there is significant overlap between MLOps and data engineering. As ML engineers, we spend most of our time collecting, verifying, pre-processing, and engineering features from data before we can even begin training models.  Only 5% of developing and deploying

Best Practices

How To Maintain Data Quality In Your Data Lake

The lakeFS Team

Enterprises rely on ever more data as the foundation for their decisions and operations. The number of digital products that collect, analyze, and use data to feed decision-making algorithms and improve future services is also rapidly increasing. Because of this, data quality has become a critical asset for businesses in almost

Best Practices Data Engineering

Big Data Testing: Benefits, Challenges & Tools

The lakeFS Team

When testing ETLs for big data applications, data engineers usually face a challenge that originates in the very nature of data lakes. Since we’re writing or streaming huge volumes of data to a central location, it only makes sense to carry out data testing against equally massive amounts of data. You need to test with
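One common mitigation for this challenge is running quality checks against a deterministic sample of the data rather than the whole lake. A minimal sketch — the schema and the specific checks are invented for illustration:

```python
import random

def sample_rows(rows, k, seed=42):
    """Deterministically sample k rows so test runs are reproducible."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))

def check_quality(rows):
    """Illustrative data checks: no null 'id', no negative 'amount'."""
    failures = []
    for i, r in enumerate(rows):
        if r.get("id") is None:
            failures.append((i, "null id"))
        if r.get("amount", 0) < 0:
            failures.append((i, "negative amount"))
    return failures

data = [{"id": n, "amount": n * 10} for n in range(1000)]
assert check_quality(sample_rows(data, 100)) == []
```

Fixing the sampling seed keeps CI runs comparable between commits, at the cost of possibly missing defects outside the sample.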

Best Practices

Best Practices to Easily Adopt lakeFS

Iddo Avneri

lakeFS is gaining momentum as a solution for data versioning on top of an object store, and more and more data-driven organizations adopt it as their data version control system. Once you start using lakeFS, the files on your object store will be organized in a new structure. Other solutions, such as Iceberg, also create a

Best Practices Data Engineering

Write-Audit-Publish for Data Pipelines: The Shortest Path to Your Destination with lakeFS

The lakeFS Team

Overview Continuous integration (CI) of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment (CD) of data ensures the quality of data at each step of a production pipeline. These approaches are commonly used by application developers of
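The write-audit-publish flow described in this teaser can be sketched without any lakeFS API at all, using an in-memory dict to stand in for branches; the branch names and the audit rule below are invented for the example:

```python
# In-memory stand-in for write-audit-publish: write to a staging "branch",
# audit it, and only then publish (merge) into "main".

store = {"main": []}

def write(branch, rows):
    """Write to an isolated branch seeded from main."""
    store.setdefault(branch, list(store["main"]))
    store[branch].extend(rows)

def audit(branch):
    """Audit gate: every row must carry an 'id' field."""
    return all("id" in r for r in store[branch])

def publish(branch):
    """Promote the branch to main only if the audit passes."""
    if not audit(branch):
        raise ValueError(f"audit failed on {branch}; main left untouched")
    store["main"] = store[branch]
    del store[branch]

write("staging", [{"id": 1}, {"id": 2}])
publish("staging")          # audit passes, main now has the rows
print(len(store["main"]))   # 2
```

The key property is that consumers reading "main" never observe unaudited rows; a failed audit leaves the published view exactly as it was.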

Best Practices Data Engineering

Data Version Control – A Data Engineering Best Practice You Must Adopt

Einat Orr, PhD

Imagine the software engineering world before distributed version control systems like Git became widespread. This is where the data world currently stands. The explosion in the volume of generated data forced organizations to move away from relational databases and instead store data in object storage. This escalated the manageability challenges that teams need to address
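The Git analogy can be made concrete: at its core, version control reduces to content-addressed immutable snapshots plus named branch pointers. A minimal sketch in Python — no real lakeFS internals are implied:

```python
import hashlib
import json

commits = {}   # content hash -> immutable snapshot
branches = {}  # branch name -> commit hash

def commit(branch, snapshot):
    """Store the snapshot under its content hash and move the branch pointer."""
    payload = json.dumps(snapshot, sort_keys=True).encode()
    h = hashlib.sha256(payload).hexdigest()[:12]
    commits[h] = snapshot
    branches[branch] = h
    return h

def checkout(ref):
    """Resolve a branch name (or a raw commit hash) to its snapshot."""
    return commits[branches.get(ref, ref)]

first = commit("main", {"table.csv": "v1"})
commit("main", {"table.csv": "v2"})
assert checkout(first) == {"table.csv": "v1"}  # old version still addressable
assert checkout("main") == {"table.csv": "v2"}
```

Because snapshots are addressed by content, old versions remain reachable by hash even after the branch moves on — the property that makes reproducibility and rollback cheap.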

Best Practices Data Engineering

Git for Data – What, How and Why Now?

Einat Orr, PhD

Git, the Source Control Tool, a.k.a. Code Version Control When we wish for “Git for Data”, we already know what code version control is and that Git is the standard tool for it. For the sake of those who have just joined us, let’s define these terms. Back in the 1960s
