lakeFS Acquires DVC, Uniting Data Version Control Pioneers to Accelerate AI-Ready Data

Data Engineering

Best Practices Data Engineering Machine Learning

lakeFS Top 10 Defining Product Milestones in 2025

Oz Katz

2025 was a defining year for lakeFS. Across open source and Enterprise editions, we shipped major capabilities that expanded lakeFS from a powerful data versioning layer into a control plane for AI-Ready Data – spanning structured and unstructured data, multiple public and private clouds, and a growing ecosystem of analytics and ML engines. Here’s our […]

Best Practices Data Engineering Machine Learning

Building a Data Center of Excellence for Modern Data Teams

Einat Orr, PhD

Sooner or later, every data team will reach a point where things stop working – whether it’s due to team growth, changing business requirements, or advancing pipeline complexity. When facing these issues, leaders start considering a different approach that perfectly balances centralized and decentralized organizational models. A Data Center of Excellence (DCoE) is a centralized

Data Engineering Machine Learning

What is Iceberg Versioning and How It Improves Data Reliability

Itai Gilo

Apache Iceberg includes built-in table versioning to ensure that all changes to your data are logged, consistent, and recoverable. Instead of overwriting files or relying on task time, Iceberg saves each update as an immutable snapshot, ensuring that readers always see a consistent picture of the table, even during heavy writes.  This boosts reliability by

Data Engineering Machine Learning

Data Agility: Building Faster, Smarter, Scalable Workflows

Idan Novogroder

It pays for organizations to treat their data as a product that acts as a driving force behind innovation, efficiency, and competitiveness. Data quality is an important aspect, but companies also operate in a rapidly changing environment, and their data needs to adapt quickly to reflect it. This is where data

Best Practices Data Engineering Machine Learning

How lakeFS Transactional Mirroring Keeps Your Data Available During Cloud Outages

Idan Novogroder

When AWS Goes Down, Your Data Shouldn’t. On October 20, 2025, AWS experienced a significant outage centered in the us-east-1 region. What started as a DNS resolution issue affecting DynamoDB quickly cascaded into widespread failures across major services and applications. From gaming platforms like Fortnite and social apps like Snapchat to enterprise systems and IoT

Data Engineering Machine Learning

Heterogeneous Data: Use Cases, Tools & Best Practices

Idan Novogroder

Organizations looking to unlock the value from their data are bound to encounter the challenge of dealing with diverse datasets. This includes data in various formats, sources, structures, and semantics, such as structured databases and spreadsheets, as well as unstructured text, photos, and sensor outputs.  Digital ecosystems will only become more complex, so the ability

Data Engineering Machine Learning

Distributed Data Management: Key Concepts, Tools & Best Practices

Idan Novogroder

Ask any data team, and you’ll quickly learn that nobody manages all of an organization’s data in a single centralized location. Most teams operate across various clouds, locations, and platforms, facing increasingly fragmented, replicated, and decentralized data. This makes effective distributed data management an essential capability. Keep reading this article to explore the fundamental

Best Practices Data Engineering Machine Learning

Bound by Physics: Why Data Version Control is Critical for Real-World AI

Vince Antinozzi, Yoav Yetinson

TL;DR: Software-only systems can be rerun from the source, but physics-bound workflows face a tougher challenge. Once a moment is gone, it’s gone. Sensor drift, hardware changes, and environmental uniqueness make it impossible to recreate the exact conditions. For audits, safety, and machine learning, you need full data provenance, including raw data, derived outputs, and

Best Practices Data Engineering Machine Learning

Versioning Data Labels: Integrating Labeling Tools with lakeFS

Iddo Avneri

In this post, we explore how lakeFS can integrate with popular data labeling solutions, the differences between labeling tools’ built-in dataset management and lakeFS data version control, and why combining them is invaluable. We’ll also highlight use cases – from autonomous vehicles to healthcare – where rigorous data versioning alongside labeling is essential. Overview of

Data Engineering Machine Learning

Unified Data Management: Types, Challenges & Best Practices

Idan Novogroder

Historically, companies have developed their IT systems on an ad hoc basis, installing various software and adopting data management approaches as their needs changed. The resulting setup is diverse, with multiple tools and datasets that serve the same function. Data tends to be segregated and dispersed across teams and areas, with little to no

Data Engineering Machine Learning

What is Metadata Filtering? Benefits, Best Practices & Tools

Idan Novogroder

Vector databases are a critical enabler for expanding the use of LLMs. They power applications such as Retrieval Augmented Generation (RAG), pattern matching, anomaly detection, and recommendation systems by retrieving relevant data for your application.  A vector database needs to carry out efficient similarity searches across vector embeddings of both unstructured and structured data. You

Data Engineering Machine Learning Product

How lakeFS Helps Ensure Data Compliance

Tal Sofer

Data compliance means adhering to laws, regulations, standards, and internal policies regarding data use. Organizations must comply with regulations like the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), and the California Consumer Privacy Act (CCPA), as well as SOC 2 standards, to protect sensitive information and maintain trust. Data compliance plays
