Iceberg REST Catalog Alternatives: Top Options & How to Choose The Best One For Your Team

Itai Gilo

January 19, 2026

Preparing data for AI projects is about more than fast storage or shiny new table formats – it all starts with selecting the right data catalog to anchor your entire ecosystem. The catalog you pick determines how your tables are discovered, versioned, secured, and evolved, which, in turn, affects the reliability and clarity of every […]

Best Practices, Data Engineering, Machine Learning

lakeFS Top 10 Defining Product Milestones in 2025

Oz Katz

January 14, 2026

2025 was a defining year for lakeFS. Across open source and Enterprise editions, we shipped major capabilities that expanded lakeFS from a powerful data versioning layer into a control plane for AI-Ready Data – spanning structured and unstructured data, multiple public and private clouds, and a growing ecosystem of analytics and ML engines. Here’s our

Best Practices, Data Engineering, Machine Learning

Building a Data Center of Excellence for Modern Data Teams

Einat Orr, PhD

January 4, 2026

Sooner or later, every data team reaches a point where things stop working – whether due to team growth, changing business requirements, or growing pipeline complexity. When facing these issues, leaders start considering a different approach that balances centralized and decentralized organizational models. A Data Center of Excellence (DCoE) is a centralized

Best Practices, Machine Learning

Iceberg Tables Management: Processes, Challenges & Best Practices

Itai Gilo

December 17, 2025

We all love data lakes. They’re just perfect for storing massive volumes of structured, semi-structured, and unstructured data in native file formats. And they let us explore, refine, and analyze petabytes of data constantly pouring in from various sources. But there’s a caveat. The individual files in a data lake lack the necessary information for

Machine Learning, Product

Metadata Quality: Types, Processes, and Best Practices

Tal Sofer

December 9, 2025

Data practitioners rarely need convincing to prioritize data quality – it’s already a top-of-mind concern for most. If data is messy, incomplete, or outdated, teams are basically flying blind. Gartner estimates that poor data quality costs businesses around $12.9 million a year. But ask anyone about metadata quality and you might get a blank stare.

Machine Learning, Product

What is Multimodal Data? Benefits, Challenges & Best Practices

Tal Sofer

December 2, 2025

Multimodal data is simply data gathered from several sources or in several formats, such as text, photos, audio, video, and sensor readings. What value does it bring to teams? Taken together, it provides a more complete, holistic view of the environment. This is especially relevant to AI systems learning to understand and interact with humans more

Machine Learning, Product

Introducing Metadata Search in lakeFS

Tal Sofer

November 25, 2025

Making Sense of Large-Scale Data Through Metadata. Picture this: Your ML team needs to find all images labeled “defective” from Q3 production runs tagged by a specific annotation workflow to retrain a quality control model. In a data lake with 10 billion objects, how do you find them? For most teams, the answer is: you

Data Engineering, Machine Learning

What is Iceberg Versioning and How It Improves Data Reliability

Itai Gilo

November 10, 2025

Apache Iceberg includes built-in table versioning to ensure that all changes to your data are logged, consistent, and recoverable. Instead of overwriting files or relying on task time, Iceberg saves each update as an immutable snapshot, ensuring that readers always see a consistent picture of the table, even during heavy writes. This boosts reliability by

Data Engineering, Machine Learning

Data Agility: Building Faster, Smarter, Scalable Workflows

Idan Novogroder

November 3, 2025

It pays for organizations to treat their data as a product that drives innovation, efficiency, and competitiveness. While data quality is an important aspect, let’s not forget that companies operate in a rapidly changing environment – and their data needs to reflect this by adapting quickly. This is where data

Best Practices, Data Engineering, Machine Learning

How lakeFS Transactional Mirroring Keeps Your Data Available During Cloud Outages

Idan Novogroder

October 23, 2025

When AWS Goes Down, Your Data Shouldn’t. On October 20th, 2025, AWS experienced a significant outage centered in the us-east-1 region. What started as a DNS resolution issue affecting DynamoDB quickly cascaded into widespread failures across major services and applications. From gaming platforms like Fortnite and social apps like Snapchat to enterprise systems and IoT

Data Engineering, Machine Learning

Heterogeneous Data: Use Cases, Tools & Best Practices

Idan Novogroder

October 20, 2025

Organizations looking to unlock value from their data are bound to encounter the challenge of dealing with diverse datasets. These include data that varies in format, source, structure, and semantics, such as structured databases and spreadsheets, as well as unstructured text, photos, and sensor outputs. Digital ecosystems will only become more complex, so the ability

Data Engineering, Machine Learning

Distributed Data Management: Key Concepts, Tools & Best Practices

Idan Novogroder

October 19, 2025

Ask any data team, and you’ll quickly learn that nobody out there manages all the organization’s data in a single centralized location. Most teams operate across various clouds, locations, and platforms, facing increasingly fragmented, replicated, and decentralized data. This makes effective distributed data management an essential capability. Keep reading this article to explore the fundamental

Machine Learning
