Best Practices Data Engineering Machine Learning Thought Leadership

From Glue-on-Pizza to Provenance: A Practical Guide to Reproducible AI

Oz Katz

June 29, 2026

The now-infamous “pizza with glue” AI result is a symptom of something deeper than one bizarre edge case. When AI systems fail, the root cause is rarely mysterious. More often than not, bad outputs can be traced back to bad inputs: flawed data, unclear lineage, or uncontrolled environments. Smarter models won’t fix this on their […]

Best Practices Data Engineering Machine Learning Thought Leadership

Why AI Sovereignty Is Becoming a Strategic Imperative

Iddo Avneri

June 22, 2026

AI raises a question most organizations haven’t answered yet: who really controls the foundation? In a recent presentation at the AI-Ready Data Summit, Matthew Miller, Sr. Principal Chief Architect, Field CTO Office at Red Hat, showed that AI sovereignty isn’t a policy debate but an infrastructure strategy. Every AI system depends on choices about data,

Data Engineering Product Thought Leadership

Unity Catalog and the Quiet Return of Vendor Lock-In

Oz Katz

June 18, 2026

Databricks built its reputation on openness. Spark. Delta Lake. MLflow. A company that rose by betting on open ecosystems over proprietary silos. Which is why Unity Catalog feels like such a sharp turn. And just now, the pattern got harder to ignore. This week at Databricks’ Data + AI Summit, the pattern got harder to

Best Practices Data Engineering Machine Learning

Data Agents: How to Build Reliable Enterprise AI Workflows on Trusted Data

Tal Sofer

June 17, 2026

Data agents are fast becoming the operating layer of enterprise AI – automating analysis, managing workflows, obtaining context, and acting across production systems. Headless agents are coming for your data, there’s no doubt about it. But while agent skills are improving at a breakneck pace, trust is still the biggest barrier to adoption. Denodo’s AI

Best Practices Data Engineering Machine Learning Product

Data Lake Mount for Efficient Data Sharing & Versioned Lake Management

Oz Katz

June 11, 2026

Mounting object storage as a filesystem is the fastest way to get a notebook or Spark job reading S3, Azure Data Lake Storage, or GCS without rewriting it around an SDK. It is also the fastest way to discover what object storage does not give you: no atomicity across files, thin audit trails, and configuration

Best Practices Data Engineering Machine Learning Product Thought Leadership

Agentic AI Will Make or Break on the Data Layer. Meet lakeFS for Agentic AI

Gottfried Sehringer

June 10, 2026

For the past few years, the hard work in AI has gone into models. Organizations spent that time learning, experimenting, and building the best models they could. That work paid off, and it cleared the way for what’s happening now, everywhere, at breakneck speed: agents. Companies have found real uses for agents across the organization,

Data Engineering Machine Learning

Multimodal Data Integration: Architecture, Challenges & Best Practices

Idan Novogroder

May 28, 2026

As AI systems scale, data bottlenecks for AI projects quickly become one of the key barriers to model development and deployment. Slow pipelines, inconsistent datasets, and poor reproducibility show up as delayed testing, rising AI infrastructure costs, and lower model reliability. To support faster iteration and production-ready MLOps operations, teams need efficient data management at

Data Engineering Machine Learning

Multimodal Data Integration: Architecture, Challenges & Best Practices

Tal Sofer

May 5, 2026

Unless you’ve been living under a rock, you’ve probably heard of multimodal data and its integration, now a standard feature of modern data platforms. As systems ingest data ranging from structured tables to unstructured text, graphics, and streams, the difficulty shifts from data collection to data integration. What differentiates experimental pipelines from production-grade systems is

Best Practices Data Engineering Machine Learning

Center of Excellence for Enterprise AI: Models & Best Practices

Einat Orr, PhD

April 20, 2026

Scaling AI isn’t about building better models; it’s about building the system around them. Without consistency in data, workflows, and governance, teams hit the same walls. A Center of Excellence (CoE) for Enterprise AI solves this by standardizing how AI is built, validated, and deployed – so teams can move faster without losing control. But

Best Practices Data Engineering Machine Learning

AI Center of Excellence: How to Build Reliable & Reproducible AI Systems

Einat Orr, PhD

March 19, 2026

As AI adoption evolves and teams advance from scattered ML trial projects to running AI as a production system, they inevitably face the question of how to operate such a system reliably at scale. This is where an AI Center of Excellence (AI CoE) comes in. It’s an organizational and technical response to that shift:

Best Practices Data Engineering Machine Learning Tutorials

Building Compliant and Reproducible ML Pipelines

Itai Gilo

February 3, 2026

Based on my presentation at PyData Global 2025. When we – engineers – hear the word “compliance,” we tend to roll our eyes. We want to build features, not fill out forms. But here’s good news: the exact same tools that help you debug your code can also keep you out of trouble. In this

Best Practices Data Engineering Machine Learning

lakeFS Top 10 Defining Product Milestones in 2025

Oz Katz

January 14, 2026

2025 was a defining year for lakeFS. Across open source and Enterprise editions, we shipped major capabilities that expanded lakeFS from a powerful data versioning layer into a control plane for AI-Ready Data – spanning structured and unstructured data, multiple public and private clouds, and a growing ecosystem of analytics and ML engines. Here’s our

Data Engineering

Pick up the Slack with lakeFS