Best Practices Data Engineering Machine Learning Thought Leadership

Scaling ML Data Without Breaking Compliance

Gottfried Sehringer

July 6, 2026

In highly regulated environments, improving developer experience often comes at the cost of tighter controls. For companies handling sensitive personal data, even small workflow changes can introduce compliance risks that are difficult to detect and even harder to fix at scale. The tension between usability and governance is especially visible in machine learning pipelines. Data […]

Best Practices Data Engineering Machine Learning Thought Leadership

From Glue-on-Pizza to Provenance: A Practical Guide to Reproducible AI

Oz Katz

June 29, 2026

The now-infamous “pizza with glue” AI result is a symptom of something deeper than one bizarre edge case. When AI systems fail, the root cause is rarely mysterious. More often than not, bad outputs can be traced back to bad inputs: flawed data, unclear lineage, or uncontrolled environments. Smarter models won’t fix this on their

Best Practices Data Engineering Machine Learning Thought Leadership

Why AI Sovereignty Is Becoming a Strategic Imperative

Iddo Avneri

June 22, 2026

AI raises a question most organizations haven’t answered yet: who really controls the foundation? In a recent presentation at the AI-Ready Data Summit, Matthew Miller, Sr. Principal Chief Architect, Field CTO Office at Red Hat, showed that AI sovereignty isn’t a policy debate but an infrastructure strategy. Every AI system depends on choices about data,

Best Practices Data Engineering Machine Learning

Data Agents: How to Build Reliable Enterprise AI Workflows on Trusted Data

Tal Sofer

June 17, 2026

Data agents are fast becoming the operating layer of enterprise AI – automating analysis, managing workflows, obtaining context, and acting across production systems. Headless agents are coming for your data; there’s no doubt about it. But while agent skills are improving at a breakneck pace, trust is still the biggest barrier to adoption. Denodo’s AI

Best Practices Data Engineering Machine Learning Product

Data Lake Mount for Efficient Data Sharing & Versioned Lake Management

Oz Katz

June 11, 2026

Mounting object storage as a filesystem is the fastest way to get a notebook or Spark job reading S3, Azure Data Lake Storage, or GCS without rewriting it around an SDK. It is also the fastest way to discover what object storage does not give you: no atomicity across files, thin audit trails, and configuration

Best Practices Data Engineering Machine Learning Product Thought Leadership

Agentic AI Will Make or Break on the Data Layer. Meet lakeFS for Agentic AI

Gottfried Sehringer

June 10, 2026

For the past few years, the hard work in AI has gone into models. Organizations spent that time learning, experimenting, and building the best models they could. That work paid off, and it cleared the way for what’s happening now, everywhere, at breakneck speed: agents. Companies have found real uses for agents across the organization,

Best Practices Machine Learning Thought Leadership

GxP-Aligned by Design: How lakeFS Brings Compliance Discipline to AI-Ready Data in Life Sciences

Vince Antinozzi

June 7, 2026

AI is moving fast in life sciences. GxP is not. The teams that close that gap first get treatments to market faster. Pharma, biotech, and medical device teams are racing to put AI to work. Drug discovery is being accelerated. Clinical trial analytics are being modernized. Quality control on the manufacturing line is being automated.

Data Engineering Machine Learning

Multimodal Data Integration: Architecture, Challenges & Best Practices

Idan Novogroder

May 28, 2026

As AI systems scale, data bottlenecks quickly become one of the key barriers to model development and deployment. Slow pipelines, inconsistent datasets, and poor reproducibility show up as delayed testing, rising AI infrastructure costs, and lower model reliability. To support faster iteration and production-ready MLOps, teams need efficient data management at

Machine Learning Thought Leadership

Headless agents are coming for your data. Be ready with lakeFS.

Oz Katz

May 6, 2026

The lakeFS Control Plane for AI-ready Data gives agents that rely on large, multimodal datasets isolated access, verifiable results, and built-in governance. TL;DR: A new kind of consumer for your data. A few weeks ago at TrailblazerDX 2026, Salesforce put a name on something the rest of the industry had been circling for months: Headless

Data Engineering Machine Learning

Multimodal Data Integration: Architecture, Challenges & Best Practices

Tal Sofer

May 5, 2026

Unless you’ve been living under a rock, you’ve probably heard of multimodal data and its integration, now a standard feature of modern data platforms. As systems ingest data ranging from structured tables to unstructured text, graphics, and streams, the difficulty shifts from data collection to data integration. What differentiates experimental pipelines from production-grade systems is

Best Practices Machine Learning

Data Management for AI Projects: Strategies, Tools & Best Practices

Idan Novogroder

April 28, 2026

AI projects often end up failing due to data, not models. Inconsistent inputs, poor data quality, a lack of lineage, and fragmented workflows subtly weaken even the most sophisticated algorithms. As datasets grow in size and complexity, data management for AI projects evolves from a supporting role to a core engineering discipline. Without a solid

Best Practices Data Engineering Machine Learning

Center of Excellence for Enterprise AI: Models & Best Practices

Einat Orr, PhD

April 20, 2026

Scaling AI isn’t about building better models; it’s about building the system around them. Without consistency in data, workflows, and governance, teams hit the same walls. A Center of Excellence (CoE) for Enterprise AI solves this by standardizing how AI is built, validated, and deployed – so teams can move faster without losing control. But

Machine Learning

Pick up the Slack with lakeFS