Best Practices

Best Practices, Data Engineering

Big Data Testing: How To Test Data Pipelines In The ETL World

The lakeFS team
January 23, 2023

When testing ETLs for big data applications, data engineers usually face a challenge that originates in the very nature of data lakes. Since we’re writing or streaming huge volumes of data to a central location, it only makes sense to carry out data testing against equally massive amounts of data. You need to test with …

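By way of illustration (not taken from the article itself), a minimal PySpark sketch of such a data test might assert row counts, required columns, and null constraints on a pipeline's output before it is published. The input path and the column names (`event_id`, `event_time`) are hypothetical.

```python
# Minimal sketch of post-ETL output tests, assuming a hypothetical
# Parquet output with `event_id` and `event_time` columns.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-output-tests").getOrCreate()

# Hypothetical location of the pipeline's output.
df = spark.read.parquet("/data/warehouse/events/")

# 1. The output should not be empty.
assert df.count() > 0, "ETL produced no rows"

# 2. The schema should contain the columns downstream consumers rely on.
expected_columns = {"event_id", "event_time"}
missing = expected_columns - set(df.columns)
assert not missing, f"missing columns: {missing}"

# 3. Key columns should not contain nulls.
null_keys = df.filter(F.col("event_id").isNull()).count()
assert null_keys == 0, f"{null_keys} rows have a null event_id"

print("all ETL output checks passed")
spark.stop()
```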

Best Practices, Data Engineering

CI/CD for data pipelines – The Shortest Path to Your Destination with lakeFS

The lakeFS team
February 7, 2023

Continuous integration (CI) of data is the process of exposing data to consumers only after ensuring it adheres to best practices such as format, schema, and PII governance. Continuous deployment (CD) of data ensures the quality of data at each step of a production pipeline. These approaches are commonly used by application developers of …

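As a rough illustration of the CI idea, the sketch below shows the kind of pre-merge validation a hook could run before data is exposed on a production branch: it verifies Parquet format, a required schema, and the absence of known PII columns. The column names and the PII deny-list are assumptions for the example, not part of lakeFS itself.

```python
# Illustrative pre-merge data validation: the kind of check a CI hook
# could run before data is merged to a production branch.
# Column names and the PII deny-list below are assumptions for the example.
import sys
import pyarrow.parquet as pq

PII_COLUMNS = {"email", "ssn", "phone_number"}       # hypothetical governance rule
REQUIRED_COLUMNS = {"order_id", "order_total"}       # hypothetical schema contract


def validate(path: str) -> list[str]:
    """Return a list of violations for a single Parquet file."""
    try:
        schema = pq.read_schema(path)                # also verifies Parquet format
    except Exception as exc:
        return [f"{path}: not readable as Parquet ({exc})"]

    errors = []
    columns = set(schema.names)
    if not REQUIRED_COLUMNS <= columns:
        errors.append(f"{path}: missing columns {REQUIRED_COLUMNS - columns}")
    if columns & PII_COLUMNS:
        errors.append(f"{path}: contains PII columns {columns & PII_COLUMNS}")
    return errors


if __name__ == "__main__":
    violations = [e for p in sys.argv[1:] for e in validate(p)]
    for v in violations:
        print(v)
    sys.exit(1 if violations else 0)                 # non-zero exit blocks the merge
```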

Best Practices, Data Engineering

Data Version Control – A Data Engineering Best Practice You Must Adopt

Einat Orr, PhD.
January 3, 2023

Imagine the software engineering world before distributed version control systems like Git became widespread. This is where the data world currently stands. The explosion in the volume of generated data forced organizations to move away from relational databases and instead store data in object storage. This escalated the manageability challenges that teams need to address …

