Best Practices Product Thought Leadership

The Evolving Equation: When Do You Move From Open Source to Enterprise with Data Version Control

Tal Sofer

July 16, 2025

Open source software has fundamentally reshaped technology—delivering unmatched flexibility, low friction, and rapid innovation. For some teams, it’s a philosophical commitment. For others, it’s the fastest path to building. lakeFS supports both models. For most data teams, the journey starts with open source and evolves over time. lakeFS open source offers a robust foundation for […]

Best Practices Machine Learning

AI-Ready Data: Characteristics, Challenges & Best Practices

Tal Sofer

July 14, 2025

Despite the increasing adoption of Artificial Intelligence (AI) applications, most organizations are bound to see implementation challenges. One of the issues lies in the data itself. A recent survey showed 80% of companies believe their data is suitable for AI, but more than half are actually dealing with challenges like internal data quality and categorization

Best Practices Machine Learning Product Tutorials

A Single Pane of Glass to Your Data: Multiple Storage Backends Support in lakeFS

Tal Sofer

May 13, 2025

Today’s organizations don’t just use a single data storage solution – they operate across on-prem servers, multiple cloud providers, and hybrid environments. This distributed approach has become necessary, but it comes with significant costs: teams struggle with siloed tools, duplicated processes, and an endless cycle of environment management that diverts focus from delivering actual value.

Best Practices Data Engineering Machine Learning

What is AI infrastructure? Benefits & how to build one

Idan Novogroder

April 29, 2025

A solid AI infrastructure is essential for efficiently developing and deploying AI and machine learning (ML) applications – from facial and speech recognition to text processing and computer vision. Before we dive into why AI infrastructure is crucial and how it works, let’s define it first. What is AI infrastructure? AI infrastructure, also known as

Best Practices Data Engineering Machine Learning

6 Types of Metadata: Examples, Tools & Frameworks

Idan Novogroder

April 22, 2025

With the volumes of generated data increasing, metadata has become an essential component in organizing and comprehending massive datasets. Metadata plays a key role in any modern data strategy, especially among organizations that treat data as one of their most precious assets. This article dives into all the different metadata types, tools, and frameworks to

Best Practices Machine Learning

What is AI Data Storage? Benefits, Challenges & Best Practices

Tal Sofer

April 17, 2025

Many companies are modernizing their data storage infrastructure to capitalize on the opportunities of machine learning (ML) and advanced analytics. However, teams face several unique data management challenges, such as the increasing time required for AI training and inference workloads, as well as the cost and scarcity of resources, particularly GPUs. Storage is a key

Best Practices Machine Learning Product

Iterative Fine-Tuning and Parallel Experiments with lakeFS

Barak Amar

April 14, 2025

Supercharging Machine Learning: Machine learning (ML) is essential in driving critical business decisions and innovation across various industries. To maintain competitive advantages, organizations continually refine and enhance their ML models through iterative fine-tuning and parallel experimentation. While these strategies are powerful, they come with substantial challenges related to data management, reproducibility, and resource optimization. lakeFS

Best Practices Machine Learning

AI Agents in Business and Automation

Amit Kesarwani

April 2, 2025

This article discusses AI Agents in business and automation, focusing on building an AI Agent using lakeFS, LangChain, OpenAI, and FAISS (Facebook AI Similarity Search) to answer questions based on documents. It explains what AI Agents and LangChain are, and how lakeFS is used for data version control. The article also provides an example of

Best Practices Machine Learning

Metadata Management Tools: Types, Features & Benefits

Tal Sofer

March 25, 2025

Managing complex and massive data sets is tricky, but metadata management tools can help teams keep their data in shape. Metadata management has become critical in data strategies created by organizations that treat data as an important asset. In this article, we dive into metadata management and give you an overview of tools teams use

Best Practices Machine Learning Product

Preprocessing Data Locally with Zero Copy Using lakeFS

Oz Katz

March 24, 2025

One of the capabilities of lakeFS is that you can use it to create isolated environments for experimentation or development. Let’s say we want to build a machine learning model and need to prepare or clean some data. With lakeFS, we can do this in isolation without creating an entire copy of the dataset. Let’s

Best Practices Machine Learning

What is Metadata? Examples, Benefits & Best Practices

Tal Sofer

February 26, 2025

What is the key element that guarantees all data published on portals is discoverable, comprehensible, reusable, and interoperable for people and technology like AI? You guessed right; it’s metadata. Metadata also plays a key role in data governance and management. According to Gartner, organizations that fail to adopt a metadata-driven strategy for IT modernization might

Best Practices Machine Learning Product

The Holy Trinity of ML Reproducibility

Oz Katz

February 25, 2025

Reproducibility is a fundamental challenge in building reliable machine learning (ML) models and AI applications. It’s not just about debugging a model when it fails in production; it’s also about ensuring that experiments are consistent, avoiding unintended variance, and making incremental progress with confidence. Without reproducibility, ML teams risk wasting time on unreliable results and

Best Practices

Pick up the Slack with lakeFS