Machine Learning

Machine Learning

Hugging Face Datasets Need Data Version Control – And So Do You

Idan Novogroder

Hugging Face acquired XetHub to build an internal data version control system. XetHub is a platform for collaborative development created by former Apple researchers in 2021 to improve the efficiency of machine learning teams that deal with huge datasets and models. The solution provides Git-like version management for up to TB-sized repositories, facilitating team collaboration,

Machine Learning

MLflow Data Versioning: Techniques, Tools & Best Practices

Amit Kesarwani

Data versioning is a central aspect of modern data management, especially in the context of GenAI and machine learning. Teams need a solution to version both their data and models. By keeping track of various iterations of datasets and models, they can manage changes smoothly and ensure the reproducibility of results. MLflow has become a

Best Practices Machine Learning

Top 9 RAG Tools to Boost Your LLM Workflows

Idan Novogroder

A team looking to build an application that uses a large language model (LLM) like OpenAI’s GPT-4 or Meta’s Llama 2 will inevitably run into this issue: How can we ensure that the responses generated by these models align with the specific business context? This is where retrieval augmented generation (RAG) comes in. RAG brings

Machine Learning Product

Amazon S3 Mountpoint vs lakeFS Mount

Amit Kesarwani

What is a mount? A filesystem mount is the ability to present a local device or a remote location as a local directory. It is a basic feature provided by all operating systems and is widely used by system admins and developers. Let’s break down the differences between Mountpoint for Amazon S3 and lakeFS Mount:

Best Practices Machine Learning

RAG as a Service: Benefits, Use Cases & Challenges

Idan Novogroder

Retrieval Augmented Generation (RAG) is on its way to becoming the dominant framework for implementing enterprise applications based on Large Language Models (LLMs). However, implementing RAG on your own is tricky. The framework calls for a high degree of knowledge and skill, as well as ongoing investment in DevOps and MLOps. Not to mention staying

Best Practices Machine Learning

Machine Learning Model Versioning: Top Tools & Best Practices

Einat Orr, PhD

Developing a machine learning application is a complex process that involves steps such as processing massive volumes of data, testing multiple ML models, parameter optimization, feature tuning, and others. This is why data version control is critical in the ML environment. If you want your experiments and data to be reproducible, you need to use

Best Practices Machine Learning

LLM Observability Tools: 2026 Comparison

Einat Orr, PhD

When OpenAI unveiled ChatGPT, which swiftly explained difficult problems, composed sonnets, and found errors in code, the usefulness and adaptability of LLMs became clear. Soon after, companies across various sectors began exploring new use cases, testing generative AI capabilities and solutions, and incorporating these LLM processes into their engineering environments. Whether it’s a chatbot, product

Best Practices Machine Learning Tutorials

MLflow on Databricks: Benefits, Capabilities & Quick Tutorial

Amit Kesarwani

Machine learning teams face many hurdles, from data sources with missing values to experiment reproducibility issues. MLflow is a tool that makes this easier. And Databricks makes working with it even more straightforward, thanks to its managed MLflow offering. Managed MLflow expands the capabilities of MLflow, with an emphasis on dependability, security, and scalability. Keep

Best Practices Machine Learning Tutorials

RAG Pipeline: Example, Tools & How to Build It

Idan Novogroder

It may be tempting to think large language models (LLMs) can provide commercial value without any additional work, but this is rarely the case. Businesses can make the most of these models by adding their own data. To do this, teams can use a technique called retrieval augmented generation (RAG). What is a RAG pipeline

Best Practices Machine Learning

Data Lake Implementation: 12-Step Checklist

Idan Novogroder

In today’s data-driven world, organizations face enormous challenges as data grows exponentially. One of them is data storage. Traditional data storage methods in analytical systems are expensive and can result in vendor lock-in. This is where data lakes come in, storing massive volumes of data at a fraction of the cost of typical databases or
