


Best Practices

Best Practices, Machine Learning

RAG as a Service: Benefits, Use Cases & Challenges

Idan Novogroder

Retrieval Augmented Generation (RAG) is on its way to becoming the dominant framework for implementing enterprise applications based on Large Language Models (LLMs). However, implementing RAG on your own is tricky. The framework calls for a high degree of knowledge and skill, as well as ongoing investment in DevOps and MLOps. Not to mention staying […]

Best Practices

Apache Iceberg Catalogs: Types & How to Choose the Right Catalog

Tal Sofer

Apache Iceberg is the most popular open table format. It originated at Netflix from the need to provide a table representation for data stored in files, and to let teams work with those tables as if they were managed in a relational database. In broad terms, Apache Iceberg consists of three main […]

Best Practices, Machine Learning

Machine Learning Model Versioning: Top Tools & Best Practices

Einat Orr, PhD

Developing a machine learning application is a complex process that involves steps such as processing massive volumes of data, testing multiple ML models, optimizing parameters, tuning features, and more. This is why data version control is critical in the ML environment. If you want your experiments and data to be reproducible, you need to use […]

Best Practices, Machine Learning

LLM Observability Tools: 2026 Comparison

Einat Orr, PhD

When OpenAI unveiled ChatGPT, which swiftly explained difficult problems, crafted sonnets, and spotted errors in code, the usefulness and adaptability of LLMs became clear. Soon after, companies across many sectors began exploring new use cases, testing generative AI capabilities and solutions, and incorporating these LLM processes into their engineering environments. Whether it’s a chatbot, product […]

Best Practices, Product, Tutorials

Metadata Enforcement: Step-by-Step Tutorial

Amit Kesarwani

Metadata enforcement is a broad term that can refer to several aspects of managing and controlling metadata, including data privacy and protection, data governance and quality, and legal and compliance requirements. Let’s explore these key areas, the challenges in metadata enforcement, and strategies for enforcing it effectively. We will focus on […]

Best Practices, Tutorials

Delta Time Travel in Databricks: How It Works

Tal Sofer

Databricks Delta Lake includes a number of time travel features that let you access any previous version of the data that Delta automatically versions and stores in your data lake. This makes it simple to audit data, roll it back after unintentional bad writes or deletes, and reproduce reports and experiments. How […]
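The core idea behind time travel — every write produces a new immutable snapshot, and reads can target any historical version — can be sketched in a few lines of plain Python. This is a toy stand-in, not Delta Lake's actual implementation or API; the `VersionedTable` class and its methods are illustrative only:

```python
class VersionedTable:
    """Minimal sketch of table time travel: each write appends an
    immutable snapshot, and reads can target any historical version
    (similar in spirit to Delta Lake's `VERSION AS OF`)."""

    def __init__(self):
        self._snapshots = [[]]          # version 0: empty table

    def write(self, rows):
        # Copy the latest snapshot, apply the write, store a new version.
        latest = list(self._snapshots[-1])
        latest.extend(rows)
        self._snapshots.append(latest)
        return len(self._snapshots) - 1  # new version number

    def read(self, version=None):
        # Default to the latest version; older versions stay readable.
        if version is None:
            version = len(self._snapshots) - 1
        return list(self._snapshots[version])


t = VersionedTable()
v1 = t.write([{"id": 1}])
v2 = t.write([{"id": 2}])
t.read(v1)   # the table as it looked at version 1: [{"id": 1}]
t.read()     # the current table: [{"id": 1}, {"id": 2}]
```

Because old snapshots are never mutated, "rolling back" is simply reading (or re-publishing) an earlier version — which is what makes auditing and reproducing past reports straightforward.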

Best Practices, Tutorials

What Is Write-Audit-Publish and Why Should You Care?

Einat Orr, PhD

The Write-Audit-Publish (WAP) pattern in data engineering gives teams greater control over data quality. But what does it entail, and how do you implement it? Keep reading to learn more about the Write-Audit-Publish pattern, examine its use cases, and get a practical implementation example. What is Write-Audit-Publish all about? Write-Audit-Publish (WAP) aims to boost trust […]

Best Practices, Product

I Already Have Time Travel with Delta Tables, Why Do I Need lakeFS?

Iddo Avneri

When Databricks users first hear about lakeFS, a common response is, “I already have time travel in Delta Tables.” This raises an important question: how is lakeFS better, or how can it complement Delta Tables? Let’s explore the key differences and use cases where lakeFS shines, explaining why thousands of organizations, including many large enterprises, […]

Best Practices

What is Snowflake Data Catalog? Its Benefits & How to Set It Up

Idan Novogroder

Snowflake has many advantages, but its security and scalability are arguably its biggest draws for data practitioners. More and more businesses are migrating their data to Snowflake from big data systems like Teradata and Hadoop. A single Snowflake account can include up to ten databases, each with thousands of views, tables, and columns. To address […]

Best Practices, Tutorials

Power Up Your Lakehouse with Git Semantics and Delta Lake

Oz Katz

The lakehouse architecture has become the backbone of modern big data operations, but it comes with specific issues. The challenge of data versioning arises in many DataOps areas. Fortunately, open-source tools can help overcome these issues. In this article, we’ll demonstrate how, by implementing Git-like semantics, Delta Lake and lakeFS can work together to […]

Best Practices

Building A Management Layer For Your Data Lake: 3 Practical Examples with Databricks, AWS, and Snowflake

Einat Orr, PhD

This article is the continuation of Building A Management Layer For Your Data Lake: 3 Architecture Components. In this part, we explore open table formats, metastores, and data version control across three practical examples showing how to build a management layer for data lakes using tools in the Databricks, AWS, and Snowflake ecosystems. […]

Best Practices

Building A Management Layer For Your Data Lake: 3 Architecture Components

Einat Orr, PhD

The growth in data volumes was the catalyst for replacing traditional analytics databases with data lakes. While data lakes could handle large amounts of data, they did not provide all the capabilities of an analytics database. But we did not succumb to this tradeoff, and a set of technologies emerged […]
