


Best Practices

Best Practices, Product, Tutorials

Introducing lakeFS Transactional Mirroring (Cross-Region Mirroring)

Ariel Shaqed (Scolnicov), Idan Novogroder, Guy Hardonag

What is mirroring? We are pleased to announce a preview of a long-awaited lakeFS feature: transactional mirroring across regions. Mirroring builds on top of S3 Replication to provide a consistent view of your versioned data in other regions. Once configured, it allows you to create mirrors in all of your regions. Each mirror of a source repository […]

Best Practices, Machine Learning

What is LLMOps? Key Components & Differences from MLOps

Idan Novogroder

Large Language Models (LLMs) are pretty straightforward to use when you’re prototyping. However, incorporating an LLM into a commercial product is an altogether different story. The LLM development lifecycle is made up of several complex components, including data intake, data preparation, engineering, model fine-tuning, model deployment, model monitoring, and more. The process also calls for […]

Best Practices

The cost of poor data quality on business operations

Noy Davidson

As a business owner, data specialist, or business intelligence (BI) analyst, you’re likely aware of the critical importance of accurate data in making well-informed strategic decisions. There’s a reason why data practitioners spend the majority of their time preparing the data for analysis. This article explores how bad data affects company results and offers tactics […]

Best Practices

Databricks Architecture Overview: Components & Key Features

Idan Novogroder

Many organizations today use a complex mix of data lakes and data warehouses to build the foundation for their data-driven processes. They run parallel pipelines for handling data in planned batches or streaming data in real time, often adding new tools for analytics, business intelligence, and data science. Databricks was designed to reduce this complexity.

Best Practices, Product

dbt + Databricks: What are they and how do they work together best?

Tal Sofer

It’s clear that the adoption of dbt is picking up, as it now supports major big data compute tools like Spark and Trino, as well as platforms like Databricks. Incidentally, these technologies are a common choice among our community members, who often use dbt and Databricks together to manage a data lake (or lakehouse) over […]

Best Practices, Product

lakeFS Transactions: Maintain Data Integrity Using ACID Principles

Nir Ozeri

We recently introduced the new High Level Python SDK, which provides a friendlier interface for interacting with lakeFS, as part of our ongoing effort to make life simpler for data professionals. In this article, we will introduce you to a cool new addition to the High Level SDK: Transactions! Read on to learn what lakeFS […]

Best Practices, Data Engineering

Databricks Autoloader: Ingesting Data with Ease and Efficiency

Idan Novogroder

You can ingest data files from external sources using a variety of technologies, from Oracle and SQL Server to PostgreSQL and systems like SAP or Salesforce. When putting this data into your data lake, you might run into the issue of identifying new files and orchestrating processes. This is where Databricks Autoloader helps. Databricks Autoloader […]

Best Practices, Product

Pre-Signed URLs: How lakeFS Manages Data It Cannot Access

Oz Katz

In the world of data management, security is a paramount concern. The more data we generate and store, the more critical it becomes to ensure that data is both accessible and protected. lakeFS, a powerful and innovative data version control system, takes data security to the next level by offering a unique feature: the ability […]

Best Practices

Getting Started with Databricks Lakehouse: The Future of Data Management

Idan Novogroder

The data industry has long been waiting for a solution that would integrate the data structures and data management functions of data warehouses directly into the type of low-cost storage utilized for data lakes. Enter Databricks lakehouse, an architecture that does just that. By merging the best from data warehouses and data lakes, a lakehouse […]

Best Practices, Tutorials

The Power of Databricks SQL: A Practical Guide to Unified Data Analytics

Oz Katz

In the universe of Databricks Lakehouse, Databricks SQL serves as a handy tool for querying and analyzing data. It lets SQL-savvy data analysts, data engineers, and other data practitioners extract insights without forcing them to write code. This improves access to data analytics, simplifying and speeding up the data analysis process. But that’s not everything […]
