
lakeFS Community

Best Practices

Best Practices Product Tutorials

Introducing lakeFS Transactional Mirroring (Cross-Region Mirroring)

Ariel Shaqed (Scolnicov), Idan Novogroder, Guy Hardonag

What is mirroring? We are pleased to announce a preview of a long-awaited lakeFS feature: transactional mirroring across regions. Mirroring builds on top of S3 Replication to provide a consistent view of your versioned data in other regions. Once configured, it lets you create mirrors in all of your regions. Each mirror of a source repository…

Best Practices Machine Learning

What is LLMOps? Key Components & Differences from MLOps

Idan Novogroder

Large Language Models (LLMs) are pretty straightforward to use when you’re prototyping. However, incorporating an LLM into a commercial product is an altogether different story. The LLM development lifecycle is made up of several complex components, including data intake, data preparation, engineering, model fine-tuning, model deployment, model monitoring, and more. The process also calls for…

Best Practices

What are the business costs and risks of poor data quality?

Noy Davidson

Every year, poor data quality costs companies an average of $12.9 million. Aside from the immediate impact on income, low-quality data complicates data ecosystems and contributes to poor decision-making in the long run. In a world where data is the most central asset of a company, used both for operational and strategic purposes, we…

Best Practices

Databricks Architecture Overview: Components & Key Features

Idan Novogroder

Many organizations today use a complex mix of data lakes and data warehouses to build the foundation for their data-driven processes. They run parallel pipelines for handling data in planned batches or streaming data in real time, often adding new tools for analytics, business intelligence, and data science. Databricks was designed to reduce this complexity.

Best Practices Product

dbt + Databricks: What are they and how do they work together best?

Tal Sofer

It’s clear that the adoption of dbt is picking up, as it now supports major big data compute tools like Spark and Trino, as well as platforms like Databricks. Incidentally, these technologies are a common choice among our community members, who often use dbt and Databricks together to manage a data lake (or lakehouse) over…

Best Practices Product

lakeFS Transactions: Maintain Data Integrity Using ACID Principles

Nir Ozeri

We recently introduced the new High Level Python SDK, which provides a friendlier interface for interacting with lakeFS, as part of our ongoing effort to make life simpler for data professionals. In this article, we will introduce you to a cool new addition to the High Level SDK: Transactions! Read on to learn what lakeFS…
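
As a rough sketch of the idea the post covers (the repository, branch, and paths below are placeholders, and the exact method names are assumptions based on the high-level lakefs Python SDK rather than an excerpt from the article), a transaction groups several writes into one atomic merge:

    # Assumed-API sketch: group several writes into a single atomic change
    # using the high-level lakefs Python SDK's transaction support.
    import lakefs

    branch = lakefs.repository("example-repo").branch("main")

    # Work inside the block happens on an ephemeral branch; it is merged back
    # into "main" only if the block finishes without raising an exception.
    with branch.transact(commit_message="update daily aggregates") as tx:
        tx.object("aggregates/daily.parquet").upload(data=b"...parquet bytes...")
        tx.object("aggregates/_SUCCESS").upload(data=b"")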

Best Practices Data Engineering

Databricks Autoloader: Ingesting Data with Ease and Efficiency

Idan Novogroder

You can ingest data files from external sources using a variety of technologies, from Oracle and SQL Server to PostgreSQL and systems like SAP or Salesforce. When putting this data into your data lake, you might run into the issue of identifying new files and orchestrating processes. This is where Databricks Autoloader helps. Databricks Autoloader…
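
For context, the mechanism the post describes is Auto Loader’s cloudFiles source in Spark Structured Streaming, which keeps track of which files in a landing path have already been ingested. A minimal PySpark sketch, with all paths and table names as placeholders (spark is the session provided in a Databricks notebook):

    # Minimal Auto Loader sketch: incrementally pick up new JSON files from a
    # landing path and append them to a Delta table.
    df = (
        spark.readStream
        .format("cloudFiles")                                        # Auto Loader source
        .option("cloudFiles.format", "json")                         # format of incoming files
        .option("cloudFiles.schemaLocation", "/mnt/schemas/events")  # where the inferred schema is tracked
        .load("/mnt/landing/events")                                 # path to watch for new files
    )

    (
        df.writeStream
        .option("checkpointLocation", "/mnt/checkpoints/events")     # progress tracking for exactly-once ingestion
        .trigger(availableNow=True)                                  # process whatever is new, then stop
        .toTable("bronze.events")
    )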

Best Practices Product

Pre-Signed URLs: How lakeFS Manages Data It Cannot Access

Oz Katz

In the world of data management, security is a paramount concern. The more data we generate and store, the more critical it becomes to ensure that data is both accessible and protected. lakeFS, a powerful and innovative data version control system, takes data security to the next level by offering a unique feature: the ability…
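
To illustrate only the underlying concept (not lakeFS’s own implementation), a pre-signed URL is a time-limited link signed by the object store itself, so a client can fetch data directly while the service that issued the link never reads or proxies it. A boto3 sketch with a placeholder bucket and key:

    # Concept illustration only: generate a time-limited, pre-signed S3 URL
    # granting read access to a single object.
    import boto3

    s3 = boto3.client("s3")

    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "example-bucket", "Key": "datasets/train/part-0001.parquet"},
        ExpiresIn=3600,  # the link is valid for one hour
    )

    # The caller downloads the object straight from S3 with this URL; the
    # issuing service never touches the object's contents.
    print(url)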
