lakeFS Community

Data Engineering

Data Engineering Machine Learning

What is Databricks and How Does It Unify the Power of Data Science and Engineering?

The lakeFS team

Data-driven decision-making has become the foundation of business operations across every type of company, no matter the size or industry. Large volumes of data flow from many source systems to data warehousing, data lake, or analytics solutions. What companies need to maximize their ROI from data is a fast, dependable, scalable, and user-friendly space that …


Data Engineering Machine Learning Tutorials

Unlocking Data Insights with Databricks Notebooks

Idan Novogroder

Databricks Notebooks are a popular tool for interacting with data using code and presenting findings across disciplines like data science, machine learning, and data engineering. Notebooks are, in fact, a key offering from Databricks for building workflows and collaborating with team members, thanks to real-time coauthoring in multiple languages, automatic versioning, and built-in data visualizations. How exactly …


Data Engineering

Tech Note: Iceberg Data Diff

Ariel Shaqed (Scolnicov)

Back in June we happily announced that lakeFS ♥️ Iceberg. Since then our Iceberg support has been growing. A new experimental feature now lets you use a Spark SQL engine to compute “data diffs” between versions of an Iceberg table that are stored in two different lakeFS versions. Reminder: lakeFS versions are …


Data Engineering Machine Learning Tutorials

AWS Trino and lakeFS Integration

Amit Kesarwani

A Step-by-Step Configuration Tutorial. Introduction: In today’s data-driven world, organizations are grappling with an explosion in the volume of data, compelling them to shift away from traditional relational databases and embrace the flexibility of object storage. Storing data in object storage repositories offers scalability, cost-effectiveness, and accessibility. However, efficiently analyzing or querying structured data in …


Best Practices Data Engineering Tutorials

Databricks Unity Catalog: A Comprehensive Guide to Streamlining Your Data Assets

The lakeFS team

As data volumes increase and data sources diversify, teams are under pressure to implement comprehensive data catalog solutions. Databricks Unity Catalog is a unified governance solution for all data and AI assets in your lakehouse on any cloud, including files, tables, machine learning models, and dashboards. It provides a consolidated solution for categorizing, organizing, …


Data Engineering Machine Learning

Jupyter Notebook & 10 Alternatives: Data Notebook Review [2023]

The lakeFS team

The tech industry responded to the needs of data practitioners with various IDE solutions for developing code and presenting findings in a data science and machine learning context. One of the go-to solutions today is Jupyter Notebook, an open-source tool that has gained a lot of traction among data science folks and beyond. Although Jupyter …


Best Practices Data Engineering Tutorials

How Data Version Control Provides Data Lineage for Data Lakes

Iddo Avneri

One of the reasons behind the rise in data lakes’ adoption is their ability to handle massive amounts of data coming from diverse data sources, transform it at scale, and provide valuable insights. However, this capability comes at the price of complexity. This is where data lineage helps. In this article, we review some basic …


Data Engineering Tutorials

Prefect + lakeFS: How to Troubleshoot Data Pipelines and Reproduce Data

Amit Kesarwani

Prefect is a workflow orchestration tool empowering developers to build, observe, and react to data pipelines. It’s the easiest way to transform any Python function into a unit of work that can be observed and orchestrated. Prefect offers several key components to help users build and run their data pipelines, including Tasks and Flows. With …


Data Engineering Machine Learning Product

Scalable Data Version Control – Getting the Best of Both Worlds with lakeFS

Oz Katz

There are several tools in the data version control space, all looking to solve similar problems. Two of the leaders are lakeFS and DVC. In this post, I am going to give an overview of how each has been designed so as to provide a basis for understanding their relative abilities to scale. Being a …

