lakeFS Community

Data Engineering

Data Engineering Machine Learning

Shallow Copy For Data: What Are Your Options?

Idan Novogroder

In the past five years, we’ve seen many concepts and new tools in the data ecosystem contribute to implementing engineering best practices in data. This trend includes the data mesh, data quality testing, observability, and data monitoring. The practices we would like to borrow from software engineering and use in data engineering and data science […]

Best Practices Data Engineering

Databricks Autoloader: Ingesting Data with Ease and Efficiency

Idan Novogroder

You can ingest data files from external sources using a variety of technologies, from Oracle and SQL Server to PostgreSQL and systems like SAP or Salesforce. When putting this data into your data lake, you might run into the issue of identifying new files and orchestrating processes. This is where Databricks Autoloader helps. Databricks Autoloader

Data Engineering Machine Learning Product Tutorials

Introducing The New lakeFS Python Experience

Oz Katz, Nir Ozeri

Since its inception, lakeFS has shipped with a full-featured Python SDK. For each new version of lakeFS, this SDK is automatically generated from the OpenAPI specification published with that version. While this has always ensured that the Python SDK shipped with all possible features, the automatically generated code wasn’t always the nicest (or most Pythonic)

Data Engineering Machine Learning

What is Databricks and How Does It Unify the Power of Data Science and Engineering?

Oz Katz

Data-driven decision-making has become the foundation of business operations across every type of company, no matter the size or industry. Large volumes of data flow from many source systems to data warehousing, data lake, or analytics solutions. What companies need to maximize their ROI from data is a fast, dependable, scalable, and user-friendly space that

Data Engineering Machine Learning Tutorials

Unlocking Data Insights with Databricks Notebooks

Idan Novogroder

Databricks Notebooks are a popular tool for interacting with data using code and presenting findings across disciplines like data science, machine learning, and data engineering. Notebooks are, in fact, a key offering from Databricks for generating processes and collaborating with team members thanks to real-time multilingual coauthoring, automated versioning, and built-in data visualizations. How exactly

Data Engineering

Tech Note: Iceberg Data Diff

Ariel Shaqed (Scolnicov)

Back in June we happily announced that lakeFS ♥️ Iceberg. Since then, our Iceberg support has been growing. A new experimental feature now allows you to use a Spark SQL engine to compute “data diffs” between versions of an Iceberg table that are stored in two different lakeFS versions. Reminder: lakeFS versions are

Data Engineering Machine Learning Tutorials

AWS Trino and lakeFS Integration

Amit Kesarwani

A step-by-step configuration tutorial. In today’s data-driven world, organizations are grappling with an explosion in the volume of data, compelling them to shift away from traditional relational databases and embrace the flexibility of object storage. Storing data in object storage repositories offers scalability, cost-effectiveness, and accessibility. However, efficiently analyzing or querying structured data in

Best Practices Data Engineering Tutorials

Databricks Unity Catalog: A Comprehensive Guide to Streamlining Your Data Assets

Oz Katz

As data volumes increase and data sources diversify, teams are under pressure to implement comprehensive data catalog solutions. Databricks Unity Catalog is a unified governance solution for all data and AI assets in your lakehouse on any cloud, including files, tables, machine learning models, and dashboards. It provides a consolidated way of categorizing, organizing,

Data Engineering Machine Learning

Jupyter Notebook & 10 Alternatives: Data Notebook Review [2023]

Einat Orr, PhD

The tech industry responded to the needs of data practitioners with various IDE solutions for developing code and presenting findings in a data science and machine learning context. One of the go-to solutions today is Jupyter Notebook, an open-source tool that has gained a lot of traction among data science folks and beyond. Although Jupyter
