

Machine Learning

Best Practices, Machine Learning

Data Lake Implementation: 12-Step Checklist

Idan Novogroder

In today’s data-driven world, organizations face enormous challenges as data grows exponentially. One of them is data storage. Traditional data storage methods in analytical systems are expensive and can result in vendor lock-in. This is where data lakes come in: they store massive volumes of data at a fraction of the expense of typical databases or […]

Best Practices, Machine Learning

Data Pipelines in Python: Frameworks & Building Processes

Amit Kesarwani

Data pipelines are critical for organizing and processing data in modern organizations. A data pipeline consists of linked components that process data as it moves through the system. These components may include data sources, transformation functions, write functions, and other data processing operations such as validation and cleaning. Pipelines automate the process of gathering, converting, and […]
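The component structure described above can be sketched as a chain of generator stages in plain Python. The stage names and the sample records here are hypothetical, chosen only to illustrate the source → validate/clean → transform → write flow:

```python
# A minimal data pipeline sketch: source -> clean/validate -> transform -> sink.
# All names and records below are illustrative, not from any specific library.

def source():
    # Hypothetical in-memory source; in practice this might read a file or API.
    yield from [{"name": " Ada ", "age": "36"}, {"name": "", "age": "x"}]

def clean(records):
    # Validation/cleaning stage: normalize fields and drop invalid records.
    for r in records:
        name = r["name"].strip()
        if name and r["age"].isdigit():
            yield {"name": name, "age": int(r["age"])}

def transform(records):
    # Transformation stage: derive a new field from existing ones.
    for r in records:
        yield {**r, "adult": r["age"] >= 18}

def sink(records):
    # Write stage: collect results (a stand-in for a database or file write).
    return list(records)

result = sink(transform(clean(source())))
print(result)  # [{'name': 'Ada', 'age': 36, 'adult': True}]
```

Because each stage is a generator, records stream through one at a time, which is the same lazy, composable shape most Python pipeline frameworks build on.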

Machine Learning, Tutorials

How to Build Data Pipelines in Databricks with Examples

Tal Sofer

Building a data pipeline is a smart move for data engineers in any organization. A strong data pipeline guarantees that the information is clean, consistent, and dependable. It automates discovering and fixing issues, ensuring high data quality and integrity and preventing your company from making poor decisions based on inaccurate data. This article dives into […]

Data Engineering, Machine Learning, Thought Leadership

The State of Data Engineering 2024

Einat Orr, PhD

Since 2021 we’ve been releasing the annual State of Data Engineering Report, a compilation of all the relevant categories that have a direct impact on data engineering infrastructure. In 2024, we see three primary trends that influence the categories covered in this report. Trend #1: GenAI’s influence on software infrastructure. As predicted […]

Data Engineering, Machine Learning

Top 15 Data Catalog Tools in 2024

Idan Novogroder

Many businesses are dealing with increasing volumes of data spread over several databases and repositories across on-premises systems, cloud services, and IoT technology. This complicates data management and hurts data quality, preventing data practitioners from locating important data and unlocking insights from it. This is where data catalogs come in. Initially, data catalogs required bespoke scripts […]

Best Practices, Machine Learning

Data Version Control for Hugging Face Datasets 

Idan Novogroder

Hugging Face Datasets (🤗 Datasets) is a library that allows easy access and sharing of datasets for audio, computer vision, and natural language processing (NLP). It takes only a single line of code to load a dataset and then use Hugging Face’s advanced data processing algorithms to prepare it for deep learning model training. Data […]

Machine Learning

26 MLOps Tools for 2024: Key Features & Benefits

Einat Orr, PhD

MLOps is a method for managing machine learning projects at scale. It improves collaboration across development, operations, and data science teams to accelerate model deployment, increase team productivity, and reduce risk and costs. This article dives into the top MLOps tools for model creation, deployment, and monitoring that help teams standardize, simplify, and streamline their […]

Machine Learning, Product, Tutorials

lakectl local: How to work with lakeFS locally using Git

Oz Katz

The massive increase in generated data presents a serious challenge to organizations looking to unlock value from their data sets. Data practitioners have to deal with many consequences of the huge data volume, including manageability and collaboration. This is where data versioning can help. Data version control is crucial because it allows data teams to […]

Data Engineering, Machine Learning, Tutorials

Building A Data Lake For The GenAI And ML Era

Einat Orr, PhD

Despite data technology advancements, many organizations still struggle to access outdated mainframe data. Most of the time, the culprit is a siloed data architecture that doesn’t align with their strategic goals. At the same time, organizations are under pressure from their competitors. A good data strategy enables companies to go beyond function-specific and interdepartmental analytics […]

Data Engineering, Machine Learning

Data Pipeline Automation: Benefits, Use Cases & Tools

Idan Novogroder

Data is the lifeblood of any business. It drives decision-making, powers strategies, and boosts customer relationships. However, due to the enormous volume of data collected or its poor quality, most businesses still struggle to unlock its value. With the right data pipeline automation system in place, teams can clean and prepare data to improve your […]
