
lakeFS Community

Best Practices

Best Practices, Tutorials

Data Collaboration: What Is It And Why Do Teams Need It?

Tal Sofer

One common problem data teams face today is how to avoid stepping on teammates’ toes when working in data environments. Data assets are often handled like a shared folder that anybody can access, edit, and write to. This causes conflicting changes or accidental overrides, leading to data inconsistencies or loss of work. But this problem […]

Best Practices, Tutorials

CI/CD Data Pipeline: Benefits, Challenges & Best Practices

Idan Novogroder

Continuous integration/continuous delivery (CI/CD) helps software developers adhere to security and consistency standards in their code while meeting business requirements. Today, CI/CD is also one of the data engineering best practices teams use to keep their data pipelines efficient in delivering high-quality data. What is a CI/CD pipeline and how do you implement it? […]

Best Practices, Tutorials

Unit Testing for Notebooks: Best Practices, Tools & Examples

Idan Novogroder

Quality can start from the moment you write your code in a notebook. Unit testing is a great approach to making the code in your notebooks more consistent and of higher quality. In general, unit testing – the practice of testing self-contained code units, such as functions, frequently and early – is a good practice […]
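As a quick illustration of the idea (the function and test below are hypothetical, not taken from the article), logic from a notebook cell can be pulled into a plain function and exercised with pytest:

    # test_notebook_logic.py -- hypothetical sketch of unit testing notebook code
    def normalize_prices(prices):
        # logic extracted from a notebook cell: drop negatives, round to cents
        return [round(p, 2) for p in prices if p >= 0]

    def test_normalize_prices_drops_negatives():
        assert normalize_prices([10.50, -1.25, 3.99]) == [10.5, 3.99]

Running pytest against this file exercises the cell's logic early and often, which is the practice the post describes.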

Best Practices

dbt Data Quality Checks: Types, Benefits & Best Practices

Idan Novogroder

Decisions based on data can only make a positive impact as long as the data itself is accurate, consistent, and dependable. High data quality is critical, and data quality checks are a key part of handling data at your organization. This is where dbt comes in. dbt (data build tool) provides a complete framework for […]

Best Practices, Machine Learning

Data Lake Implementation: 12-Step Checklist

Idan Novogroder

In today’s data-driven world, organizations face enormous challenges as data grows exponentially. One of them is data storage. Traditional data storage methods in analytical systems are expensive and can result in vendor lock-in. This is where data lakes come in, storing massive volumes of data at a fraction of the expense of typical databases or […]

Best Practices, Machine Learning

Data Pipelines in Python: Frameworks & Building Processes

Amit Kesarwani

Data pipelines are critical for organizing and processing data in modern organizations. A data pipeline consists of linked components that process data as it moves through the system. These components may include data sources, write functions, transformation functions, and other data processing operations like validation and cleaning. Pipelines automate the process of gathering, converting, and […]
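As a minimal sketch of such linked components (all stage names and data below are hypothetical, not from the article), each stage can be a small function chained to the next:

    # pipeline_sketch.py -- hypothetical linked stages: source -> validate -> transform -> load
    def extract():
        # data source: yields raw records
        yield from [{"id": 1, "value": " 42 "}, {"id": 2, "value": None}]

    def validate(records):
        # completeness check: drop records with missing values
        return (r for r in records if r["value"] is not None)

    def transform(records):
        # cleaning step: strip whitespace and cast to int
        return ({**r, "value": int(r["value"].strip())} for r in records)

    def load(records):
        # write function: print instead of persisting, for the sketch
        for r in records:
            print(r)

    load(transform(validate(extract())))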

Best Practices, Machine Learning

Data Version Control for Hugging Face Datasets 

Idan Novogroder

Hugging Face Datasets (🤗 Datasets) is a library that allows easy access and sharing of datasets for audio, computer vision, and natural language processing (NLP). It takes only a single line of code to load a dataset and then use Hugging Face’s advanced data processing algorithms to prepare it for deep learning model training. […]
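The single-line load the excerpt refers to looks like this (the "imdb" dataset name is only an illustration):

    from datasets import load_dataset

    dataset = load_dataset("imdb")           # one line to download and cache the dataset
    print(dataset["train"][0]["text"][:80])  # splits, rows, and columns are indexable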

Best Practices, Data Engineering, Tutorials

ETL Testing Tutorial with lakeFS: Step-by-Step Guide

Iddo Avneri

ETL testing is critical in integrating and migrating your data to a new system. It acts as a safety net for your data, assuring completeness, accuracy, and dependability to improve your decision-making abilities. ETL testing may be complex owing to the volume of data involved. Furthermore, the data is almost always varied, adding an extra […]
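The article's walkthrough is lakeFS-specific; purely as a hedged sketch of the kind of assertion ETL tests make (the tables below are invented), a completeness and accuracy check might look like:

    import pandas as pd

    source = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 5.00, 12.50]})
    target = source.copy()  # stand-in for the table produced by the ETL run

    assert len(target) == len(source)                        # completeness: no rows lost
    assert target["amount"].sum() == source["amount"].sum()  # accuracy: totals reconcile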
