The lakeFS Team

Last updated on June 20, 2024

What is lakeFS?

lakeFS is a platform that helps data engineers build scalable and resilient data lakes running on object storage. It provides version control, branching, and merging capabilities for data at petabyte scale, on or off premises. lakeFS enables teams to collaborate and manage data effectively by applying engineering best practices to data management.

lakeFS was created by Treeverse and it is an open-source project under the Apache 2.0 license. In addition to the open-source project, lakeFS is available as a SaaS offering or an Enterprise software solution. 

This article reviews the differences between the three options.

Open-Source lakeFS

As mentioned, lakeFS is open-source software (OSS). We chose the open-core model, in which the core of a software product is released as OSS while additional features and functionality are provided as commercial add-ons under a proprietary license. (You might be familiar with this model from companies like GitLab, dbt, Confluent, Databricks, Dagster, Preset, and others.)

Our commitment to open source is to keep lakeFS’s versioning capabilities, Git-like interface, public object store APIs, CLI, API, and GUI in the open-source project.

In other words, you will be able to use all the versioning capabilities at scale for your data using lakeFS. 

This includes:

  • Data version control
  • Format-agnostic versioning
  • Zero-clone copies via branches, for creating isolated environments
  • Atomic promotion of data via merges
  • Data CI/CD with lakeFS Hooks: configure actions to trigger when predefined events occur
  • A loooooong list of out-of-the-box Integrations, such as (but not limited to):
    • Spark
    • Databricks
    • Delta Lake
    • AWS Glue
    • AWS EMR
    • Dremio
    • Trino
    • Presto
    • Airflow
    • Kubeflow
    • Python
    • DuckDB
    • Amazon SageMaker
    • Kafka
    • dbt
    • And more…

Keep your storage costs low with:

  • lakeFS zero-clone copy
  • Deduplication of files on the object store over time
  • Garbage collection
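
To make zero-clone branching, deduplication, and atomic merges more concrete, here is a minimal sketch of the general idea. It is a toy model and not how lakeFS is implemented: the `ToyRepo` class, its methods, and the file paths are all hypothetical names invented for illustration. The assumption is that a branch is just a pointer to a mapping from paths to content addresses, so creating a branch copies no data, identical content is stored once, and a merge becomes visible through a single pointer swap.

```python
import hashlib


class ToyRepo:
    """Toy model (hypothetical): branches point to path -> content-address maps."""

    def __init__(self):
        self.objects = {}              # content address -> bytes, one entry per distinct content
        self.branches = {"main": {}}   # branch name -> {path: content address}

    def put(self, branch, path, data):
        address = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(address, data)  # identical content is stored only once
        # Copy-on-write: build a new mapping so other branches sharing the old one are unaffected
        self.branches[branch] = {**self.branches[branch], path: address}

    def create_branch(self, name, source):
        # "Zero-clone": the new branch reuses the source mapping; no object data is copied
        self.branches[name] = self.branches[source]

    def merge(self, source, destination):
        # Simplified: source entries win; no conflict detection or delete handling
        merged = {**self.branches[destination], **self.branches[source]}
        self.branches[destination] = merged  # one pointer swap, so readers see all or nothing

    def read(self, branch, path):
        return self.objects[self.branches[branch][path]]


repo = ToyRepo()
repo.put("main", "sales/2024.csv", b"id,amount\n1,10\n")
repo.create_branch("experiment", "main")
repo.put("experiment", "sales/2024_fixed.csv", b"id,amount\n1,12\n")
repo.put("experiment", "sales/backup.csv", b"id,amount\n1,10\n")  # same bytes as 2024.csv

print(sorted(repo.branches["main"]))  # main is still isolated from the experiment
print(len(repo.objects))              # duplicate content did not add a new stored object
repo.merge("experiment", "main")
print(sorted(repo.branches["main"]))  # main now sees every path from the experiment
```

The design point worth noticing is that writes never mutate a shared mapping, which is what keeps a branch isolated until its changes are merged.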

Get started with lakeFS OSS today

lakeFS Enterprise

In addition to all the benefits of the lakeFS OSS solution, the Enterprise version grants customers access to enterprise features, a support SLA, and customer success services.

Enterprise Features

SSO (OIDC Integration)

Achieve single sign-on (SSO) using Active Directory Federation Services (AD FS, via SAML), LDAP, or OpenID Connect (OIDC), a protocol built on top of the OAuth 2.0 framework.

With SSO OIDC integration, users authenticate once with the identity provider (IDP) of their choice and then gain access to lakeFS without having to enter their login credentials again. The IDP acts as a trusted third party that vouches for the user’s identity, and the application verifies that identity by validating the authentication token issued by the IDP.

SSO OIDC integration provides a more secure and user-friendly experience, as users only have to remember one set of credentials. It also reduces the burden on your IT department by simplifying user management and access control.
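
As a rough illustration of what "validating the authentication token" involves, the sketch below issues and checks a JWT-style token using only the Python standard library. It is a simplified stand-in, not lakeFS code: it assumes a shared-secret HMAC signature (HS256), whereas real OIDC deployments typically verify asymmetric signatures against keys published by the IDP. The function names, issuer URL, and audience value are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """What a toy IDP would do: sign header.payload with a shared secret."""
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, issuer: str, audience: str, now=None) -> dict:
    """What the application would do: check signature, issuer, audience, and expiry."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")
    expected = hmac.new(secret, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise ValueError("unexpected issuer")
    if claims.get("aud") != audience:
        raise ValueError("unexpected audience")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims


# Hypothetical values for illustration only
SECRET = b"demo-shared-secret"
token = issue_token(
    {"iss": "https://idp.example.com", "aud": "lakefs", "sub": "alice", "exp": time.time() + 300},
    SECRET,
)
claims = verify_token(token, SECRET, issuer="https://idp.example.com", audience="lakefs")
print(claims["sub"])
```

The application never sees the user's password in this flow; it only trusts a token whose signature, issuer, audience, and expiry all check out.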

RBAC 

lakeFS Enterprise enables Role-Based Access Control (RBAC), which is achieved by a mechanism similar to AWS Identity and Access Management (IAM). 

IAM allows for centralized management of user identities and access privileges, enabling administrators to easily assign roles and permissions based on job responsibilities and organizational hierarchy. This enhances security by ensuring that users have access only to the resources they need to perform their jobs, and reduces the risk of unauthorized access or accidental data breaches. 

lakeFS authorization provides the fine-grained control needed to manage data access for different users, or groups of users, to different resources, down to an individual object or dataset in a repository.

Overall, using IAM for RBAC improves security, streamlines access management, and enhances compliance with regulations and standards.
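
The IAM-like mechanism described above can be sketched as a small policy evaluator. This is an illustration under stated assumptions rather than the lakeFS implementation: the statement layout (`action`, `effect`, `resource`), the action names, and the resource identifiers are modeled loosely on IAM-style policies and should be treated as hypothetical, as should the rule that an explicit deny overrides any allow.

```python
from fnmatch import fnmatchcase


def is_allowed(statements, action, resource):
    """Return True if some statement allows the request and none denies it."""
    allowed = False
    for statement in statements:
        action_matches = any(fnmatchcase(action, pattern) for pattern in statement["action"])
        resource_matches = fnmatchcase(resource, statement["resource"])
        if action_matches and resource_matches:
            if statement["effect"] == "deny":
                return False  # assumption: an explicit deny always wins
            allowed = True
    return allowed  # no matching allow means the request is denied by default


# Hypothetical policy: analysts may read the "sales" repository, except objects under pii/
analyst_policy = [
    {
        "action": ["fs:ReadObject", "fs:ListObjects"],
        "effect": "allow",
        "resource": "arn:lakefs:fs:::repository/sales/*",
    },
    {
        "action": ["fs:*"],
        "effect": "deny",
        "resource": "arn:lakefs:fs:::repository/sales/object/pii/*",
    },
]

report = "arn:lakefs:fs:::repository/sales/object/reports/q1.parquet"
pii = "arn:lakefs:fs:::repository/sales/object/pii/users.parquet"

print(is_allowed(analyst_policy, "fs:ReadObject", report))   # read on a permitted object
print(is_allowed(analyst_policy, "fs:ReadObject", pii))      # read on a denied prefix
print(is_allowed(analyst_policy, "fs:WriteObject", report))  # action with no matching allow
```

Wildcards in the resource string are what let a single statement cover anything from a whole repository down to one object.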

Customer Success

The lakeFS Customer Success team ensures you achieve your desired outcomes and goals throughout your use of lakeFS. We take a proactive approach to maximizing the value you get from lakeFS.

Some of our services include:

Onboarding

We will assist with the initial installation, integration, and setup of lakeFS. Our experts will walk you through:

  • lakeFS basic training
  • Deployment sizing and planning exercise
  • Use case review and optimization

Quarterly technical reviews

Once a quarter, we will discuss new feature releases and how you can benefit from them. 

Our customers also provide input on the roadmap, helping to shape future features (including but not limited to the OSS).

Enterprise Support SLA

We are committed to providing high-quality support and will work to resolve any issues that arise in a timely manner. This can help to minimize downtime and disruptions to business operations, which can be particularly important for mission-critical data applications. 

Our lakeFS enterprise customers will be notified in case of security threats or known bugs that will impact them, and we will proactively assist in resolving issues even before they are “discovered” by our customers. 

Our support SLA helps you plan and budget for support costs more effectively, because you know the expected response and resolution times up front. lakeFS support consistently receives top customer satisfaction scores, reflecting our commitment to making your satisfaction our priority.

Get in touch to learn more about lakeFS Enterprise

lakeFS Cloud

lakeFS Cloud is a hosted Software as a Service (SaaS) version of lakeFS Enterprise that adds some features on top of it.

Why consume lakeFS as SaaS?

First, lakeFS Cloud reduces the time and cost required to deploy and maintain the software, enabling you to focus on unlocking the full benefits of using lakeFS. Additionally, lakeFS Cloud offers out-of-the-box scalability and flexibility, as the service will auto-scale to meet your data management needs. lakeFS Cloud will automatically update and upgrade versions, ensuring you always have access to the latest features and security patches without needing to perform manual upgrades. 

Overall, using the service drives cost savings, flexibility, scalability, and ease of maintenance, making it our customers’ most popular choice.

Additional Features of lakeFS Cloud

Private Link

Leverage a dedicated connection within your cloud environment for enhanced security, improved network control, and potential cost savings. This bypasses the public internet, minimizing the risk of unauthorized access and ensuring data stays securely confined within your trusted virtual network.

Managed Garbage Collection

Managing ephemeral data objects can be challenging, which is why we developed garbage collection (GC). GC rules in lakeFS define how long to retain objects after they have been deleted. lakeFS OSS provides a Spark program that hard-deletes objects that have been deleted and whose retention period has ended according to the GC rules. With OSS, you need to configure, run, troubleshoot, and maintain the GC execution yourself.

With lakeFS Cloud, GC is a fully transparent managed service. In other words, once you define your GC rules, lakeFS Cloud will automatically and continuously manage the execution of the garbage collection. This keeps your storage costs low, while simultaneously allowing you to roll back your data according to your policies.
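
To show how retention-based GC rules play out, here is a simplified sketch that decides which deleted objects are past their retention period. It is not the lakeFS GC job (which runs as a Spark program or, in lakeFS Cloud, as a managed service). The rule layout with `default_retention_days` and per-branch `retention_days` is an assumption for illustration, and the `expired_objects` function and the sample records are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expired_objects(deleted, rules, now):
    """Return paths of deleted objects whose retention period has ended."""
    per_branch = {b["branch_id"]: b["retention_days"] for b in rules.get("branches", [])}
    expired = []
    for obj in deleted:
        # Fall back to the default when the branch has no rule of its own
        days = per_branch.get(obj["branch"], rules["default_retention_days"])
        if now - obj["deleted_at"] > timedelta(days=days):
            expired.append(obj["path"])
    return expired


# Assumed rule shape: a default plus optional per-branch overrides
rules = {
    "default_retention_days": 7,
    "branches": [{"branch_id": "main", "retention_days": 28}],
}

now = datetime(2024, 6, 20, tzinfo=timezone.utc)
deleted = [
    {"path": "main/events.parquet", "branch": "main", "deleted_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"path": "dev/tmp_old.parquet", "branch": "dev", "deleted_at": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"path": "dev/tmp_new.parquet", "branch": "dev", "deleted_at": datetime(2024, 6, 18, tzinfo=timezone.utc)},
]

print(expired_objects(deleted, rules, now))
```

Objects that are still within their retention window stay on the object store, which is what keeps rollbacks possible until the policy says otherwise.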

SOC2 Compliance

lakeFS Cloud meets rigorous security standards and is designed to protect sensitive data and systems from unauthorized access, theft, or misuse. Choosing a SOC2 compliant software helps our customers comply with regulatory requirements, demonstrate due diligence to their customers and stakeholders, and mitigate the risk of data breaches and cyber-attacks.

Compute cost savings with lakeFS Cloud

As with lakeFS OSS and Enterprise, your data stays in place when you use lakeFS Cloud. All your files, including the metadata files that lakeFS manages, sit in your own buckets (S3, Azure Blob, Google Cloud Storage) within your Virtual Private Cloud / Network.

However, with lakeFS Cloud, the compute required for the lakeFS Server runs outside your account, saving you costs with your cloud vendor. Furthermore, additional compute operations such as garbage collection executions don’t run within your account, saving you additional costs. 

Start your lakeFS Cloud free trial today (no Credit Card or commitment needed)

Summary

lakeFS OSS is used by hundreds of organizations (that we know of) today. We are committed to continuing support and further improving our open-source solution. Having said that, you might be looking for additional features to allow your organization to maximize the benefits of lakeFS.  Below is a table comparing the three solutions, side by side:

lakeFS Open Source lakeFS Enterprise lakeFS Cloud (SaaS)
Format-agnostic data version control
Cloud-agnostic
Zero-Clone copy for isolated environment (via branches)
Atomic data promotion (via merges)
Data stays in place
Configurable garbage collection
Data CI/CD using lakeFS Hooks
Integrates with your data stack
Role-Based Access Control (RBAC)
Single Sign On (SSO)
SCIM support
IAM Roles
Audit Logs
Transactional Mirroring (cross-region mirroring)
Managed service (auto updates, auto-scaling, disaster recovery, etc.)
Managed Garbage Collection
SOC2 Compliant
Support SLA
