Paul Singman
November 1, 2021
Last week we held another lakeFS Community Call! We believe these calls are invaluable opportunities to have direct dialogue with our users on all things lakeFS.

Oz covered important new lakeFS functionality, previewed what's coming soon from the roadmap, and also shared two exciting updates from the community. Let's recap!

6 Important lakeFS Releases

1. post-merge and post-commit hooks (0.46)

[Image: lakeFS post-commit and post-merge hooks notifications]

Now you can use the lakeFS hooks feature to run validation tests triggered after a commit or merge action takes place. Unlike the “pre-” variety, these hooks won’t prevent bad data from making it into production datasets. But they will allow you to have downstream systems respond to the events and take their own action, such as adding a partition in Hive Metastore or updating a data discovery tool.
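To make the downstream side concrete, here is a minimal Python sketch of a service deciding how to react to a post-commit or post-merge event. The payload field names (`event_type`, `repository_id`, `branch_id`), the `REACTIONS` table, and the `plan_reaction` helper are all assumptions made for this example, not the exact lakeFS webhook schema, so check the hooks documentation for the real payload.

```python
import json

# Hypothetical mapping from lakeFS event types to downstream reactions.
REACTIONS = {
    "post-commit": "refresh data discovery tool",
    "post-merge": "add Hive Metastore partition",
}


def plan_reaction(raw_body: str) -> str:
    """Decide what a downstream system should do for one hook event.

    The field names read here are assumed for illustration only.
    """
    event = json.loads(raw_body)
    reaction = REACTIONS.get(event.get("event_type", ""))
    if reaction is None:
        # "pre-" hooks and unknown events are ignored by this sketch.
        return "ignore"
    return f"{reaction} for {event['repository_id']}/{event['branch_id']}"


if __name__ == "__main__":
    body = json.dumps(
        {
            "event_type": "post-merge",
            "repository_id": "example-repo",
            "branch_id": "main",
        }
    )
    print(plan_reaction(body))
```

Because the event arrives after the commit or merge has already happened, the handler only chooses a follow-up action; it has no way to reject the change.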

2. Hooks support triggering Airflow DAGs (0.47)

With this feature, lakeFS post-merge hooks can trigger an Airflow DAG run that, for example, transforms or aggregates data. Instead of a sensor polling for new data, the merge event itself starts the DAG, which moves Airflow from a sensor-driven architecture to an event-driven one. Cool stuff!
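For a sense of what an event-driven trigger looks like on the Airflow side, the sketch below builds the HTTP request that starts a DAG run through the Airflow 2.x stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). It is not the lakeFS hook configuration itself. The base URL, the DAG name, and the event fields passed in `conf` are hypothetical, and authentication is left out.

```python
import json
from urllib.request import Request


def build_dag_trigger(airflow_url: str, dag_id: str, event: dict) -> Request:
    """Build (but do not send) a request that starts one DAG run.

    The event dict is forwarded as the run's `conf`, so tasks in the DAG
    can read which repository and branch triggered them.
    """
    body = json.dumps({"conf": event}).encode("utf-8")
    return Request(
        url=f"{airflow_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical Airflow host, DAG id, and event fields.
    req = build_dag_trigger(
        "http://airflow.example.com/",
        "aggregate_events",
        {"repository_id": "example-repo", "branch_id": "main"},
    )
    print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` (with credentials added) would actually send it; the sketch stops short of that so it can run without an Airflow server.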

3. Support repositories on multiple AWS regions (0.48)

One lakeFS installation can now support repositories whose underlying storage buckets are in different regions. This is useful for larger orgs using lakeFS that want to keep their data in different regions. It matters in particular when using the native Spark client, which talks to the data objects directly instead of going through the lakeFS S3 Gateway.

4. Optional S3 Gateway DNS settings (0.48)

[Image: lakeFS server architecture]

Before this change a lakeFS installation required two DNS records – one pointing to the lakeFS OpenAPI and one to the S3 Gateway. We used the different hosts to determine if something was an S3 or API request.

Configuring the DNS records and explaining why they were necessary was often a challenge for new users, so we’ve added logic to a Request Resolver component within lakeFS that automatically determines where requests should be routed.

The result: no need for multiple DNS records, and a simpler deployment process!
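The idea of routing by request content rather than by hostname can be sketched in a few lines of Python. This is a toy heuristic written for this recap, not the actual Request Resolver logic in lakeFS: it assumes S3 clients can be recognized by an AWS-style `Authorization` header and treats everything else as API traffic.

```python
def classify_request(headers: dict) -> str:
    """Return "s3-gateway" or "api" for one incoming request.

    Toy heuristic for illustration; the real resolver may use other signals.
    """
    auth = headers.get("Authorization", "")
    # Assumption: S3 clients sign with AWS Signature V4 ("AWS4-HMAC-SHA256 ...")
    # or the older V2 scheme ("AWS <key>:<signature>").
    if auth.startswith("AWS4-HMAC-SHA256") or auth.startswith("AWS "):
        return "s3-gateway"
    # Anything else (for example API clients or the UI) goes to the API.
    return "api"


if __name__ == "__main__":
    print(classify_request({"Authorization": "AWS4-HMAC-SHA256 Credential=..."}))
    print(classify_request({"Authorization": "Basic dXNlcjpwYXNz"}))
```

The point is that the decision no longer depends on which hostname the client used, which is why a single DNS record is enough.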

5. lakeFS protected branches (0.52)

[Image: screenshot of protected branches in the lakeFS UI]

The powerful lakeFS branching model just got even more powerful. Protected branches let you define branch protection rules that prevent a lakeFS user from committing directly to a branch or deleting it.

The guardrails this gives data lake users, and the guarantees you can make about your data as a result, are pretty neat!
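Conceptually, a branch protection rule is a branch-name pattern plus a set of blocked actions. The Python sketch below illustrates that idea with glob-style patterns. The pattern list, the action names, and the `is_allowed` helper are hypothetical and are not the lakeFS implementation or its API.

```python
from fnmatch import fnmatchcase

# Hypothetical rule set: branch-name patterns that are protected.
PROTECTED_PATTERNS = ["main", "release-*"]

# Actions a protection rule blocks in this sketch.
BLOCKED_ACTIONS = {"commit", "delete"}


def is_allowed(branch: str, action: str, patterns=PROTECTED_PATTERNS) -> bool:
    """Return False when `action` targets a protected branch directly."""
    protected = any(fnmatchcase(branch, pattern) for pattern in patterns)
    return not (protected and action in BLOCKED_ACTIONS)


if __name__ == "__main__":
    for branch, action in [
        ("main", "commit"),
        ("main", "merge"),
        ("feature-x", "commit"),
    ]:
        print(branch, action, is_allowed(branch, action))
```

In this sketch a merge into `main` stays allowed while a direct commit does not, so changes can only reach a protected branch after being prepared on another branch.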

6. Support for LDAP Authentication (0.53)

No more syncing a user database or having to manage users internally in lakeFS. Tap into an existing LDAP server to re-use user settings in lakeFS. 

Watch the full event replay below!

The Road Ahead

Want to know what’s coming in future lakeFS releases? We covered four of the ones we’re most excited about.

  1. Improved UI collaboration – Improved viz for diffs, aggregated stats per prefix, and line-diffs for human-readable formats (like CSV)
  2. Tracking changes at the collection or object level – Similar to the way git log -- some/path lets you see who made a change or how a file has changed over time
  3. Integration with dbt – We’ve heard of two challenges dbt users face that lakeFS could help with: 1. Creating isolated data environments without copying data and 2. Running dbt test before data is exposed to consumers. More to come here!
  4. lakeFS Metastore – As we’ve made clear, the current metastore solution from Hive isn’t cutting it. Creating a modern, version-aware metastore within lakeFS is part of our future vision.

Community Updates

First we heard from Prafful at Volvo, who shared how they are using lakeFS to give their data science teams the maximum isolation possible with data versioning.

Volvo developed a solution using versioning from lakeFS with Git branching. Every git branch points to a specific commit in lakeFS.

This gives them an important way to apply development best practices for the data science teams, without creating new complexity.
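One way to picture the "every git branch points to a specific commit in lakeFS" idea is a small pin table that code consults before reading data. The Python sketch below is an illustration only and is not Volvo's implementation. The `PINS` table, the commit IDs, the repository name, and the `data_uri` helper are hypothetical; the only lakeFS detail it relies on is the `lakefs://<repository>/<ref>/<path>` URI form.

```python
# Hypothetical pin table: git branch name -> lakeFS commit ID.
PINS = {
    "main": "a1b2c3d4",
    "feature/new-model": "e5f6a7b8",
}


def data_uri(git_branch: str, repo: str, path: str, pins=PINS) -> str:
    """Build the lakeFS URI for `path` as seen from a given git branch."""
    commit_id = pins[git_branch]  # raises KeyError for an unpinned branch
    return f"lakefs://{repo}/{commit_id}/{path.lstrip('/')}"


if __name__ == "__main__":
    print(data_uri("feature/new-model", "example-repo", "tables/events/"))
```

Keeping such a pin under version control alongside the code means that checking out a git branch also selects the exact data snapshot that branch was developed against.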

Second, Oz announced our upcoming panel on Hive Metastore. It’s taking place virtually on November 10th with a great lineup of panelists. See the event page for more details and to tune in!

Thank you for reading! If you would like to stay in touch with the community...
