lakeFS Community

Chaos Engineering – Managing Stages in a Complex Data Flow

Learn how to apply the principles of chaos engineering to make more resilient data pipelines.

We invite you to join the Data Engineering group monthly meeting for a presentation followed by group questions and responses. This is a virtual meeting and all are welcome to attend.

Chaos Engineering and how to manage data stages in Large-Scale Complex Data Flow
Presenter: Adi Polak

A complex data flow is a set of operations that extracts information from multiple sources and copies it into multiple data targets, applying transformations, joins, filters, and sorts along the way to refine the results. These are precisely the capabilities that the new open modern data stack provides. Spark and other tools allow us to develop complex data flows on large-scale data. Chaos engineering is the practice of experimenting on a distributed system to build confidence in the system’s capability to withstand turbulent conditions in production. Put another way: how stable is your distributed system?

We tend to adopt practices that improve the flexibility of development and the velocity of code deployment, but how confident are we that a complex data system is safe once it arrives in production? We must be able to experiment in production and automate actions while minimizing customer pain and reducing damage to code and data. If your product’s value is derived from data in the shape of analytics or machine learning, losing that data, or corrupting it, can easily translate into pain. In this session, you will discover how chaos engineering principles apply to distributed data systems and which tools enable us to make our data workloads more resilient. We will also show you how to recover from deploying code that resulted in corrupted data, which can happen in complex distributed data systems with many moving parts.
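
To make the idea concrete, here is a minimal sketch of a chaos-style experiment on a single pipeline stage, written in plain Python. It is an illustration only and is not taken from the talk or from the lakeFS API: the function names (`transform`, `inject_fault`, `is_valid`, `run_experiment`) and the record layout are hypothetical. The stage output is deliberately corrupted, a validation gate checks it, and the last known-good data stays published when the check fails.

```python
import copy
import random


def transform(records):
    """Pipeline stage: keep records with a positive amount, sorted by id."""
    kept = [r for r in records if r["amount"] > 0]
    return sorted(kept, key=lambda r: r["id"])


def inject_fault(records, rate, seed=0):
    """Hypothetical chaos step: null out 'amount' in roughly `rate` of the records."""
    rng = random.Random(seed)
    damaged = copy.deepcopy(records)  # never mutate the caller's data
    for record in damaged:
        if rng.random() < rate:
            record["amount"] = None
    return damaged


def is_valid(records):
    """Validation gate: every amount must be a positive number."""
    return all(
        isinstance(r["amount"], (int, float)) and r["amount"] > 0
        for r in records
    )


def run_experiment(source, last_good, fault_rate):
    """Run the stage with injected faults; promote the output only if it validates.

    Returns (published_records, promoted). On failure, the previously
    published data is returned unchanged, which stands in for a rollback.
    """
    staged = inject_fault(transform(source), fault_rate)
    if is_valid(staged):
        return staged, True
    return last_good, False


source = [
    {"id": 2, "amount": 5.0},
    {"id": 1, "amount": 3.0},
    {"id": 3, "amount": -1.0},
]
last_good = [{"id": 0, "amount": 1.0}]

clean_out, clean_promoted = run_experiment(source, last_good, fault_rate=0.0)
chaos_out, chaos_promoted = run_experiment(source, last_good, fault_rate=1.0)
```

With a fault rate of 0.0 the staged output passes the gate and is promoted. With a fault rate of 1.0 every record is corrupted, so the gate rejects the output and the last good data remains in place. In a real data lake, the staging area and the rollback would be handled by the storage or versioning layer rather than by in-memory lists.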

Adi’s bio:
As Vice President of Developer Experience at Treeverse, Adi builds the open-source project lakeFS, which provides Git-like operations for data lakes. In her work, she brings her vast industry research and engineering experience to bear in educating and helping teams design, architect, and build cost-effective data systems and machine learning pipelines that emphasize scalability, expertise, application lifecycle processes, team processes, and business goals. Adi is a frequent worldwide presenter, an instructor, and the author of O’Reilly’s upcoming book, “Machine Learning With Apache Spark.” Previously, she was a senior manager for Azure at Microsoft, where she focused on building advanced analytics systems and modern architectures.
When Adi isn’t building data pipelines or thinking up new software architecture, you can find her on the local cultural scene or at the beach.

Contact dustin@dustinvannoy.com if you are interested in presenting at a future meetup.

Zoom Link for Meeting: RSVP for Link
