lakeFS Community
Idan Novogroder, Author

Idan has an extensive background in software and DevOps engineering....

Last updated on April 30, 2024

Machine learning solutions come in handy for addressing various problems and achieving a wide range of goals. However, if we look at ML applications from a distance, we’ll instantly see that the fundamental components are almost always the same. 

Whether you want to better understand the skeleton of machine learning solutions or start building your own, it helps to know these components and how they interact.

Keep reading to discover the key elements of ML architecture and their representation in the form of an ML architecture diagram.

What is a Machine Learning Architecture Diagram?

Machine learning architecture refers to the structure and organization of all the components and processes that make up a machine learning system, from data preparation for machine learning applications to their deployment and maintenance. 

A machine learning architecture defines how data is processed, models are trained and evaluated, and predictions are generated. It provides a blueprint for creating an ML system. A well-designed ML architecture helps teams build scalable, dependable, and efficient machine learning systems.

What about the ML architecture diagram?

A machine learning architecture diagram provides a high-level overview of the numerous components you need to build a machine learning application. The image below is probably the most common diagram ML folks refer to, as it’s from one of the most widely cited ML papers.

Hidden Technical Debt in ML Systems
Source: Pinterest

Machine Learning Architecture Diagram Elements

Data Collection and Storage

This component covers a wide range of raw data sources, including databases, data lakes, and APIs. It also involves gathering data from these sources and storing it in a single location for processing.

Data Version Control

During the creation of a machine learning model, we generate artifacts other than source code. Any file that is either input data or an output from a process can be considered an artifact.

Data version control applies not only to the data we feed into the ML model but also to the models themselves. Together, data version control and model versioning cover the version management of machine learning artifacts, i.e., the process of storing, recording, and managing changes to datasets and models.
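To make the idea concrete, here is a minimal sketch of content-based dataset versioning. It is not how any particular tool implements versioning; the `content_version` function and the `DatasetVersionLog` class are hypothetical names used only for illustration. The version ID is derived from the data itself, so any change to the dataset yields a new ID.

```python
import hashlib
from dataclasses import dataclass, field


def content_version(data: bytes) -> str:
    """Return a short, content-derived version ID for a dataset snapshot."""
    return hashlib.sha256(data).hexdigest()[:12]


@dataclass
class DatasetVersionLog:
    """Hypothetical in-memory log of dataset versions (illustration only)."""
    history: list = field(default_factory=list)  # (version_id, message) pairs

    def commit(self, data: bytes, message: str) -> str:
        version_id = content_version(data)
        # Record a new entry only when the content actually changed.
        if not self.history or self.history[-1][0] != version_id:
            self.history.append((version_id, message))
        return version_id


log = DatasetVersionLog()
v1 = log.commit(b"id,label\n1,cat\n", "initial import")
v2 = log.commit(b"id,label\n1,cat\n2,dog\n", "add second row")
v3 = log.commit(b"id,label\n1,cat\n2,dog\n", "no change")
```

Here `v1` and `v2` differ because the content changed, while `v3` equals `v2` and adds no new history entry.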

Data Preprocessing

This component is all about data processing: cleansing, feature engineering, and data normalization. Data preparation is critical for improving data quality and ensuring its applicability for analysis.
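The three activities named above can be sketched in a few lines of plain Python. This is an illustrative example under simple assumptions (numeric rows, missing values encoded as `None`, min-max scaling as the normalization method); the function names and the sample data are made up for the sketch.

```python
def clean(rows):
    """Cleansing: drop rows that contain a missing value (None)."""
    return [row for row in rows if None not in row]


def add_product_feature(rows):
    """Feature engineering: append the product of the two raw columns."""
    return [(a, b, a * b) for a, b in rows]


def min_max_normalize(rows):
    """Normalization: scale each column to [0, 1]; constant columns map to 0.0."""
    scaled_columns = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = hi - lo
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    return [list(row) for row in zip(*scaled_columns)]


raw = [(2.0, 10.0), (None, 5.0), (4.0, 30.0), (6.0, 20.0)]
cleaned = clean(raw)
engineered = add_product_feature(cleaned)
normalized = min_max_normalize(engineered)
```

In a real pipeline, the scaling parameters would be computed on the training data only and reused at inference time, so that both see the same transformation.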

Model Training and Tuning

This step involves selecting a suitable algorithm, training the model, and fine-tuning the hyperparameters. The goal is to build a model that accurately predicts outcomes and generalizes well to new inputs.
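As a small sketch of what training and hyperparameter tuning look like together, the example below fits a one-dimensional ridge regression (no intercept) in closed form and picks the regularization strength by grid search on a held-out validation set. The model choice, the data, and the candidate values are assumptions made for illustration only.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept: w = sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grid_search(train, valid, lambdas):
    """Train once per candidate; return (best_lambda, best_weight, best_validation_mse)."""
    best = None
    for lam in lambdas:
        w = fit_ridge_1d(*train, lam)
        score = mse(w, *valid)  # judge on data the model was not trained on
        if best is None or score < best[2]:
            best = (lam, w, score)
    return best


train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.0, 12.1])
best_lambda, best_w, best_mse = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0])
```

Scoring on the validation set rather than the training set is what ties tuning to the stated goal of generalizing to new inputs.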

Model Deployment and Monitoring

This component involves deploying machine learning models in a production environment and regularly monitoring their performance. This helps us to discover any issues and ensure that the model operates as intended.

Teams often use Docker to package ML models for deployment. To monitor the model in production, a popular open-source combination is Prometheus for collecting metrics and Grafana for visualizing them.

User Interface

This component includes the interface through which users interact with the model to obtain its output – for example, a text or image generated in response to a prompt. The interface may be a dashboard, mobile app, or web application.

Iteration and Feedback

This part is about gathering user feedback to improve the model’s performance via retraining.

Machine learning architecture diagram
An example of a cloud-agnostic architecture diagram.

Machine Learning Lifecycle Components

Component | Definition
Model development | The model development process consists of training, fine-tuning, and evaluation.
Model deployment | The model deployment process includes a staging environment where the model is validated for security and robustness.
Model monitoring | Monitoring is critical for prompt detection and mitigation of drift. Feedback loops across the ML lifecycle phases are critical for monitoring.
Feature stores | Feature stores (online and offline) offer consistent and reusable features throughout the model development and deployment process.
Model registry | The model registry offers version control and lineage tracking of model and data components.

Let’s dive into the details to learn more about these and other components.

Online/Offline Feature Store

An online/offline feature store eliminates duplication and the need to rerun feature engineering code across teams and projects. The online store, with its low-latency retrieval, is well suited to real-time inference. The offline store comes in handy for training and batch scoring, as it keeps a history of feature values.
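The split between a low-latency lookup of the latest value and a historical lookup can be sketched as follows. This is a toy in-memory illustration; the `FeatureStore` class, its method names, and the integer timestamps are hypothetical and do not mirror any specific product's API.

```python
from bisect import bisect_right


class FeatureStore:
    """Hypothetical in-memory feature store with an online and an offline view."""

    def __init__(self):
        self._online = {}   # (entity, feature) -> latest value
        self._offline = {}  # (entity, feature) -> [(timestamp, value), ...]

    def write(self, entity, feature, value, ts):
        """Record a value; timestamps are assumed to arrive in increasing order."""
        key = (entity, feature)
        self._offline.setdefault(key, []).append((ts, value))
        self._online[key] = value

    def get_online(self, entity, feature):
        """Latest value only, for real-time inference."""
        return self._online[(entity, feature)]

    def get_as_of(self, entity, feature, ts):
        """Value that was current at time `ts`, for training and batch scoring."""
        history = self._offline.get((entity, feature), [])
        idx = bisect_right([t for t, _ in history], ts)
        return history[idx - 1][1] if idx else None


store = FeatureStore()
store.write("user_1", "orders_30d", 3, ts=100)
store.write("user_1", "orders_30d", 5, ts=200)

latest = store.get_online("user_1", "orders_30d")
as_of_150 = store.get_as_of("user_1", "orders_30d", 150)
before_any = store.get_as_of("user_1", "orders_30d", 50)
```

The point-in-time lookup matters for training: it returns the value the model would have seen at that moment, rather than a later one.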

Model Registry

A model registry is a repository for storing ML model artifacts, such as trained models and their associated metadata, for model lifecycle management. By acting as a version control system for models, it makes it possible to track their lineage.
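A registry of this kind can be pictured as a versioned catalog that links each model artifact to the data it was trained on. The sketch below is a minimal in-memory illustration; `ModelRegistry`, `ModelVersion`, the artifact URIs, and the data version strings are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact_uri: str   # where the trained model file lives (placeholder)
    data_version: str   # dataset version the model was trained on
    metrics: tuple      # sorted (metric_name, value) pairs


class ModelRegistry:
    """Hypothetical in-memory registry that assigns increasing version numbers."""

    def __init__(self):
        self._versions = {}  # model name -> list of ModelVersion

    def register(self, name, artifact_uri, data_version, metrics):
        versions = self._versions.setdefault(name, [])
        entry = ModelVersion(
            name, len(versions) + 1, artifact_uri, data_version,
            tuple(sorted(metrics.items())),
        )
        versions.append(entry)
        return entry

    def latest(self, name):
        return self._versions[name][-1]

    def get(self, name, version):
        return self._versions[name][version - 1]


registry = ModelRegistry()
registry.register("churn", "s3://example-bucket/churn/1/model.bin", "a1b2c3", {"auc": 0.81})
registry.register("churn", "s3://example-bucket/churn/2/model.bin", "d4e5f6", {"auc": 0.84})

current = registry.latest("churn")
previous = registry.get("churn", 1)
```

Because every entry records the data version alongside the artifact, rolling back to `previous` also tells you which dataset to restore.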

Performance Feedback Loop

The performance feedback loop feeds the results of model evaluation during the development phase back into the iterative data preparation phase.

Model Drift Feedback Loop

Model drift drives ML feedback loops: when drift is detected, monitoring and retraining procedures need to be analyzed and revisited. These feedback loops allow for experimenting with data augmentation, as well as with various algorithms and training methodologies, until an optimal result is found.
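One common way to quantify drift in a single feature is the Population Stability Index (PSI), which compares the binned distribution of live data against the training data. The sketch below is one possible implementation under stated assumptions: the bin edges are fixed by hand, empty bins are clamped to a small epsilon, and the 0.2 alert threshold is a rule of thumb rather than a value taken from this article.

```python
import math


def histogram(values, edges):
    """Fraction of values per bin; values outside the range fall into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = sum(1 for e in edges[1:-1] if v >= e)  # count interior edges at or below v
        counts[idx] += 1
    return [c / len(values) for c in counts]


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index: sum of (a - e) * ln(a / e) over bins."""
    e_frac = [max(f, eps) for f in histogram(expected, edges)]
    a_frac = [max(f, eps) for f in histogram(actual, edges)]
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))


edges = [0.0, 1.0, 2.0, 3.0]
training = [0.1, 0.4, 0.6, 0.9, 1.2, 1.5, 1.8, 2.1, 2.4, 2.7]
live_shifted = [2.1, 2.2, 2.5, 2.6, 2.8, 2.9, 1.5, 2.3, 2.4, 2.7]

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold
no_drift_score = psi(training, training, edges)
drift_score = psi(training, live_shifted, edges)
needs_retraining = drift_score > DRIFT_THRESHOLD
```

A score above the threshold is the kind of signal that would feed the loop described above and prompt a look at retraining.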

Alarm Manager

The alarm manager receives signals from the model monitoring system and then sends notifications to services that can deliver alerts to target applications. The model retraining pipeline is one example of a target application.
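The routing role described above resembles a small publish/subscribe hub. The following sketch is illustrative only: `AlarmManager`, the signal names, and the stand-in `retraining_pipeline` function are hypothetical.

```python
from collections import defaultdict


class AlarmManager:
    """Hypothetical router from monitoring signals to subscribed targets."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # signal name -> callbacks

    def subscribe(self, signal, callback):
        self._subscribers[signal].append(callback)

    def notify(self, signal, payload):
        """Forward a signal to every subscriber; return how many were notified."""
        callbacks = self._subscribers.get(signal, [])
        for callback in callbacks:
            callback(payload)
        return len(callbacks)


triggered = []


def retraining_pipeline(payload):
    # Stand-in for starting the real retraining pipeline.
    triggered.append(("retrain", payload["model"]))


manager = AlarmManager()
manager.subscribe("model_drift", retraining_pipeline)

notified = manager.notify("model_drift", {"model": "churn", "psi": 0.31})
ignored = manager.notify("latency_high", {"model": "churn"})
```

Keeping the routing separate from the monitoring logic means new targets, such as a chat alert, can be added without touching the monitor.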

Scheduler

The scheduler starts model retraining at business-defined intervals.
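A minimal sketch of interval-based retraining might look like this. The `RetrainingScheduler` class and the seven-day interval are assumptions for illustration; in practice, teams typically rely on a workflow orchestrator or a cron-style service rather than hand-rolled code.

```python
from datetime import datetime, timedelta


class RetrainingScheduler:
    """Hypothetical scheduler that triggers retraining once per interval."""

    def __init__(self, interval: timedelta, start: datetime):
        self.interval = interval
        self.next_run = start + interval

    def tick(self, now: datetime, retrain) -> bool:
        """Call `retrain` if the interval has elapsed; return whether it ran."""
        if now < self.next_run:
            return False
        retrain()
        # Schedule from `now` so a late tick does not cause a burst of catch-up runs.
        self.next_run = now + self.interval
        return True


runs = []
scheduler = RetrainingScheduler(timedelta(days=7), start=datetime(2024, 1, 1))

early = scheduler.tick(datetime(2024, 1, 5), lambda: runs.append("retrain"))
due = scheduler.tick(datetime(2024, 1, 8), lambda: runs.append("retrain"))
again = scheduler.tick(datetime(2024, 1, 9), lambda: runs.append("retrain"))
```

Passing `now` in explicitly keeps the logic easy to test without waiting for real time to pass.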

Lineage Tracker Components

A lineage tracker enables reproducible machine learning experiments. It allows for the re-creation of the ML environment as it was at a given point in time, with all resources and environments reflecting the versions available at that time.

The ML lineage tracker records references to traceable changes in data, ML models, and infrastructure resources. These references cover:

  • System architecture (infrastructure as code to address environment drift)
  • Data (metadata, values, and features)
  • Model (algorithm, features, parameters, and hyperparameters)
  • Code (implementation, modeling, pipeline)

The lineage tracker captures changed references across the iterations of the ML lifecycle stages. Alternative algorithms and feature lists are evaluated as experiments on the way to the final production deployment. Thanks to the data gathered by a lineage tracker, we can go back in time and recreate a release.
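To illustrate what "going back in time" relies on, the sketch below stores one record of pinned references per release and can report which references changed between two releases. All names (`LineageRecord`, `LineageTracker`, the field names) and all values are hypothetical placeholders, not the schema of any real tool.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class LineageRecord:
    release: str
    iac_commit: str          # infrastructure-as-code revision
    data_version: str        # dataset version reference
    code_commit: str         # implementation / pipeline code revision
    feature_list: tuple      # features used by the model
    hyperparameters: tuple   # sorted (name, value) pairs
    container_image: str     # model container image tag


class LineageTracker:
    """Hypothetical store of pinned references, keyed by release."""

    def __init__(self):
        self._records = {}

    def record(self, rec):
        self._records[rec.release] = rec

    def recreate_plan(self, release):
        """Return the pinned references needed to rebuild a past release."""
        return asdict(self._records[release])

    def changed_fields(self, old_release, new_release):
        """Names of the references that differ between two releases."""
        old = asdict(self._records[old_release])
        new = asdict(self._records[new_release])
        return sorted(k for k in old if old[k] != new[k])


tracker = LineageTracker()
tracker.record(LineageRecord("r1", "iac-001", "ds-v1", "code-aaa",
                             ("age", "tenure"), (("lr", 0.1),), "churn:1"))
tracker.record(LineageRecord("r2", "iac-001", "ds-v2", "code-aaa",
                             ("age", "tenure"), (("lr", 0.05),), "churn:1"))

plan = tracker.recreate_plan("r1")
diff = tracker.changed_fields("r1", "r2")
```

The record stores references only, such as commit IDs and version IDs, so the tracker stays small while the actual artifacts live in their own versioned systems.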

The components of the lineage tracker are as follows:

  • Infrastructure as code (IaC) – It's essential for the automated modeling, provisioning, and management of cloud computing resources such as compute, storage, network, and application services. IaC prevents configuration drift through automation, boosting the speed and agility of infrastructure deployment. IaC code changes are committed to a version-controlled repository.
  • Data – Many teams keep data and metadata in data storage solutions such as a data lake, using data version control on top. The data's location or link can be saved in a configuration file that is kept in code version control.
  • Implementation code – Changes to any implementation code at any point in time can be traced in version control.
  • Model feature list – A feature store keeps the details of the features as well as their prior versions for any point-in-time updates.
  • Model algorithm code – Changes to any model algorithm code at any point in time can be saved in version control.
  • Model container image – Versions of the model container image, as changed at any point in time, can be kept in repositories managed by a container registry.

Conclusion

The importance of machine learning architecture lies in its ability to create scalable, efficient, and maintainable machine learning systems. A well-thought-out architecture opens the door to improved machine learning algorithm performance, less time spent on deployment and maintenance, and less debugging.

A well-designed, field-specific architecture can ensure the integrity and safety of the ML infrastructure. When the architecture is appropriately structured, teams can build machine learning models on the correct data and in a way that allows for continual improvement.

Ultimately, ML architecture is critical because it helps us create strong, efficient, and scalable ML systems capable of meeting the demands of today’s data-driven organizations.
