The Checklist

This checklist is organized into two main categories:

System concerns: Address quality attribute concerns that apply to the system as a whole, without requiring special considerations for ML components compared to traditional components.

Process concerns: Focus on the effective management and execution of the development and maintenance process for ML-enabled systems, as well as concerns specific to architecting such systems.

Each check in the checklist is described using the following fields:

ID: A unique identifier following a naming convention: the capital initial of the main category (S for System, P for Process) followed by the capital initial of the subcategory (e.g., A for Availability) and an integer. If ambiguities arise, a two-letter abbreviation is used (e.g., SMD for System–Modularity).

Name: A brief descriptive label summarizing the focus of the check.

Check: The specific question or statement to be considered during architectural decision-making.

ML Specificity: The degree to which the check is specific to ML-intensive systems, categorized as low (L), medium (M), or high (H).
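
The ID convention and the four fields above can be captured in a small data structure. The following Python sketch is illustrative only: the names `Check`, `parse_check_id`, `CATEGORIES`, `SPECIFICITY`, and `ID_PATTERN` are assumptions and not part of the checklist. The pattern assumes a subcategory code of one or two capital letters, as described for the ID field.

```python
import re
from dataclasses import dataclass

# Hypothetical lookup tables mirroring the field descriptions above.
CATEGORIES = {"S": "System", "P": "Process"}
SPECIFICITY = {"L": "low", "M": "medium", "H": "high"}

# Category initial, one- or two-letter subcategory code, then an integer.
ID_PATTERN = re.compile(r"^([SP])([A-Z]{1,2})([0-9]+)$")


@dataclass(frozen=True)
class Check:
    """One checklist entry with the four fields described above."""
    id: str
    name: str
    check: str
    ml_specificity: str  # one of "L", "M", "H"

    def __post_init__(self):
        # Reject specificity codes outside the documented L/M/H scale.
        if self.ml_specificity not in SPECIFICITY:
            raise ValueError(f"unknown ML specificity: {self.ml_specificity!r}")


def parse_check_id(check_id: str) -> tuple[str, str, int]:
    """Split an ID into (category name, subcategory code, number)."""
    match = ID_PATTERN.match(check_id)
    if match is None:
        raise ValueError(f"malformed check ID: {check_id!r}")
    category, subcategory, number = match.groups()
    return CATEGORIES[category], subcategory, int(number)
```

For example, `parse_check_id("SMD1")` returns `("System", "MD", 1)`.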

System

Name	ID	Check	ML Specificity
Data visualization	SU1	Do you have data visualization techniques in place?	L
Visualization techniques	SU2	Have you considered visualization techniques to identify or highlight relationships between data and computing tasks?	M
Data preparation	SDQ1	Do you have strategies for preparing data and for computing statistics on it?	H
Data cleaning	SDQ2	Is your dataset clean, of good quality, and free from potential bias?	H
Dataset size	SDQ3	Are you concerned about dataset size in ML processing?	H
Concept drift	SDQ4	Do you engineer your ML-based system to adapt to input data changes (concept drift)?	H
System correctness	SC1	Do you have techniques for ensuring system correctness?	H
Model validation	SMV1	Are you performing validation of the model to predict how a learning algorithm will behave on new data?	H
Model validation	SMV2	Are you combining model validation with data validation to detect data errors?	H
Independent upgradeability	SMD1	Are you building a component-based distributed system where parts may need to be upgraded?	M
High cohesion and low coupling	SMD2	Are high cohesion and low coupling important?	L
Microservice	SMD3	If you are interested in maintainability and modifiability, did you consider using a microservice architecture?	L
Discrete service	SMD4	Can you decompose your system into discrete services?	L
Modeling intrinsic uncertainty	SMN1	Can you explicitly model the intrinsic uncertainty of ML components and assess its impact at the design stage?	H
Time predictability	SMN2	Do you have mechanisms for monitoring and post-analysis of time predictability?	H
Monitoring drift	SMN3	Do you have tests that monitor changes in input distributions?	H
Continuous integration	SDE1	Can you use continuous integration techniques for system development?	M
Infrastructure as code	SDE2	Do you manage IT infrastructure like servers, databases, and networks through code for your ML system?	M
Blue/green and canary testing	SDE3	Do your MLOps pipelines include blue/green or canary testing?	H
Failure recovery strategy	SA1	Did you consider failure recovery strategies to avoid failure propagation?	L
Domain knowledge	SA2	Do you have the required domain knowledge to handle availability decisions?	M
Layered/tiered architecture	SA3	Can you split business logic from ML components using a layered/tiered architecture?	M
Uncertainty	SR1	Do you have complete information on ML uncertainty at design time?	H
Fail safe	SS1	Do you have techniques to quickly reach safe states when needed?	H
Safety evaluation	SS2	Have you included evaluation processes for architectural safety choices?	L
Coding standards	SS3	Do you use strict and certified coding standards for safety-critical ML components?	L
External certification	SS4	Is your system safety-certified by external authorities?	L
Design to defend	SS5	Are you designing your ML system to defend vulnerable code sections from cyberattacks?	H
Safety and fairness	SS6	Do you ensure systematic fairness and safety in your ML system?	H
Data loss	SP1	Do you handle data loss reduction and privacy preservation, e.g., using federated learning?	H
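
As one possible reading of check SMN3 (tests that monitor changes in input distributions), the sketch below compares a reference sample of an input feature against a recent sample using the two-sample Kolmogorov-Smirnov statistic. The checklist does not prescribe this technique. The function names and the `threshold` value of 0.2 are assumptions chosen for illustration, and a real system would tune the threshold per feature.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs
    of the two samples (0.0 = identical, 1.0 = fully separated).
    """
    ref = sorted(reference)
    cur = sorted(current)
    if not ref or not cur:
        raise ValueError("both samples must be non-empty")
    largest_gap = 0.0
    # The maximum gap is always attained at one of the observed values.
    for value in ref + cur:
        ref_cdf = bisect_right(ref, value) / len(ref)
        cur_cdf = bisect_right(cur, value) / len(cur)
        largest_gap = max(largest_gap, abs(ref_cdf - cur_cdf))
    return largest_gap


def input_has_drifted(reference, current, threshold=0.2):
    """Flag drift when the KS statistic exceeds an assumed threshold."""
    return ks_statistic(reference, current) > threshold
```

A monitoring job could call `input_has_drifted` on each feature at a fixed interval and raise an alert when it returns `True`.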

Process

Name	ID	Check	ML Specificity
Documentation	PD1	Do you have proper documentation or a plan to document your ML system?	M
Team	PT1	Do you have heterogeneous teams mixing ML developers, data engineers, and architects?	D
Test-driven	PT2	Do you have a test-driven development strategy for your QA and testing process?	M
Separate pipelines	PSP1	Do you separate the branches for training pipelines from model training?	H
Model customization and reuse	SML1	Do you have expertise to customize and reuse models?	H
Versioning	SML2	Do you manage and version ML models?	H
ML infrastructure for deployment	SML3	Have you defined ML infrastructure and deployment processes?	H
Model testing	SML4	Are you testing the quality and performance of your models?	H
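
To illustrate the versioning check (SML2), the sketch below derives a deterministic version identifier from a serialized model and its training metadata, and stores both in an in-memory registry. It is a minimal sketch under stated assumptions, not a recommended tool: `model_version` and `ModelRegistry` are hypothetical names, and a production setup would typically rely on a dedicated model registry with persistent storage.

```python
import hashlib
import json


def model_version(model_bytes: bytes, metadata: dict) -> str:
    """Derive a deterministic version ID from model bytes and metadata."""
    digest = hashlib.sha256()
    digest.update(model_bytes)
    # sort_keys makes the encoding independent of dict insertion order.
    digest.update(json.dumps(metadata, sort_keys=True).encode("utf-8"))
    return digest.hexdigest()[:12]


class ModelRegistry:
    """Hypothetical in-memory registry keyed by content-derived version."""

    def __init__(self):
        self._entries = {}

    def register(self, model_bytes: bytes, metadata: dict) -> str:
        """Store the model with its metadata and return its version ID."""
        version = model_version(model_bytes, metadata)
        self._entries[version] = {"model": model_bytes, "metadata": dict(metadata)}
        return version

    def get(self, version: str) -> dict:
        """Return the stored entry; raises KeyError for unknown versions."""
        return self._entries[version]
```

Because the identifier depends only on content, registering the same model and metadata twice yields the same version, while any change to either produces a new one.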