This checklist is organized into two main categories:
- System concerns: Address quality attribute concerns that apply to the system as a whole, without requiring special considerations for ML components compared to traditional components.
- Process concerns: Focus on the effective management and execution of the development and maintenance process for ML-enabled systems, as well as concerns specific to architecting such systems.

Each check in the checklist is described using the following fields:
- ID: A unique identifier that follows a naming convention: the capital initial of the main category (S for System, P for Process), followed by the capital initial of the subcategory (e.g., A for Availability) and an integer. Where a single initial would be ambiguous, a two-letter abbreviation of the subcategory is used instead (e.g., SMD for System–Modularity).
- Name: A brief descriptive label summarizing the focus of the check.
- Check: The specific question or statement to be considered during architectural decision-making.
- ML Specificity: The degree to which the check is specific to ML-intensive systems, categorized as low (L), medium (M), or high (H).
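The ID convention described above can be sketched as a small parser. This is a minimal illustration, assuming an ID is exactly one category letter (S or P), one or two uppercase subcategory letters, and a positive integer; the names `parse_check_id` and `CheckId` are hypothetical and not part of the checklist.

```python
import re
from typing import NamedTuple

# Assumption: one category letter, one or two subcategory letters, then an integer.
ID_PATTERN = re.compile(
    r"^(?P<category>[SP])(?P<subcategory>[A-Z]{1,2})(?P<number>[1-9]\d*)$"
)

CATEGORY_NAMES = {"S": "System", "P": "Process"}


class CheckId(NamedTuple):
    category: str      # "System" or "Process"
    subcategory: str   # one- or two-letter subcategory abbreviation, e.g. "A" or "MD"
    number: int        # running number within the subcategory


def parse_check_id(raw: str) -> CheckId:
    """Split a check ID such as 'SMD1' into its three parts."""
    match = ID_PATTERN.match(raw)
    if match is None:
        raise ValueError(f"not a valid check ID: {raw!r}")
    return CheckId(
        CATEGORY_NAMES[match["category"]],
        match["subcategory"],
        int(match["number"]),
    )
```

For example, `parse_check_id("SMD1")` yields the category `System`, the subcategory abbreviation `MD`, and the number `1`.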
| Name | ID | Check | ML Specificity |
|---|---|---|---|
| Data visualization | SU1 | Do you have data visualization techniques in place? | L |
| Visualization techniques | SU2 | Have you considered visualization techniques to identify or highlight relationships between data and computing tasks? | M |
| Data preparation | SDQ1 | Do you have strategies for preparing data and for computing statistics on it? | H |
| Data cleaning | SDQ2 | Is your dataset clean, of good quality, and free from potential bias? | H |
| Dataset size | SDQ3 | Are you concerned about dataset size in ML processing? | H |
| Concept drift | SDQ4 | Do you engineer your ML-based system to adapt to input data changes (concept drift)? | H |
| System correctness | SC1 | Do you have techniques for ensuring system correctness? | H |
| Model validation | SMV1 | Are you performing validation of the model to predict how a learning algorithm will behave on new data? | H |
| Model validation | SMV2 | Are you combining model validation with data validation to detect data errors? | H |
| Independent upgradeability | SMD1 | Are you building a component-based distributed system where parts may need to be upgraded? | M |
| High cohesion and low coupling | SMD2 | Are high cohesion and low coupling important? | L |
| Microservice | SMD3 | If you are interested in maintainability and modifiability, did you consider using a microservice architecture? | L |
| Discrete service | SMD4 | Can you decompose your system into discrete services? | L |
| Modeling intrinsic uncertainty | SMN1 | Can you explicitly model the intrinsic uncertainty of ML components and assess its impact at the design stage? | H |
| Time predictability | SMN2 | Do you have mechanisms for monitoring and post-analysis of time predictability? | H |
| Monitoring drift | SMN3 | Do you have tests that monitor changes in input distributions? | H |
| Continuous integration | SDE1 | Can you use continuous integration techniques for system development? | M |
| Infrastructure as code | SDE2 | Do you manage IT infrastructure like servers, databases, and networks through code for your ML system? | M |
| Blue/Green, Canary testing | SDE3 | Are you including Blue/Green or canary testing in your MLOps pipelines? | H |
| Failure recovery strategy | SA1 | Did you consider failure recovery strategies to avoid failure propagation? | L |
| Domain knowledge | SA2 | Do you have the required domain knowledge to handle availability decisions? | M |
| Layered/tiered architecture | SA3 | Can you split business logic from ML components using a layered/tiered architecture? | M |
| Uncertainty | SR1 | Do you have complete information on ML uncertainty at design time? | H |
| Fail safe | SS1 | Do you have techniques to quickly reach safe states when needed? | H |
| Safety evaluation | SS2 | Have you included evaluation processes for architectural safety choices? | L |
| Coding standards | SS3 | Do you use strict and certified coding standards for safety-critical ML components? | L |
| External certification | SS4 | Is your system safety-certified by external authorities? | L |
| Design to defend | SS5 | Are you designing your ML system to defend vulnerable code sections from cyberattacks? | H |
| Safety and fairness | SS6 | Do you ensure systematic fairness and safety in your ML system? | H |
| Data loss | SP1 | Do you handle data loss reduction and privacy preservation, e.g., using federated learning? | H |
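Check SMN3 asks for tests that monitor changes in input distributions. One possible shape for such a test, sketched here under stated assumptions, compares a reference sample of a numeric feature against a recent sample using the two-sample Kolmogorov-Smirnov statistic. The function names and the default threshold of 0.2 are hypothetical choices for illustration; the checklist itself does not prescribe a technique.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], live: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of the two samples."""
    if not reference or not live:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(live)
    # The maximum gap between two step functions occurs at a sample point,
    # so it is enough to evaluate both empirical CDFs at every observed value.
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


def has_drifted(
    reference: Sequence[float], live: Sequence[float], threshold: float = 0.2
) -> bool:
    """Flag drift when the statistic exceeds a threshold (assumed, tune per feature)."""
    return ks_statistic(reference, live) > threshold
```

A test of this kind could run on a schedule against recent production inputs, with the reference sample taken from the training data. A suitable threshold depends on sample size and on how costly false alarms are.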

| Name | ID | Check | ML Specificity |
|---|---|---|---|
| Documentation | PD1 | Do you have proper documentation or a plan to document your ML system? | M |
| Team | PT1 | Do you have heterogeneous teams mixing ML developers, data engineers, and architects? | D |
| Test-driven | PT2 | Do you have a test-driven development strategy for your QA and testing process? | M |
| Separate pipelines | PSP1 | Do you separate the branches for training pipelines from model training? | H |
| Model customization and reuse | SML1 | Do you have expertise to customize and reuse models? | H |
| Versioning | SML2 | Do you manage and version ML models? | H |
| ML infrastructure for deployment | SML3 | Have you defined ML infrastructure and deployment processes? | H |
| Model testing | SML4 | Are you testing the quality and performance of your models? | H |
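To use the checklist in tooling, the four fields can be held in a small record type and filtered by ML specificity. The following is a minimal sketch, assuming the L/M/H scale is ordered from low to high; the `Check` class, the `select_checks` function, and the choice of sample rows are illustrative and not part of the checklist.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Ordered from least to most ML-specific.
VALID_SPECIFICITY = ("L", "M", "H")


@dataclass(frozen=True)
class Check:
    name: str
    check_id: str
    question: str
    ml_specificity: str

    def __post_init__(self) -> None:
        # Reject codes outside the documented L/M/H scale.
        if self.ml_specificity not in VALID_SPECIFICITY:
            raise ValueError(
                f"{self.check_id}: unknown ML specificity {self.ml_specificity!r}"
            )


# A few rows copied from the tables above.
CHECKS = [
    Check("Failure recovery strategy", "SA1",
          "Did you consider failure recovery strategies to avoid failure propagation?", "L"),
    Check("Documentation", "PD1",
          "Do you have proper documentation or a plan to document your ML system?", "M"),
    Check("Concept drift", "SDQ4",
          "Do you engineer your ML-based system to adapt to input data changes (concept drift)?", "H"),
    Check("Versioning", "SML2", "Do you manage and version ML models?", "H"),
]


def select_checks(checks: Sequence[Check], minimum: str = "M") -> List[Check]:
    """Return the checks whose ML specificity is at least `minimum`."""
    floor = VALID_SPECIFICITY.index(minimum)
    return [c for c in checks if VALID_SPECIFICITY.index(c.ml_specificity) >= floor]
```

Calling `select_checks(CHECKS, "H")` keeps only the highly ML-specific checks, which may help a team that already follows conventional architecture practice focus its review.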