Official task alignment for this domain:
| Official MLA-C01 task | How this document covers it |
|---|---|
| Task 2.1: Choose a modeling approach | Managed AI services, SageMaker built-in algorithms, script mode, JumpStart, Bedrock, foundation models, cost and interpretability tradeoffs |
| Task 2.2: Train and refine models | Epochs, batch size, early stopping, distributed training, regularization, hyperparameter tuning, model size reduction, Model Registry |
| Task 2.3: Analyze model performance | Classification and regression metrics, baselines, overfitting, underfitting, Clarify, Debugger, shadow variant comparison |
High-frequency model selection memory:
| Scenario clue | Strong first choice | Common distractor |
|---|---|---|
| Standard text, speech, image, translation, or document extraction with little labeled data | AWS managed AI service | Custom SageMaker model from scratch |
| Proprietary labels and custom supervised objective | SageMaker training or built-in algorithm/script mode | Generic managed AI API |
| Need foundation model, embeddings, or generative workflow | Amazon Bedrock or SageMaker JumpStart where appropriate | Traditional tabular algorithm |
| Validation loss worsens while training loss improves | Regularization, early stopping, or simpler model | Train longer without validation |
| Need reproducible approval and audit trail | SageMaker Model Registry | Direct notebook deployment |
Core Priority: MLA-C01 expects candidates to choose between building a model, using a SageMaker built-in algorithm, selecting a JumpStart solution, calling an AWS AI service, or using a foundation model through Amazon Bedrock.
High Frequency: Scenarios compare problem type, available data, interpretability, cost, latency, and operational complexity. The exam rewards selecting the simplest service that satisfies the business and technical requirement.
Confusion Alert: A common wrong answer is building a custom SageMaker model when Amazon Transcribe, Rekognition, Translate, Comprehend, or Bedrock already matches the need. The opposite trap is using a prebuilt AI service when the scenario requires custom supervised learning on proprietary labels.
Scenario Logic: Start with the task: classification, regression, forecasting, translation, transcription, image analysis, generative text, embedding, or recommendation. Then evaluate data availability, customization requirement, and control requirements.
Version Delta: Bedrock model availability and SageMaker JumpStart templates change. Use official model catalogs and service documentation for exact current model lists.
Failure Trigger: Wrong approach selection causes insufficient accuracy, excessive development time, poor explainability, unexpected cost, or inability to meet latency and governance constraints.
Operational Dependency: The approach depends on labeled data, task fit, model ownership, deployment path, interpretability, and compliance constraints.
How the Exam Asks It: The stem may say the team lacks ML training data, needs rapid deployment, requires a custom model, or must explain feature impact.
How Distractors Are Designed: Distractors favor fashionable services without matching the task, or choose low-level training when a managed AI service solves the requirement.
Why the Correct Answer Works: The correct answer maps the workload to the least complex model path that satisfies customization and operational constraints.
High-Value Exam Focus: Start with task type and data availability. No labeled dataset plus a standard AWS capability usually points to a managed AI service; proprietary labels or custom objective points to SageMaker training; GenAI, embedding, and foundation-model workflows point to Bedrock or JumpStart depending on control requirements.
Practice Question: A company needs to extract text from scanned invoices and has no labeled dataset. They want a managed service with minimal model training. Which approach is best?
A. Train a custom neural network in SageMaker script mode from scratch.
B. Use Amazon Textract and integrate its output into the workflow.
C. Use SageMaker automatic model tuning on an XGBoost model.
D. Deploy a multi-model endpoint before selecting an extraction model.
Correct Answer: B
Explanation: B matches the managed document extraction requirement without labeled training data. A and C require training data and solve a different problem. D is a deployment pattern, not an extraction capability.
Exam Takeaway: Choose managed AI services for standard perception or language tasks when customization is not the main requirement; custom training distractors are common.
Modeling approach selection is an engineering tradeoff. SageMaker built-in algorithms are appropriate when the team has structured data and a conventional supervised or unsupervised problem. Script mode is useful when the team needs framework-level control with TensorFlow, PyTorch, or another supported library. JumpStart and Bedrock reduce build effort when foundation models or solution templates fit the task. AWS AI services are best when the task is a standard capability with managed APIs.
The first dependency is data. If labeled examples do not exist, supervised training is not immediately feasible. The second dependency is task specificity. If the output can be produced by a managed service, building a custom model increases operational burden. The third dependency is interpretability and governance; some regulated tasks require model choice and evaluation evidence that a black-box API may not provide.
| Object | Attribute | Value Range | Default State | Dependency | Failure State |
|---|---|---|---|---|---|
| Problem type | Learning objective | Classification, regression, clustering, forecasting, generation | Undefined until requirements analysis | Labels, target variable, success metric | Wrong algorithm family |
| SageMaker built-in algorithm | Task compatibility | XGBoost, linear learner, image/text algorithms, others | Not selected | Data format and objective match | Training job runs but metrics are irrelevant |
| AWS AI service | Managed capability | Translate, Transcribe, Rekognition, Textract, Comprehend, Bedrock | API not integrated | Input type and service quota | Overbuilt custom model or unsupported output |
| Interpretability constraint | Explanation need | Low, medium, high, regulated | Often unstated | Model type and audit evidence | Model rejected by stakeholders |
| Cost boundary | Runtime and training budget | API usage, training compute, endpoint hosting | Unbounded if not modeled | Volume, latency, and instance/service choice | Budget overrun |
Convert the business statement into a prediction or automation task. This prevents selecting a service by name recognition alone.
Validate whether labeled data exists and whether it represents the production population.
Compare managed AI service fit before custom training. If a service output directly satisfies the requirement, it usually reduces operational complexity.
Inspect available SageMaker or Bedrock options through supported console catalogs or APIs.
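#Version-aware AWS CLI/API verification pattern; exact commands vary by service and region. Note that list-algorithms returns algorithm resources registered in the account (custom or AWS Marketplace algorithms), not the catalog of SageMaker built-in algorithms; confirm built-in algorithm support in the SageMaker documentation.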
aws sagemaker list-algorithms
aws bedrock list-foundation-models
Expected state: the candidate model or service is available in the target region and supports the needed modality.
The requirement defines the output type. That output type constrains the model family or managed service. Data availability then determines whether training is possible. Control and interpretability requirements determine whether a managed API, built-in algorithm, script-mode model, JumpStart asset, or Bedrock model is acceptable. If the wrong layer is chosen, later pipeline stages inherit the mismatch: metrics measure the wrong outcome, deployment hosts the wrong artifact, or governance cannot approve the result.
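The selection order described above can be sketched as a small decision function. This is a minimal illustration, not an official AWS decision tree: the `Requirement` fields and the `choose_approach` helper are hypothetical names introduced here, and real scenarios also weigh cost, latency, and governance.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    """Hypothetical summary of a scenario; field names are illustrative."""
    standard_capability: bool            # e.g. OCR, transcription, translation
    has_proprietary_labels: bool         # labeled data for a custom objective
    needs_generative_or_embeddings: bool
    needs_framework_control: bool        # custom PyTorch/TensorFlow code


def choose_approach(req: Requirement) -> str:
    """Return the least complex path that still satisfies the requirement."""
    if req.needs_generative_or_embeddings:
        return "Amazon Bedrock or SageMaker JumpStart"
    if req.standard_capability and not req.has_proprietary_labels:
        return "AWS managed AI service"
    if req.has_proprietary_labels and req.needs_framework_control:
        return "SageMaker script mode"
    if req.has_proprietary_labels:
        return "SageMaker built-in algorithm"
    return "Collect labeled data before choosing a training path"


# Scanned-invoice text extraction with no labeled dataset.
invoice_ocr = Requirement(True, False, False, False)
print(choose_approach(invoice_ocr))  # AWS managed AI service
```

The checks run from the most constrained requirement to the least, so a scenario only reaches custom training after the managed options have been ruled out.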
| Task | Precise Command or Path | Verification Standard |
|---|---|---|
| Inspect SageMaker algorithm options | aws sagemaker list-algorithms | Candidate algorithm exists and supports the target task |
| Inspect Bedrock model availability | aws bedrock list-foundation-models | Required model family is available in the target region |
| Validate AI service fit | AWS console/API documentation for Textract, Transcribe, Rekognition, Translate, Comprehend | Input modality and output fields match the scenario |
| Confirm dataset label readiness | Training data manifest or catalog profile | Target label exists and distribution is usable |
Core Priority: Model development questions frequently test training mechanics: epoch, batch size, steps, distributed training, early stopping, regularization, hyperparameter tuning, model size reduction, and version management.
High Frequency: SageMaker training jobs, script mode, automatic model tuning, Model Registry, and framework containers are common anchors.
Confusion Alert: Distractors may tune hyperparameters before fixing underfitting or overfitting evidence, or approve a model without registering artifacts and metrics for repeatability.
Scenario Logic: Determine whether the model is underfitting, overfitting, slow to train, too large, or hard to reproduce. Each symptom has a different operational response.
Version Delta: Container images, instance families, and automatic model tuning (AMT) configuration fields evolve. Validate production commands with the current SageMaker SDK or API reference.
Failure Trigger: Training can fail or degrade from wrong input channels, unsupported container/framework versions, excessive learning rate, no early stopping, insufficient regularization, or missing model artifact lineage.
Operational Dependency: Repeatable training requires code version, data version, hyperparameters, container image, metrics, artifacts, and registry state.
How the Exam Asks It: The stem may describe validation loss increasing while training loss improves, expensive long-running training jobs, or the need to reproduce a previous model.
How Distractors Are Designed: Wrong choices often change endpoint deployment before model training evidence is fixed, or confuse data bias with hyperparameter tuning.
Why the Correct Answer Works: The correct answer addresses the observed training symptom and preserves auditability.
High-Value Exam Focus: Read the metric pattern before choosing a fix. Overfitting points to regularization, early stopping, feature selection, or less complexity; underfitting points to better features, model capacity, or training configuration; repeatability points to Model Registry and captured training metadata.
Practice Question: During training, training loss decreases but validation loss increases after several epochs. The team wants to improve generalization. Which action is most appropriate?
A. Add regularization or early stopping and compare validation metrics.
B. Increase endpoint invocations per instance.
C. Approve the model package without changing training.
D. Disable validation and train for more epochs.
Correct Answer: A
Explanation: A targets overfitting by constraining the model or stopping when validation performance worsens. B changes inference capacity. C ignores a known training problem. D hides the evidence and can worsen overfitting.
Exam Takeaway: Match the remediation to the metric pattern; overfitting distractors often focus on deployment or more training time.
Training is a controlled execution of code, data, hyperparameters, and compute. Epoch count determines passes over data. Batch size affects memory, gradient stability, and throughput. Learning rate changes the size of parameter updates. Regularization constrains model complexity. Early stopping stops training when validation performance no longer improves. AMT explores hyperparameter combinations and records objective metrics.
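Two of these mechanics can be made concrete with a short sketch: how epochs and batch size determine the number of update steps, and how a patience-based early stopping rule reacts to a validation curve. The function names, the patience value, and the loss values are illustrative assumptions, not SageMaker defaults.

```python
import math


def total_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Number of gradient updates: batches per epoch times epochs."""
    return math.ceil(num_examples / batch_size) * epochs


def early_stopping_epoch(val_losses: list[float], patience: int = 2) -> int:
    """Return the 1-based epoch at which training would stop.

    Training stops once validation loss has failed to improve for
    `patience` consecutive epochs; otherwise it runs through all epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)


# Made-up validation curve: improves for three epochs, then worsens.
val_curve = [0.90, 0.70, 0.62, 0.65, 0.69, 0.75]
print(total_steps(10_000, 32, 3))       # 313 batches per epoch * 3 = 939
print(early_stopping_epoch(val_curve))  # 5
```

With a patience of 2, training halts at epoch 5, two epochs after the best validation loss at epoch 3. In practice the checkpoint from the best epoch is usually the one kept.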
Model versioning closes the loop. A model artifact without training job metadata cannot be reliably reproduced. SageMaker Model Registry packages the artifact, approval state, metrics, and lineage so deployment pipelines can reference a controlled version.
| Object | Attribute | Value Range | Default State | Dependency | Failure State |
|---|---|---|---|---|---|
| Training job | Input channel | S3 URI or file system data source (Amazon EFS, FSx for Lustre); File, Pipe, or FastFile input mode where supported | Undefined until job config | Data format and role permissions | Channel read failure |
| Hyperparameter tuning job | Objective metric | Validation accuracy, F1, RMSE, loss, custom metric | No tuning unless configured | Training script emits metrics | Search optimizes wrong target |
| Regularization setting | Constraint type | L1, L2, dropout, pruning, feature selection | Algorithm-specific default | Model family support | Overfitting persists |
| Training container | Framework image | SageMaker built-in or custom ECR image | Latest not implied | Region, framework version, dependencies | Runtime import or compatibility errors |
| Model package | Approval status | PendingManualApproval, Approved, Rejected | Pending manual approval | Evaluation metrics and governance rule | Uncontrolled deployment |
Read the metric pattern first: training loss, validation loss, objective metric, convergence logs, and runtime cost. This identifies whether the issue is model fit, speed, size, or reproducibility.
Verify the training job configuration and status.
#Official AWS CLI verification pattern.
aws sagemaker describe-training-job --training-job-name example-training-job
Expected state: input channels, image, role, instance type, hyperparameters, and output artifacts match the intended experiment.
#Official AWS CLI verification pattern.
aws sagemaker describe-hyper-parameter-tuning-job --hyper-parameter-tuning-job-name example-hpo-job
Expected state: objective metric aligns to the scenario and best training job has completed.
#Official AWS CLI verification pattern.
aws sagemaker describe-model-package --model-package-name example-model-package-arn
Expected state: model artifact, metrics, and approval status are present.
The training job pulls source data through input channels, starts the container, applies hyperparameters inside the algorithm or script, and emits metrics and artifacts. The tuning service launches multiple training jobs, compares objective metrics, and identifies the best configuration. The model registry records the selected artifact and governance state. If the objective metric is wrong, the tuning service optimizes the wrong behavior. If the artifact is not registered, deployment lacks a controlled model identity.
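The effect of a wrong objective metric can be shown with a small sketch. The job names and metric values below are made up, and `best_job` is a hypothetical helper that stands in for the comparison a tuning service performs; it is not a SageMaker API.

```python
# Hypothetical results from three training jobs; the numbers are made up.
jobs = [
    {"name": "job-a", "accuracy": 0.98, "recall": 0.40},
    {"name": "job-b", "accuracy": 0.95, "recall": 0.82},
    {"name": "job-c", "accuracy": 0.96, "recall": 0.71},
]


def best_job(results, metric, maximize=True):
    """Pick the job with the best value of the chosen objective metric."""
    chooser = max if maximize else min
    return chooser(results, key=lambda job: job[metric])["name"]


print(best_job(jobs, "accuracy"))  # job-a
print(best_job(jobs, "recall"))    # job-b
```

The same set of jobs yields a different winner depending on the objective, which is why the objective metric should be fixed from the business requirement before tuning starts. For a loss-style metric, `maximize=False` selects the minimum instead.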
| Task | Precise Command or Path | Verification Standard |
|---|---|---|
| Inspect training job | aws sagemaker describe-training-job --training-job-name example-training-job | Job completed and configuration matches intended data, image, and hyperparameters |
| Inspect tuning objective | aws sagemaker describe-hyper-parameter-tuning-job --hyper-parameter-tuning-job-name example-hpo-job | Objective metric and best job match evaluation requirement |
| Validate model package | aws sagemaker describe-model-package --model-package-name example-model-package-arn | Artifact, metrics, and approval state are present |
| Review training logs | CloudWatch Logs > /aws/sagemaker/TrainingJobs | Logs show convergence, metric emission, and no data-read errors |
Core Priority: MLA-C01 requires selecting and interpreting metrics. Classification, regression, bias analysis, convergence debugging, and production variant comparison are all testable.
High Frequency: Confusion matrix, precision, recall, F1, accuracy, RMSE, ROC/AUC, SageMaker Clarify, SageMaker Debugger, baselines, and shadow variants appear frequently.
Confusion Alert: Accuracy can be a distractor for imbalanced classification. RMSE is a regression metric and does not fit classification decisions. Endpoint latency metrics do not prove model correctness.
Scenario Logic: Match metric to business cost: false positives, false negatives, numeric error, ranking quality, bias, or convergence behavior.
Version Delta: SageMaker monitoring and debugging capabilities evolve. Validate feature availability in the active region and SDK version.
Failure Trigger: Evaluation failures occur when the wrong metric is optimized, the test set leaks training data, the model is biased, the baseline is missing, or a shadow model is compared on infrastructure metrics only.
Operational Dependency: Evaluation depends on held-out data, metric definition, baseline artifact, prediction logs, and model variant routing.
How the Exam Asks It: Stems may describe fraud detection, medical false negatives, price prediction, a shadow deployment, or a model that fails to converge.
How Distractors Are Designed: Wrong answers optimize a convenient metric instead of the risk-aligned metric, or monitor CPU when prediction quality is the issue.
Why the Correct Answer Works: The correct answer measures the outcome the scenario cares about and uses AWS tooling to observe model behavior.
High-Value Exam Focus: Select metrics from the cost of error. Fraud, safety, and rare positives often need recall/F1 thinking; numeric prediction needs RMSE/MAE-style thinking; production candidate comparison needs variant-level evidence, not only training metrics.
Practice Question: A fraud model has 98% accuracy because fraudulent cases are rare, but it misses many fraud events. Which metric should the team prioritize?
A. Recall for the fraud class, balanced with precision or F1.
B. Endpoint CPU utilization.
C. Total number of approved model packages.
D. S3 object count in the training bucket.
Correct Answer: A
Explanation: A focuses on missed positives in an imbalanced dataset. B, C, and D are operational signals but do not measure fraud detection performance.
Exam Takeaway: Metric selection follows the cost of error; imbalanced-class distractors often hide behind high accuracy.
Evaluation converts predictions into evidence. Classification metrics use true positives, false positives, true negatives, and false negatives. Precision answers how many predicted positives were correct. Recall answers how many actual positives were found. F1 balances precision and recall. Regression metrics such as RMSE quantify numeric error. ROC/AUC describes ranking separation across thresholds.
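These definitions can be checked by hand with a short sketch. The confusion-matrix counts below are an invented imbalanced example (1,000 records, 20 actual positives), and the helper names are illustrative rather than part of any AWS SDK.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Derive standard metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def rmse(y_true, y_pred) -> float:
    """Root mean squared error for numeric predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented imbalanced case: 20 actual positives, only 5 of them found.
m = classification_metrics(tp=5, fp=5, tn=975, fn=15)
print(m["accuracy"], m["precision"], m["recall"])  # 0.98 0.5 0.25
print(round(m["f1"], 2))                           # 0.33

# Every prediction is off by 10, so RMSE is 10.0.
print(rmse([100, 200, 300, 400], [110, 190, 310, 390]))  # 10.0
```

The example reproduces the imbalanced-class trap: accuracy is 98% while recall is only 25%, because the 975 true negatives dominate the accuracy calculation.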
Bias and explainability tools add another layer. SageMaker Clarify can evaluate feature attributions and bias metrics, while Debugger can inspect training behavior and convergence signals. Shadow variants allow a candidate model to receive production traffic copies without serving responses, enabling comparison against the production variant.
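As a rough illustration of what a pre-training bias check measures, the sketch below computes two metrics by hand: class imbalance (CI) and difference in proportions of labels (DPL). The formulas follow the definitions as commonly given in the Clarify documentation and should be verified against the current docs; the dataset, facet names, and helper function are invented for the example, and this is not the Clarify API.

```python
# Invented dataset of (facet, label) pairs: facet "a" has 80 records
# with 40 positive labels, facet "d" has 20 records with 5 positive labels.
records = ([("a", 1)] * 40 + [("a", 0)] * 40
           + [("d", 1)] * 5 + [("d", 0)] * 15)


def facet_counts(rows, facet):
    """Return (record count, positive-label count) for one facet value."""
    labels = [label for f, label in rows if f == facet]
    return len(labels), sum(labels)


n_a, pos_a = facet_counts(records, "a")
n_d, pos_d = facet_counts(records, "d")

# Class imbalance: how unevenly the facets are represented, in [-1, 1].
ci = (n_a - n_d) / (n_a + n_d)

# Difference in proportions of labels: gap in positive-label rates.
dpl = pos_a / n_a - pos_d / n_d

print(round(ci, 2), round(dpl, 2))  # 0.6 0.25
```

Values near zero indicate balanced representation and similar label rates; here both metrics flag that facet "d" is underrepresented and receives fewer positive labels.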
| Object | Attribute | Value Range | Default State | Dependency | Failure State |
|---|---|---|---|---|---|
| Confusion matrix | Error counts | TP, FP, TN, FN | Not computed unless labels and predictions are compared | Labeled test data | Misread model risk |
| Regression metric | Error measure | RMSE, MAE, MAPE where appropriate | Undefined until objective chosen | Numeric target | Incorrect model comparison |
| Clarify report | Bias/explainability output | Pre-training, post-training, feature attribution | Absent unless job configured | Dataset facets and model endpoint/artifact | No fairness or explanation evidence |
| Debugger output | Training tensor and rule evidence | Rule status, tensors, logs | Disabled unless configured | Framework support and hook configuration | Convergence issue remains hidden |
| Shadow variant | Traffic comparison mode | Production and shadow variant metrics | No shadow traffic | Endpoint config and capture/metrics setup | Candidate model cannot be compared safely |
Identify the business error cost before selecting metrics. This prevents choosing accuracy when recall, precision, F1, or RMSE is required.
Validate the evaluation dataset and baseline. The dataset must be held out and representative.
Inspect model evaluation artifacts in the training output, model package, or experiment tracking record.
#Official AWS CLI verification pattern.
aws sagemaker describe-model-package --model-package-name example-model-package-arn
Expected state: metrics or model quality artifacts are attached for review.
#Official AWS CLI verification pattern.
aws sagemaker describe-endpoint --endpoint-name example-endpoint
Expected state: production and shadow variant configuration matches the comparison plan.
The model emits predictions for labeled evaluation records. The evaluator compares predictions with labels and computes metrics. Clarify or Debugger jobs add bias, attribution, or convergence evidence. When deployed as a shadow variant, the endpoint routes copied traffic to the candidate model while the production model serves users. Metrics and captured outputs then reveal whether the candidate performs better under realistic traffic without taking over responses.
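The shadow flow above can be mimicked locally to show what is compared. This is a conceptual sketch only: the two threshold "models", the traffic, and the labels are invented, and on a real endpoint the copying of traffic is handled by SageMaker rather than application code. Ground-truth labels are assumed to arrive later and be joined to the captured outputs.

```python
def serve_with_shadow(request, production_model, shadow_model, shadow_log):
    """Return only the production prediction; record the shadow one."""
    response = production_model(request)
    shadow_log.append(shadow_model(request))
    return response


def production(score):
    """Hypothetical current model: flags scores of 0.7 and above."""
    return int(score >= 0.7)


def shadow(score):
    """Hypothetical candidate model: flags scores of 0.5 and above."""
    return int(score >= 0.5)


def recall(preds, labels):
    """Share of actual positives that were predicted positive."""
    found = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    return found / sum(labels)


# Invented traffic: (input score, ground-truth label joined later).
traffic = [(0.9, 1), (0.6, 1), (0.4, 0), (0.55, 1), (0.2, 0)]
labels = [y for _, y in traffic]

shadow_log = []
served = [serve_with_shadow(x, production, shadow, shadow_log)
          for x, _ in traffic]

print(served)                      # [1, 0, 0, 0, 0]
print(shadow_log)                  # [1, 1, 0, 1, 0]
print(recall(shadow_log, labels))  # 1.0
```

Callers only ever see the production predictions, while the logged shadow predictions support a quality comparison (here, recall of 1/3 for production versus 1.0 for the candidate) in addition to latency and error metrics.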
| Task | Precise Command or Path | Verification Standard |
|---|---|---|
| Inspect model metrics package | aws sagemaker describe-model-package --model-package-name example-model-package-arn | Evaluation metrics are attached and match the selected business objective |
| Verify endpoint variant setup | aws sagemaker describe-endpoint --endpoint-name example-endpoint | Variant configuration reflects production and candidate comparison intent |
| Review endpoint metrics | CloudWatch Metrics > AWS/SageMaker > EndpointName, VariantName | Candidate and production metrics are visible by variant |
| Review convergence evidence | CloudWatch Logs or SageMaker Debugger outputs | Training logs or Debugger rules show convergence status |
How should a team choose between built-in SageMaker algorithms, custom containers, AWS AI services, and foundation models?
Match the approach to the problem type, customization need, data availability, latency requirements, and operational ownership the team can support.
Built-in algorithms are useful when the problem maps cleanly to supported supervised or unsupervised patterns. Custom containers fit specialized frameworks or libraries. AWS AI services can be best when a managed capability already solves the use case. Foundation models are appropriate when generative or language capabilities are central and the team can manage prompt, evaluation, and governance requirements.
Demand Score: 88
Exam Relevance Score: 94
What is the purpose of hyperparameter tuning in SageMaker model development?
Hyperparameter tuning searches over configured parameter ranges to find training settings that improve a selected objective metric.
Hyperparameters such as learning rate, tree depth, batch size, or regularization strength control training behavior but are not learned directly from the data. SageMaker automatic model tuning can run multiple training jobs and compare metrics. The exam commonly tests whether tuning is appropriate after data quality and algorithm choice have been addressed.
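A search over configured ranges can be sketched with plain random search, which is only one of several possible strategies. Everything here is an assumption for illustration: the ranges, the parameter names, and `toy_validation_loss`, which stands in for launching a real training job and reading its objective metric.

```python
import math
import random


def sample_config(rng: random.Random) -> dict:
    """Draw one configuration from hypothetical search ranges."""
    return {
        # Log-uniform sampling suits values spanning orders of magnitude.
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "max_depth": rng.randint(3, 10),
    }


def toy_validation_loss(config: dict) -> float:
    """Stand-in for a training job; lowest near lr=0.01 and depth=6."""
    lr_term = (math.log10(config["learning_rate"]) + 2) ** 2
    depth_term = 0.1 * abs(config["max_depth"] - 6)
    return lr_term + depth_term


def random_search(num_trials: int, seed: int = 0):
    """Evaluate sampled configurations and return (best, all_trials)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(num_trials):
        config = sample_config(rng)
        trials.append((toy_validation_loss(config), config))
    best = min(trials, key=lambda trial: trial[0])
    return best, trials


best, trials = random_search(num_trials=20)
print(best)
```

The seed makes the run repeatable, which mirrors the broader point that tuning results are only useful when the configuration that produced them is recorded.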
Demand Score: 87
Exam Relevance Score: 94
Why should a team use validation and test datasets instead of evaluating only on the training data?
Separate validation and test datasets help estimate generalization and reveal overfitting before a model is promoted.
A model can memorize training examples and still perform poorly on future data. Validation data supports model selection and tuning, while test data provides a cleaner final estimate of performance. MLA-C01 scenarios often include suspiciously high training accuracy, which should trigger concern about overfitting, leakage, or an invalid split strategy.
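A simple three-way split can be sketched as follows. The 70/15/15 fractions and the `split_dataset` helper are illustrative assumptions; a shuffled split like this suits independent records, while time-ordered or grouped data generally needs a split that respects that structure to avoid leakage.

```python
import random


def split_dataset(records, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle once, then cut into disjoint train, validation, and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder is held out for the end
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

Because each record lands in exactly one subset, metrics computed on the validation and test sets are not inflated by examples the model has already seen.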
Demand Score: 86
Exam Relevance Score: 93
When should model versioning and the SageMaker Model Registry be used?
Use model versioning and Model Registry when models need tracked lineage, approval status, reproducible deployment, and controlled promotion across environments.
Production ML needs more than a model artifact in S3. Teams must know which training data, image, parameters, metrics, and approval decision produced a deployed model. The registry supports governance and promotion workflows, especially when CI/CD or multiple deployment stages are involved.
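The idea of a controlled version can be sketched with an in-memory stand-in for a registry. The `ModelVersion` class, its fields, the URIs, and the metric values are all hypothetical; only the approval status strings mirror the values SageMaker uses for model packages. A real pipeline would query the Model Registry rather than a Python list.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    """Hypothetical stand-in for a registered model package version."""
    version: int
    artifact_uri: str
    image_uri: str
    hyperparameters: dict
    metrics: dict
    approval_status: str = "PendingManualApproval"


def latest_approved(versions):
    """Return the newest approved version, or None if nothing is approved."""
    approved = [v for v in versions if v.approval_status == "Approved"]
    return max(approved, key=lambda v: v.version) if approved else None


# Placeholder URIs and metrics, for illustration only.
registry = [
    ModelVersion(1, "s3://example-bucket/v1/model.tar.gz", "example-image:1",
                 {"max_depth": 6}, {"f1": 0.81}, "Approved"),
    ModelVersion(2, "s3://example-bucket/v2/model.tar.gz", "example-image:1",
                 {"max_depth": 8}, {"f1": 0.79}, "Rejected"),
    ModelVersion(3, "s3://example-bucket/v3/model.tar.gz", "example-image:2",
                 {"max_depth": 7}, {"f1": 0.85}),
]

deployable = latest_approved(registry)
print(deployable.version)  # 1
```

Version 3 has the best metric but is still pending approval, so the deployment step resolves to version 1. Keeping the image, hyperparameters, and metrics next to the artifact is what makes that decision auditable.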
Demand Score: 90
Exam Relevance Score: 95
What does a shadow variant help evaluate during model development and release?
A shadow variant lets a candidate model receive copied production traffic for evaluation without serving responses to users.
Shadow testing is useful when the team wants real production input patterns but does not want the candidate model to affect customer outcomes. It can reveal latency, error, and prediction behavior before live traffic is shifted. This differs from A/B testing, where multiple variants actively serve user traffic.
Demand Score: 89
Exam Relevance Score: 95