Official task alignment for this domain:
| Official MLA-C01 task | How this document covers it |
|---|---|
| Task 3.1: Select deployment infrastructure based on existing architecture and requirements | Real-time, serverless, asynchronous, batch, multi-model, multi-container, ECS, EKS, Lambda, edge, CPU/GPU, latency and cost tradeoffs |
| Task 3.2: Create and script infrastructure based on existing architecture and requirements | CloudFormation, CDK, ECR, BYOC, VPC endpoints, endpoint auto scaling, scaling metrics, maintainable infrastructure |
| Task 3.3: Use automated orchestration tools to set up CI/CD pipelines | SageMaker Pipelines, CodePipeline, CodeBuild, CodeDeploy, EventBridge, MWAA/Airflow, Git workflows, tests, retraining, rollback |
High-frequency deployment selection memory:
| Scenario clue | Strong first choice | Common distractor |
|---|---|---|
| Nightly or offline scoring of a large dataset | Batch transform | Always-on real-time endpoint |
| Low-latency synchronous request/response | Real-time endpoint | Batch transform |
| Large payload or long processing with delayed response | Asynchronous inference | Serverless endpoint without checking limits |
| Intermittent traffic that stays within serverless payload/runtime limits | Serverless inference | Provisioned endpoint for very low utilization |
| Many similar tenant or segment models | Multi-model endpoint | One endpoint per tiny model without cost analysis |
Core Priority: Deployment questions test whether candidates match inference patterns to SageMaker endpoints, batch transform, serverless inference, asynchronous inference, multi-model endpoints, ECS, EKS, Lambda, or edge optimization.
High Frequency: Latency, payload size, traffic variability, cost, GPU/CPU need, container control, and model count drive the answer.
Confusion Alert: Real-time endpoints are not always correct. Batch workloads should not pay for always-on endpoints. Serverless is not a fit for every latency, payload, or runtime requirement.
Scenario Logic: Start with invocation pattern: synchronous low latency, bursty intermittent, large payload asynchronous, offline batch scoring, many similar models, custom orchestrator, or edge device.
Version Delta: SageMaker endpoint options and quotas change. Validate current limits for payload size, timeout, memory, and concurrency before production design.
Failure Trigger: Wrong deployment target causes timeout, unnecessary cost, cold-start latency, GPU shortage, container incompatibility, or inability to roll back.
Operational Dependency: Deployment depends on model artifact, container image, endpoint config, instance or serverless configuration, IAM role, VPC path, and monitoring.
How the Exam Asks It: The stem may mention unpredictable traffic, nightly scoring, many tenant models, large image payloads, or strict low-latency response.
How Distractors Are Designed: Wrong answers choose the most familiar endpoint type while ignoring workload timing and latency.
Why the Correct Answer Works: The correct target satisfies the request/response pattern and cost envelope.
High-Value Exam Focus: Deployment target follows invocation pattern first: real-time for synchronous low latency, batch transform for offline scoring, asynchronous for long-running or large-payload inference, serverless for intermittent supported workloads, and multi-model for many similar models.
Practice Question: A model scores 20 million historical records once each night. There is no need for online responses. Which deployment pattern is most appropriate?
A. SageMaker batch transform.
B. Always-on real-time endpoint.
C. Serverless endpoint for every record.
D. Multi-model endpoint with one model.
Correct Answer: A
Explanation: A fits offline batch scoring. B pays for always-on hosting without online need. C is designed for synchronous invocation patterns, not large scheduled batch scoring. D solves hosting many models, not nightly scoring.
Exam Takeaway: Pick deployment from invocation pattern first; endpoint distractors are common when the scenario is actually batch.
Deployment maps a trained artifact to an invocation contract. Real-time endpoints serve synchronous low-latency requests. Serverless endpoints reduce idle cost for intermittent traffic but introduce limits and cold-start considerations. Asynchronous inference decouples request submission from response retrieval for larger payloads or longer processing. Batch transform scores offline datasets. Multi-model endpoints host many models behind one endpoint when the models can be served by the same container.
Choosing ECS, EKS, or Lambda can be valid when the scenario requires custom application orchestration, container control, or event-driven glue code, but those choices add operational ownership. SageMaker endpoints provide ML-specific hosting, variant routing, and integration with SageMaker monitoring.
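The selection logic above can be sketched as a small decision function. This is a study aid rather than an AWS API: the `Workload` fields and the order of the checks are assumptions chosen to mirror the scenario clues in this section, and real designs should still validate current service limits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Scenario clues. Every field name is illustrative, not an AWS API."""
    offline_batch: bool              # scheduled scoring, nobody waits for a reply
    large_payload_or_long_run: bool  # delayed response is acceptable
    many_similar_models: bool        # e.g. one model per tenant, same container
    intermittent_traffic: bool       # long idle periods between requests
    within_serverless_limits: bool   # payload/memory/runtime checked against quotas


def choose_deployment(w: Workload) -> str:
    """Return a first-choice deployment family, invocation pattern first."""
    if w.offline_batch:
        return "batch transform"
    if w.large_payload_or_long_run:
        return "asynchronous inference"
    if w.many_similar_models:
        return "multi-model endpoint"
    if w.intermittent_traffic and w.within_serverless_limits:
        return "serverless inference"
    # Default: synchronous, low-latency, steady traffic.
    return "real-time endpoint"


nightly = Workload(True, False, False, False, False)
print(choose_deployment(nightly))
```

The function checks the invocation pattern before cost or model count, which is the same order the exam stems reward.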
| Object | Attribute | Value Range | Default State | Dependency | Failure State |
|---|---|---|---|---|---|
| Endpoint type | Invocation mode | Real-time, serverless, asynchronous, multi-model | No hosting until endpoint config exists | Latency, payload, traffic pattern | Timeout or excessive cost |
| Batch transform job | Input data path | S3 prefix or manifest | Not scheduled | Model artifact and transform resources | Offline scoring fails |
| Endpoint variant | Traffic weight | Relative weight per variant; traffic share is the weight divided by the sum of all variant weights | Single variant if unmanaged | Endpoint config and model objects | Wrong model serves traffic |
| Container image | Serving runtime | SageMaker image or custom ECR image | Undefined | Inference handler and dependencies | Model cannot load or invoke |
| Compute selection | Instance/serverless config | CPU, GPU, memory, concurrency | Unset until config | Model size, latency, cost | Capacity or cold-start issue |
Classify invocation timing and latency. This determines the deployment family before infrastructure details.
Verify the model artifact and container image selected for hosting.
#Official AWS CLI verification pattern.
aws sagemaker describe-model --model-name example-model
Expected state: model data URL, image, role, and environment match the deployment plan.
#Official AWS CLI verification pattern.
aws sagemaker describe-endpoint-config --endpoint-config-name example-endpoint-config
Expected state: production variants, instance/serverless options, and model references align to traffic needs.
#Official AWS CLI verification pattern.
aws sagemaker describe-transform-job --transform-job-name nightly-scoring-job
Expected state: job completed and output S3 path contains predictions.
SageMaker creates a model object that binds artifact, container image, execution role, and optional VPC settings. Endpoint config maps the model to compute and variant routing, while batch transform maps the model to an offline input/output job. The invocation path then either waits synchronously, stores asynchronous results, or writes batch outputs. If the deployment family mismatches the traffic pattern, the system fails through latency, timeout, cost, or capacity pressure.
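As a rough illustration of variant routing, the sketch below computes the share of traffic each production variant receives from its weight. The dictionaries use the `VariantName` and `InitialVariantWeight` keys of an endpoint config's production variants; the variant names and weights are made up for the example.

```python
def traffic_shares(production_variants):
    """Map each variant name to its fraction of traffic.

    A variant's share is its weight divided by the sum of all weights.
    """
    total = sum(v["InitialVariantWeight"] for v in production_variants)
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {
        v["VariantName"]: v["InitialVariantWeight"] / total
        for v in production_variants
    }


# Hypothetical two-variant config: current model plus a small canary.
variants = [
    {"VariantName": "current", "InitialVariantWeight": 9.0},
    {"VariantName": "candidate", "InitialVariantWeight": 1.0},
]
print(traffic_shares(variants))
```

Because shares depend on the sum of weights, changing one variant's weight changes every variant's share, which is how a wrong weight can send traffic to the wrong model.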
| Task | Precise Command or Path | Verification Standard |
|---|---|---|
| Inspect model object | aws sagemaker describe-model --model-name example-model | Artifact, image, and role match intended deployment |
| Inspect endpoint config | aws sagemaker describe-endpoint-config --endpoint-config-name example-endpoint-config | Variant and compute settings fit latency and traffic pattern |
| Validate endpoint status | aws sagemaker describe-endpoint --endpoint-name example-endpoint | Endpoint status is InService before production traffic |
| Validate batch scoring | aws sagemaker describe-transform-job --transform-job-name nightly-scoring-job | Job completed and output S3 location is populated |
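The verification standards in the table can be expressed as checks over the JSON that the describe calls return. The sketch below assumes the responses have already been parsed into dictionaries (for example from the CLI output); the helper names are hypothetical, and the sample job and bucket names are placeholders.

```python
def endpoint_in_service(describe_endpoint: dict) -> bool:
    """True when the endpoint reports InService."""
    return describe_endpoint.get("EndpointStatus") == "InService"


def transform_completed(describe_transform_job: dict) -> bool:
    """True when the job completed and an S3 output path is configured.

    This only confirms the configured path; confirming that prediction
    objects exist still requires listing the S3 prefix.
    """
    output = describe_transform_job.get("TransformOutput", {})
    path = output.get("S3OutputPath", "")
    status = describe_transform_job.get("TransformJobStatus")
    return status == "Completed" and path.startswith("s3://")


# Placeholder response shaped like describe-transform-job output.
sample_job = {
    "TransformJobName": "nightly-scoring-job",
    "TransformJobStatus": "Completed",
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/predictions/"},
}
print(transform_completed(sample_job))
```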
Core Priority: MLA-C01 tests whether candidates can provision maintainable ML infrastructure with CloudFormation, AWS CDK, containers, ECR, SageMaker endpoints, VPC configuration, and auto scaling.
High Frequency: Questions involve on-demand versus provisioned resources, endpoint auto scaling metrics, BYOC containers, VPC subnets/security groups, and stack communication.
Confusion Alert: Distractors may manually create resources when the scenario asks for repeatable environments. Another trap is scaling on a generic metric when invocations per instance or latency is the actual endpoint pressure signal.
Scenario Logic: Determine whether the requirement is repeatability, isolation, scaling, container dependency control, or cost optimization. Then choose IaC, VPC settings, ECR/BYOC, and scaling policies accordingly.
Version Delta: Auto scaling metric names and service quotas can change. Confirm current SageMaker endpoint scaling documentation before production use.
Failure Trigger: Deployment fails when the endpoint cannot pull the image, subnets lack required network path, security groups block dependencies, scaling policy targets the wrong metric, or IaC stacks drift.
Operational Dependency: ML infrastructure depends on model artifact, ECR image, execution role, VPC route, endpoint configuration, scalable target, scaling policy, and stack outputs.
How the Exam Asks It: The stem may mention repeated dev/test/prod environments, private model hosting, custom inference dependencies, or traffic spikes.
How Distractors Are Designed: Wrong choices skip IaC, use public endpoints despite isolation requirements, or scale training jobs instead of endpoint variants.
Why the Correct Answer Works: The correct answer provisions the controllable infrastructure object and verifies the dependency that owns the behavior.
High-Value Exam Focus: IaC questions reward repeatability, VPC questions reward network-path dependency checks, BYOC questions reward ECR image identity, and auto scaling questions reward endpoint-variant metrics such as invocations per instance or latency-related signals.
Practice Question: A SageMaker endpoint has rising latency during traffic spikes. The team wants automatic capacity changes based on endpoint load. Which metric is most directly aligned?
A. InvocationsPerInstance for the endpoint variant.
B. Number of objects in the training S3 bucket.
C. Total model packages in the registry.
D. Number of CodeCommit branches.
Correct Answer: A
Explanation: A measures per-instance endpoint load and corresponds to the predefined target-tracking metric SageMakerVariantInvocationsPerInstance used for SageMaker endpoint scaling. B, C, and D do not observe serving pressure.
Exam Takeaway: Auto scaling decisions must inspect the serving resource; unrelated storage, registry, or repository counts are distractors.
Infrastructure as Code records the desired state of ML resources so environments can be recreated and reviewed. CloudFormation and CDK define roles, buckets, VPC resources, endpoint configs, scaling targets, and pipelines. Containers package inference code and dependencies. ECR stores the image SageMaker or container services pull at runtime.
VPC isolation changes the network dependency. A model hosted with a VPC configuration must have subnets, security groups, route tables, and either VPC endpoints or NAT paths that let the service reach required resources such as S3, ECR, CloudWatch, and KMS. Auto scaling attaches to the endpoint variant as a scalable target and changes capacity based on selected metrics.
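To make the IaC idea concrete, here is a minimal sketch that builds a CloudFormation template as a Python dictionary and prints it as JSON. The parameter names, logical IDs, variant name, and instance type are assumptions for illustration; a production template would also define roles, scaling resources, and outputs.

```python
import json


def build_endpoint_template(instance_type: str = "ml.m5.large") -> dict:
    """Return a minimal template: model (with VPC config), endpoint config, endpoint."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            "ImageUri": {"Type": "String"},
            "ModelDataUrl": {"Type": "String"},
            "ExecutionRoleArn": {"Type": "String"},
            "SubnetIds": {"Type": "List<AWS::EC2::Subnet::Id>"},
            "SecurityGroupIds": {"Type": "List<AWS::EC2::SecurityGroup::Id>"},
        },
        "Resources": {
            "Model": {
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ExecutionRoleArn": {"Ref": "ExecutionRoleArn"},
                    "PrimaryContainer": {
                        "Image": {"Ref": "ImageUri"},
                        "ModelDataUrl": {"Ref": "ModelDataUrl"},
                    },
                    # VPC settings attach to the model object.
                    "VpcConfig": {
                        "Subnets": {"Ref": "SubnetIds"},
                        "SecurityGroupIds": {"Ref": "SecurityGroupIds"},
                    },
                },
            },
            "EndpointConfig": {
                "Type": "AWS::SageMaker::EndpointConfig",
                "Properties": {
                    "ProductionVariants": [{
                        "VariantName": "AllTraffic",
                        "ModelName": {"Fn::GetAtt": ["Model", "ModelName"]},
                        "InstanceType": instance_type,
                        "InitialInstanceCount": 1,
                        "InitialVariantWeight": 1.0,
                    }],
                },
            },
            "Endpoint": {
                "Type": "AWS::SageMaker::Endpoint",
                "Properties": {
                    "EndpointConfigName": {
                        "Fn::GetAtt": ["EndpointConfig", "EndpointConfigName"]
                    },
                },
            },
        },
    }


print(json.dumps(build_endpoint_template(), indent=2))
```

Keeping the image, artifact, role, and network identifiers as parameters is what lets the same template recreate dev, test, and prod environments.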
| Object | Attribute | Value Range | Default State | Dependency | Failure State |
|---|---|---|---|---|---|
| IaC stack | Resource definition | CloudFormation template or CDK app | Manual drift if unmanaged | Parameter values and stack permissions | Environment mismatch |
| ECR image | Image digest | SHA256 digest or tag reference | Tag can move unless pinned | Build pipeline and repository policy | Wrong runtime dependencies |
| VPC endpoint config | Subnets and security groups | Private subnets, controlled ingress/egress | Public path if VPC not configured | Routes to S3/ECR/KMS/CloudWatch | Image pull or data access failure |
| Scalable target | Min/max capacity | Endpoint variant capacity bounds | Fixed capacity | Application Auto Scaling registration | No automatic response to traffic |
| Scaling policy | Metric target | InvocationsPerInstance, latency-related custom metric, scheduled policy | None | CloudWatch metrics and target tracking | Over/under scaling |
Identify whether the scenario requires repeatability, isolation, custom dependencies, or automatic capacity. Each requirement maps to a different control object.
Inspect IaC stack status before debugging individual resources.
#Official AWS CLI verification pattern.
aws cloudformation describe-stacks --stack-name mla-prod-endpoint-stack
Expected state: stack is complete and outputs provide model, endpoint, subnet, or role identifiers.
#Official AWS CLI verification pattern.
aws ecr describe-images --repository-name mla-inference --image-ids imageTag=prod
Expected state: image digest matches the approved build.
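Because a tag such as `prod` can be moved to a different image, a digest comparison is the stronger check. The sketch below is one possible way to pin an image by digest and to compare the `imageDigest` field from a describe-images response against an approved value; the helper names, account ID, and digest are placeholders.

```python
def pinned_image_uri(account_id: str, region: str, repository: str, digest: str) -> str:
    """Build an ECR image URI that references an immutable digest, not a tag."""
    prefix = "sha256:"
    if not digest.startswith(prefix) or len(digest) != len(prefix) + 64:
        raise ValueError("expected a digest of the form sha256:<64 hex characters>")
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}@{digest}"


def digest_matches(describe_images_response: dict, approved_digest: str) -> bool:
    """Compare the digest behind a tag with the digest recorded at approval time."""
    details = describe_images_response.get("imageDetails", [])
    return len(details) == 1 and details[0].get("imageDigest") == approved_digest


approved = "sha256:" + "a" * 64  # placeholder digest
response = {"imageDetails": [{"imageDigest": approved, "imageTags": ["prod"]}]}
print(digest_matches(response, approved))
print(pinned_image_uri("111122223333", "us-east-1", "mla-inference", approved))
```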
#Official AWS CLI verification pattern.
aws sagemaker describe-model --model-name example-model
aws application-autoscaling describe-scalable-targets --service-namespace sagemaker
Expected state: the model's VPC config is present and the endpoint variant is registered as a scalable target when required.
The IaC stack creates roles, networking, repository references, model objects, endpoint configs, and scaling resources. SageMaker pulls the inference image from ECR, loads the model artifact, and attaches the endpoint to selected networking. Application Auto Scaling reads CloudWatch metrics for the endpoint variant and changes capacity within min/max bounds. If the image digest is wrong, the runtime fails. If VPC dependencies are blocked, the endpoint cannot reach data or logs. If the metric is misaligned, scaling reacts too late or not at all.
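The scaling half of this flow can be sketched as the two Application Auto Scaling requests involved: registering the endpoint variant as a scalable target, then attaching a target-tracking policy. The functions below only build the keyword arguments that would be passed to boto3's `register_scalable_target` and `put_scaling_policy`; the endpoint name, variant name, policy name, target value, and cooldowns are assumptions, and current documentation should be checked before production use.

```python
def variant_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Resource ID format used for SageMaker endpoint variants."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target_request(endpoint_name, variant_name, min_capacity, max_capacity):
    """Arguments for register_scalable_target (capacity bounds for the variant)."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_request(endpoint_name, variant_name, invocations_per_instance):
    """Arguments for put_scaling_policy using the predefined per-instance metric."""
    return {
        "PolicyName": "invocations-target-tracking",  # hypothetical name
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
            },
            "ScaleOutCooldown": 60,   # seconds, illustrative
            "ScaleInCooldown": 300,   # seconds, illustrative
        },
    }


print(scalable_target_request("example-endpoint", "AllTraffic", 1, 4))
print(target_tracking_request("example-endpoint", "AllTraffic", 100))
```

With real credentials, these dictionaries could be passed as `client.register_scalable_target(**request)` and `client.put_scaling_policy(**request)` on a boto3 `application-autoscaling` client.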
| Task | Precise Command or Path | Verification Standard |
|---|---|---|
| Inspect IaC stack | aws cloudformation describe-stacks --stack-name mla-prod-endpoint-stack | Stack status is complete and outputs match deployed resources |
| Verify image digest | aws ecr describe-images --repository-name mla-inference --image-ids imageTag=prod | Digest matches approved build artifact |
| Inspect model VPC config | aws sagemaker describe-model --model-name example-model | VPC configuration exists when private hosting is required |
| Inspect scalable target | aws application-autoscaling describe-scalable-targets --service-namespace sagemaker | Endpoint variant is registered with correct min/max capacity |
Core Priority: ML workflow orchestration combines data processing, training, evaluation, registration, approval, deployment, testing, and retraining triggers.
High Frequency: SageMaker Pipelines, CodePipeline, CodeBuild, CodeDeploy, EventBridge, Step Functions, Amazon MWAA, Git workflows, automated tests, and rollback strategies appear in deployment domain questions.
Confusion Alert: A common trap is using CI/CD tools only for application code while leaving data, model, and evaluation gates manual. Another is retraining without a trigger or approval condition.
Scenario Logic: Determine whether the requirement is ML pipeline reproducibility, source-triggered build, scheduled retraining, event-driven response, or deployment rollback.
Version Delta: Pipeline integrations and service quotas change. Verify current service support for the target region and account.
Failure Trigger: Workflow failures happen from missing source artifacts, broken IAM roles, failed tests, unapproved model packages, incorrect EventBridge rule patterns, or rollback policies that do not target the serving endpoint.
Operational Dependency: CI/CD depends on repository events, buildspec, pipeline stages, IAM role permissions, artifact stores, model registry state, and deployment target health.
How the Exam Asks It: Stems may include a commit that should trigger retraining, a scheduled data refresh, a model approval gate, or a failed canary deployment needing rollback.
How Distractors Are Designed: Wrong options use one service to solve every step, skip automated tests, or deploy directly from a notebook.
Why the Correct Answer Works: The correct answer wires the event, build, ML pipeline, approval, and deployment controls in the right sequence.
High-Value Exam Focus: CI/CD answers should preserve state transitions: source change or schedule, build/test, processing/training, evaluation condition, registry approval, deployment, health check, and rollback. Direct notebook deployment is usually a governance distractor.
Practice Question: A team wants every approved model package to trigger a deployment pipeline, but only after evaluation metrics pass and manual approval is complete. Which design best fits?
A. Use SageMaker Pipelines to evaluate/register the model and EventBridge/CodePipeline to trigger deployment on approved model package state.
B. Deploy directly from a notebook after training completes.
C. Use S3 Transfer Acceleration to approve models faster.
D. Increase the training instance size and skip registry approval.
Correct Answer: A
Explanation: A connects evaluation, registry approval, event trigger, and deployment automation. B is manual and unaudited. C affects S3 transfer, not approval workflow. D changes compute and removes the governance gate.
Exam Takeaway: ML CI/CD questions are about controlled state transitions; distractors often automate one step while skipping evaluation or approval.
An ML CI/CD workflow must coordinate code, data, model artifacts, evaluation results, and deployment state. SageMaker Pipelines can express ML-native steps such as processing, training, evaluation, condition checks, model registration, and approval dependencies. CodePipeline coordinates repository commits, build stages, tests, and deployment actions. EventBridge connects state changes, schedules, or service events to automation.
Rollback strategy depends on the deployment pattern. Blue/green, canary, and linear shifts require traffic control and health checks. Automated tests can include unit tests for feature code, integration tests for pipeline components, and endpoint smoke tests after deployment.
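For SageMaker endpoints, one way to express a canary shift with automatic rollback is the `DeploymentConfig` argument of an endpoint update. The sketch below only builds that dictionary; the alarm name, canary percentage, and wait times are assumptions, and the field names should be verified against the current `UpdateEndpoint` API reference before use.

```python
def canary_deployment_config(alarm_names, canary_percent=10, bake_seconds=300):
    """Build a blue/green canary config that rolls back when an alarm fires."""
    if not alarm_names:
        raise ValueError("auto rollback needs at least one CloudWatch alarm")
    if not 1 <= canary_percent <= 50:
        raise ValueError("this sketch keeps the canary between 1 and 50 percent")
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                # Time to observe the canary before shifting remaining traffic.
                "WaitIntervalInSeconds": bake_seconds,
            },
            # Keep the old fleet briefly so rollback stays possible.
            "TerminationWaitInSeconds": 120,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


# Hypothetical alarm on endpoint errors or latency.
config = canary_deployment_config(["example-endpoint-5xx-alarm"])
print(config["BlueGreenUpdatePolicy"]["TrafficRoutingConfiguration"]["Type"])
```

The health check here is the CloudWatch alarm: without an alarm that observes the serving endpoint, the rollback policy has nothing to react to.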
| Object | Attribute | Value Range | Default State | Dependency | Failure State |
|---|---|---|---|---|---|
| Source repository | Trigger event | Commit, pull request merge, tag, release | No automation unless connected | CodePipeline or EventBridge rule | Pipeline not invoked |
| Build project | Buildspec phases | Install, pre_build, build, post_build | Undefined until configured | IAM role, artifact store, test commands | Failed or untested artifact |
| SageMaker pipeline | Step graph | Processing, training, evaluation, condition, register model | No ML lineage if absent | Input data, code, role | Manual non-repeatable workflow |
| Model registry gate | Approval status | PendingManualApproval, Approved, Rejected | Pending manual approval | Evaluation metrics and reviewer action | Unapproved model deployed or approved model ignored |
| EventBridge rule | Event pattern | Schedule, state change, registry event | Disabled or absent | Correct source/detail pattern and target role | Retraining/deployment never starts |
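The registry gate in the table combines two conditions: evaluation metrics pass and the model package is approved. A minimal sketch of that gate follows; the metric names and thresholds are hypothetical, and it assumes every metric is "higher is better".

```python
def ready_to_deploy(metrics: dict, minimums: dict, approval_status: str) -> bool:
    """True only when all metrics meet their minimums AND the package is Approved.

    A metric missing from `metrics` counts as a failure rather than a pass.
    """
    metrics_pass = all(
        name in metrics and metrics[name] >= minimum
        for name, minimum in minimums.items()
    )
    return metrics_pass and approval_status == "Approved"


evaluation = {"auc": 0.91, "recall": 0.84}   # hypothetical evaluation output
thresholds = {"auc": 0.90, "recall": 0.80}   # hypothetical quality bar

print(ready_to_deploy(evaluation, thresholds, "PendingManualApproval"))
print(ready_to_deploy(evaluation, thresholds, "Approved"))
```

Both conditions are required: passing metrics without approval, or approval without passing metrics, leaves the gate closed.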
Map the workflow state transitions: source change, data refresh, training, evaluation, registration, approval, deployment, and rollback.
Inspect CodePipeline or build state when a repository-triggered path fails.
#Official AWS CLI verification pattern.
aws codepipeline get-pipeline-state --name mla-model-deployment
aws codebuild batch-get-builds --ids example-build-id
Expected state: stages and builds reveal the first failed transition.
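Finding the first failed transition can be automated over the JSON returned by get-pipeline-state. The sketch below assumes the response has been parsed into a dictionary and reads the `stageStates`, `stageName`, and `latestExecution.status` fields; the helper name and the sample stage names are illustrative.

```python
def first_failed_stage(pipeline_state: dict):
    """Return the name of the first stage whose latest execution failed, else None."""
    for stage in pipeline_state.get("stageStates", []):
        status = stage.get("latestExecution", {}).get("status")
        if status == "Failed":
            return stage.get("stageName")
    return None


# Sample shaped like get-pipeline-state output, with made-up stage names.
state = {
    "pipelineName": "mla-model-deployment",
    "stageStates": [
        {"stageName": "Source", "latestExecution": {"status": "Succeeded"}},
        {"stageName": "BuildAndTest", "latestExecution": {"status": "Failed"}},
        {"stageName": "Deploy"},  # never ran, so no latestExecution
    ],
}
print(first_failed_stage(state))
```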
#Official AWS CLI verification pattern.
aws sagemaker describe-pipeline-execution --pipeline-execution-arn example-pipeline-execution-arn
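Expected state: execution status and failure reason are visible. Use aws sagemaker list-pipeline-execution-steps with the same execution ARN to see the failed step, condition result, or completed registration.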
#Official AWS CLI verification pattern.
aws events describe-rule --name approved-model-deployment-trigger
Expected state: event pattern and target align to the approved model package state change or schedule.
A repository event or scheduled rule triggers an orchestration service. CodeBuild runs tests and packages code. SageMaker Pipelines processes data, trains a model, evaluates metrics, and registers the model if conditions pass. Approval changes model package state, which can emit an event that starts deployment. The deployment system shifts traffic and monitors health. If any state transition is missing or the event pattern is wrong, the automation chain stops even when individual services are healthy.
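The approval-to-deployment link in this chain is an EventBridge event pattern. The sketch below shows a pattern for approved model packages and a deliberately simplified matcher that handles only exact-value lists and nested objects; real EventBridge matching supports more operators, and the sample event is abbreviated and hypothetical.

```python
# Pattern for a rule that reacts when a model package becomes Approved.
APPROVED_MODEL_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must exist and match in the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            # Leaf pattern: the event value must be one of the listed values.
            return False
    return True


sample_event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {
        "ModelPackageGroupName": "example-group",  # hypothetical
        "ModelApprovalStatus": "Approved",
    },
}
print(matches(APPROVED_MODEL_PATTERN, sample_event))
```

A pattern that names the wrong source or detail-type simply never matches, which is why the chain can stop silently while every individual service looks healthy.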
| Task | Precise Command or Path | Verification Standard |
|---|---|---|
| Inspect deployment pipeline | aws codepipeline get-pipeline-state --name mla-model-deployment | Failed or current stage is visible with revision details |
| Inspect build logs | aws codebuild batch-get-builds --ids example-build-id | Build phase status identifies test or packaging failure |
| Inspect ML pipeline execution | aws sagemaker describe-pipeline-execution --pipeline-execution-arn example-pipeline-execution-arn | Execution status and failure reason are visible |
| Verify event trigger | aws events describe-rule --name approved-model-deployment-trigger | Rule pattern and state match intended automation trigger |
How should a team choose between real-time, serverless, asynchronous, batch, and multi-model SageMaker deployment options?
Choose the deployment target from latency, payload size, traffic pattern, cost sensitivity, and the number of models that must be hosted.
Real-time endpoints fit low-latency steady serving. Serverless endpoints fit intermittent traffic without managing capacity. Asynchronous inference fits large payloads or longer processing times. Batch transform fits offline scoring. Multi-model endpoints can reduce hosting cost when many models share infrastructure and are not all hot at the same time.
Demand Score: 96
Exam Relevance Score: 99
What is a common reason to use containers when deploying ML workloads on SageMaker?
Containers package custom frameworks, dependencies, inference code, and runtime behavior so the workload can run consistently in SageMaker.
Some models need libraries or serving logic that are not covered by a built-in image. A container gives the team control over the runtime while still using SageMaker hosting, training, or processing infrastructure. The exam may pair this with Amazon ECR, IAM permissions, and image security requirements.
Demand Score: 86
Exam Relevance Score: 92
Why is VPC isolation important for some ML training or deployment workloads?
VPC isolation helps control network paths to private data sources, endpoints, and other internal resources used by the ML workload.
ML jobs often need access to private subnets, security groups, VPC endpoints, or restricted databases. If the networking configuration is wrong, the job might fail to read data or might violate security requirements. MLA-C01 questions often combine VPC isolation with IAM, S3 bucket policies, and KMS controls.
Demand Score: 88
Exam Relevance Score: 94
What is the role of SageMaker Pipelines in an ML workflow?
SageMaker Pipelines orchestrates repeatable ML workflow steps such as processing, training, evaluation, model registration, and conditional deployment logic.
Pipelines make ML workflows reproducible and auditable. Instead of manually running notebooks, teams can define ordered steps, pass artifacts between them, and integrate approval or evaluation gates. This is especially relevant when the exam describes CI/CD, repeatability, and controlled model promotion.
Demand Score: 93
Exam Relevance Score: 97
When is endpoint auto scaling the right solution for a deployed model?
Endpoint auto scaling is appropriate when production traffic varies and the endpoint needs to adjust capacity based on metrics such as invocations per instance or utilization.
Auto scaling addresses serving capacity and cost for hosted endpoints. It does not fix poor model accuracy, bad data, or missing permissions. In exam scenarios, choose auto scaling when the symptom is load-driven latency or throttling, and choose monitoring or data remediation when the symptom is drift or prediction quality.
Demand Score: 91
Exam Relevance Score: 96