Deployment and Orchestration of ML Workflows

Official task alignment for this domain:

Official MLA-C01 task	How this document covers it
Task 3.1: Select deployment infrastructure based on existing architecture and requirements	Real-time, serverless, asynchronous, batch, multi-model, multi-container, ECS, EKS, Lambda, edge, CPU/GPU, latency and cost tradeoffs
Task 3.2: Create and script infrastructure based on existing architecture and requirements	CloudFormation, CDK, ECR, BYOC, VPC endpoints, endpoint auto scaling, scaling metrics, maintainable infrastructure
Task 3.3: Use automated orchestration tools to set up CI/CD pipelines	SageMaker Pipelines, CodePipeline, CodeBuild, CodeDeploy, EventBridge, MWAA/Airflow, Git workflows, tests, retraining, rollback

High-frequency deployment selection memory aid:

Scenario clue	Strong first choice	Common distractor
Nightly or offline scoring of a large dataset	Batch transform	Always-on real-time endpoint
Low-latency synchronous request/response	Real-time endpoint	Batch transform
Large payload or long processing with delayed response	Asynchronous inference	Serverless endpoint without checking limits
Intermittent traffic and supported payload/runtime limits	Serverless inference	Provisioned endpoint for very low utilization
Many similar tenant or segment models	Multi-model endpoint	One endpoint per tiny model without cost analysis

Selecting Real-Time, Serverless, Asynchronous, Batch, and Multi-Model Deployment Targets

Exam Radar

Core Priority: Deployment questions test whether candidates match inference patterns to SageMaker endpoints, batch transform, serverless inference, asynchronous inference, multi-model endpoints, ECS, EKS, Lambda, or edge optimization.

High Frequency: Latency, payload size, traffic variability, cost, GPU/CPU need, container control, and model count drive the answer.

Confusion Alert: Real-time endpoints are not always correct. Batch workloads should not pay for always-on endpoints. Serverless is not a fit for every latency, payload, or runtime requirement.

Scenario Logic: Start with invocation pattern: synchronous low latency, bursty intermittent, large payload asynchronous, offline batch scoring, many similar models, custom orchestrator, or edge device.

Version Delta: SageMaker endpoint options and quotas change. Validate current limits for payload size, timeout, memory, and concurrency before production design.

Failure Trigger: The wrong deployment target causes timeouts, unnecessary cost, cold-start latency, GPU shortage, container incompatibility, or an inability to roll back.

Operational Dependency: Deployment depends on model artifact, container image, endpoint config, instance or serverless configuration, IAM role, VPC path, and monitoring.

How the Exam Asks It: The stem may mention unpredictable traffic, nightly scoring, many tenant models, large image payloads, or strict low-latency response.

How Distractors Are Designed: Wrong answers choose the most familiar endpoint type while ignoring workload timing and latency.

Why the Correct Answer Works: The correct target satisfies the request/response pattern and cost envelope.

High-Value Exam Focus: Deployment target follows invocation pattern first: real-time for synchronous low latency, batch transform for offline scoring, asynchronous for long-running or large-payload inference, serverless for intermittent supported workloads, and multi-model for many similar models.

Practice Question: A model scores 20 million historical records once each night. There is no need for online responses. Which deployment pattern is most appropriate?

A. SageMaker batch transform.
B. Always-on real-time endpoint.
C. Serverless endpoint for every record.
D. Multi-model endpoint with one model.

Correct Answer: A

Explanation: A fits offline batch scoring. B pays for always-on hosting without online need. C is designed for synchronous invocation patterns, not large scheduled batch scoring. D solves hosting many models, not nightly scoring.

Exam Takeaway: Pick deployment from invocation pattern first; endpoint distractors are common when the scenario is actually batch.

Atomic Deconstruction - Operational Level

Deployment maps a trained artifact to an invocation contract. Real-time endpoints serve synchronous low-latency requests. Serverless endpoints reduce idle cost for intermittent traffic but introduce limits and cold-start considerations. Asynchronous inference decouples request submission from response retrieval for larger payloads or longer processing. Batch transform scores offline datasets. Multi-model endpoints host many models behind one endpoint when models share a serving container pattern.
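
The selection order described above can be sketched as a small decision function. This is a minimal illustration rather than an official selection algorithm: the `Workload` fields and the precedence of the checks are assumptions made for this example, and a real design still has to validate current payload, timeout, and concurrency limits.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of an inference workload (illustrative fields)."""
    needs_online_response: bool       # a caller waits for, or later fetches, a per-request result
    low_latency_sync: bool            # strict synchronous latency requirement
    large_payload_or_long_run: bool   # payload or processing time too large for a synchronous call
    intermittent_traffic: bool        # long idle periods between bursts
    many_similar_models: bool         # many models that share one serving container


def choose_deployment_target(w: Workload) -> str:
    """Return a first-choice deployment family based on invocation pattern."""
    if not w.needs_online_response:
        return "batch transform"
    if w.large_payload_or_long_run:
        return "asynchronous inference"
    if w.many_similar_models:
        return "multi-model endpoint"
    if w.intermittent_traffic and not w.low_latency_sync:
        return "serverless inference"
    return "real-time endpoint"


nightly_scoring = Workload(False, False, False, False, False)
print(choose_deployment_target(nightly_scoring))
```

The function checks the invocation pattern before anything else, which mirrors the exam logic: infrastructure details such as instance type only matter once the deployment family is fixed.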

Choosing ECS, EKS, or Lambda can be valid when the scenario requires custom application orchestration, container control, or event-driven glue code, but those choices add operational ownership. SageMaker endpoints provide ML-specific hosting, variant routing, and integration with SageMaker monitoring.

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Endpoint type	Invocation mode	Real-time, serverless, asynchronous, multi-model	No hosting until endpoint config exists	Latency, payload, traffic pattern	Timeout or excessive cost
Batch transform job	Input data path	S3 prefix or manifest	Not scheduled	Model artifact and transform resources	Offline scoring fails
Endpoint variant	Traffic weight	Relative weight per variant (traffic share = weight / sum of all variant weights)	Single variant if unmanaged	Endpoint config and model objects	Wrong model serves traffic
Container image	Serving runtime	SageMaker image or custom ECR image	Undefined	Inference handler and dependencies	Model cannot load or invoke
Compute selection	Instance/serverless config	CPU, GPU, memory, concurrency	Unset until config	Model size, latency, cost	Capacity or cold-start issue
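
Variant weights on a SageMaker endpoint are relative values: each variant receives its weight divided by the sum of all variant weights. The helper below is a small sketch of that arithmetic; the function name and the variant names are hypothetical.

```python
from typing import Dict


def traffic_shares(variant_weights: Dict[str, float]) -> Dict[str, float]:
    """Convert relative variant weights into fractions of total traffic."""
    total = sum(variant_weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in variant_weights.items()}


# Hypothetical canary split: weights 9 and 1 route roughly 90% and 10% of requests.
print(traffic_shares({"current-model": 9.0, "candidate-model": 1.0}))
```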

Step-by-Step Execution Path

Classify invocation timing and latency. This determines the deployment family before infrastructure details.
Verify the model artifact and container image selected for hosting.

# Official AWS CLI verification pattern.
aws sagemaker describe-model --model-name example-model

Expected state: model data URL, image, role, and environment match the deployment plan.

Inspect endpoint configuration or transform job configuration.

# Official AWS CLI verification pattern.
aws sagemaker describe-endpoint-config --endpoint-config-name example-endpoint-config

Expected state: production variants, instance/serverless options, and model references align to traffic needs.

For batch scoring, verify transform job state instead of endpoint state.

# Official AWS CLI verification pattern.
aws sagemaker describe-transform-job --transform-job-name nightly-scoring-job

Expected state: job completed and output S3 path contains predictions.
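
When the same check runs in a script, the JSON returned by `describe-transform-job` can be summarized with a helper like the sketch below. It assumes the response has already been parsed into a dictionary and relies on the `TransformJobStatus`, `TransformOutput.S3OutputPath`, and `FailureReason` fields; the function name and sample values are hypothetical.

```python
def summarize_transform_job(response: dict) -> str:
    """Summarize a parsed describe-transform-job response in one line."""
    status = response.get("TransformJobStatus", "Unknown")
    if status == "Completed":
        output = response.get("TransformOutput", {}).get("S3OutputPath", "<unknown>")
        return f"completed: check predictions under {output}"
    if status == "Failed":
        reason = response.get("FailureReason", "no reason reported")
        return f"failed: {reason}"
    return f"not finished: {status}"


# Hypothetical response fragment for illustration only.
sample = {
    "TransformJobStatus": "Completed",
    "TransformOutput": {"S3OutputPath": "s3://example-bucket/nightly-scores/"},
}
print(summarize_transform_job(sample))
```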

Validate invocation behavior with a small request or sample batch before shifting production workload.

Technical Chain

SageMaker creates a model object that binds artifact, container image, execution role, and optional VPC settings. Endpoint config maps the model to compute and variant routing, while batch transform maps the model to an offline input/output job. The invocation path then either waits synchronously, stores asynchronous results, or writes batch outputs. If the deployment family mismatches the traffic pattern, the system fails through latency, timeout, cost, or capacity pressure.
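
The binding described above can be made concrete by looking at the request shapes. The sketch below only builds dictionaries; it assumes they would be passed as keyword arguments to the boto3 SageMaker client's `create_model` and `create_endpoint_config` calls. All names, ARNs, and sizes are placeholders, and the helper functions are hypothetical.

```python
from typing import Optional, Sequence


def build_model_request(name: str, image_uri: str, model_data_url: str, role_arn: str,
                        subnets: Optional[Sequence[str]] = None,
                        security_group_ids: Optional[Sequence[str]] = None) -> dict:
    """Model object: binds artifact, container image, role, and optional VPC settings."""
    request = {
        "ModelName": name,
        "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
        "ExecutionRoleArn": role_arn,
    }
    if subnets and security_group_ids:
        request["VpcConfig"] = {
            "Subnets": list(subnets),
            "SecurityGroupIds": list(security_group_ids),
        }
    return request


def build_endpoint_config_request(config_name: str, model_name: str,
                                  instance_type: str = "ml.m5.large",
                                  instance_count: int = 1,
                                  serverless: Optional[dict] = None) -> dict:
    """Endpoint config: maps the model to provisioned or serverless compute."""
    variant = {"VariantName": "AllTraffic", "ModelName": model_name}
    if serverless is not None:
        # Serverless variants declare memory and concurrency instead of instances.
        variant["ServerlessConfig"] = {
            "MemorySizeInMB": serverless["memory_mb"],
            "MaxConcurrency": serverless["max_concurrency"],
        }
    else:
        variant.update({
            "InstanceType": instance_type,
            "InitialInstanceCount": instance_count,
            "InitialVariantWeight": 1.0,
        })
    return {"EndpointConfigName": config_name, "ProductionVariants": [variant]}


print(build_endpoint_config_request("example-endpoint-config", "example-model",
                                    serverless={"memory_mb": 2048, "max_concurrency": 5}))
```

Keeping the model request separate from the endpoint config request reflects the dependency chain: the same model object can back a real-time variant, a serverless variant, or a batch transform job.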

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Inspect model object	`aws sagemaker describe-model --model-name example-model`	Artifact, image, and role match intended deployment
Inspect endpoint config	`aws sagemaker describe-endpoint-config --endpoint-config-name example-endpoint-config`	Variant and compute settings fit latency and traffic pattern
Validate endpoint status	`aws sagemaker describe-endpoint --endpoint-name example-endpoint`	Endpoint status is InService before production traffic
Validate batch scoring	`aws sagemaker describe-transform-job --transform-job-name nightly-scoring-job`	Job completed and output S3 location is populated

Infrastructure as Code, Containers, VPC Isolation, and Endpoint Auto Scaling

Exam Radar

Core Priority: MLA-C01 tests whether candidates can provision maintainable ML infrastructure with CloudFormation, AWS CDK, containers, ECR, SageMaker endpoints, VPC configuration, and auto scaling.

High Frequency: Questions involve on-demand versus provisioned resources, endpoint auto scaling metrics, BYOC containers, VPC subnets/security groups, and stack communication.

Confusion Alert: Distractors may manually create resources when the scenario asks for repeatable environments. Another trap is scaling on a generic metric when invocations per instance or latency is the actual endpoint pressure signal.

Scenario Logic: Determine whether the requirement is repeatability, isolation, scaling, container dependency control, or cost optimization. Then choose IaC, VPC settings, ECR/BYOC, and scaling policies accordingly.

Version Delta: Auto scaling metric names and service quotas can change. Confirm current SageMaker endpoint scaling documentation before production use.

Failure Trigger: Deployment fails when the endpoint cannot pull the image, subnets lack required network path, security groups block dependencies, scaling policy targets the wrong metric, or IaC stacks drift.

Operational Dependency: ML infrastructure depends on model artifact, ECR image, execution role, VPC route, endpoint configuration, scalable target, scaling policy, and stack outputs.

How the Exam Asks It: The stem may mention repeated dev/test/prod environments, private model hosting, custom inference dependencies, or traffic spikes.

How Distractors Are Designed: Wrong choices skip IaC, use public endpoints despite isolation requirements, or scale training jobs instead of endpoint variants.

Why the Correct Answer Works: The correct answer provisions the controllable infrastructure object and verifies the dependency that owns the behavior.

High-Value Exam Focus: IaC questions reward repeatability, VPC questions reward network-path dependency checks, BYOC questions reward ECR image identity, and auto scaling questions reward endpoint-variant metrics such as invocations per instance or latency-related signals.

Practice Question: A SageMaker endpoint has rising latency during traffic spikes. The team wants automatic capacity changes based on endpoint load. Which metric is most directly aligned?

A. InvocationsPerInstance for the endpoint variant.
B. Number of objects in the training S3 bucket.
C. Total model packages in the registry.
D. Number of CodeCommit branches.

Correct Answer: A

Explanation: A measures per-instance endpoint load. Target tracking policies for SageMaker endpoint variants expose it as the predefined metric SageMakerVariantInvocationsPerInstance. B, C, and D do not observe serving pressure.

Exam Takeaway: Auto scaling decisions must inspect the serving resource; unrelated storage, registry, or repository counts are distractors.

Atomic Deconstruction - Operational Level

Infrastructure as Code records the desired state of ML resources so environments can be recreated and reviewed. CloudFormation and CDK define roles, buckets, VPC resources, endpoint configs, scaling targets, and pipelines. Containers package inference code and dependencies. ECR stores the image SageMaker or container services pull at runtime.
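
As a rough illustration of desired state recorded as code, the sketch below assembles a minimal CloudFormation template for a model, an endpoint config, and an endpoint, then serializes it to JSON. The function name, logical IDs, and default instance type are assumptions for this example, and the property names should be checked against the current CloudFormation resource reference before use.

```python
import json


def endpoint_template(image_uri: str, model_data_url: str, role_arn: str,
                      instance_type: str = "ml.m5.large") -> dict:
    """Build a minimal CloudFormation template for one SageMaker endpoint."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            "Model": {
                "Type": "AWS::SageMaker::Model",
                "Properties": {
                    "ExecutionRoleArn": role_arn,
                    "PrimaryContainer": {"Image": image_uri, "ModelDataUrl": model_data_url},
                },
            },
            "EndpointConfig": {
                "Type": "AWS::SageMaker::EndpointConfig",
                "Properties": {
                    "ProductionVariants": [{
                        "VariantName": "AllTraffic",
                        "ModelName": {"Fn::GetAtt": ["Model", "ModelName"]},
                        "InstanceType": instance_type,
                        "InitialInstanceCount": 1,
                        "InitialVariantWeight": 1.0,
                    }],
                },
            },
            "Endpoint": {
                "Type": "AWS::SageMaker::Endpoint",
                "Properties": {
                    "EndpointConfigName": {"Fn::GetAtt": ["EndpointConfig", "EndpointConfigName"]},
                },
            },
        },
        # Stack outputs let other stacks and pipelines look up the endpoint name.
        "Outputs": {"EndpointName": {"Value": {"Fn::GetAtt": ["Endpoint", "EndpointName"]}}},
    }


template = endpoint_template(
    "111122223333.dkr.ecr.us-east-1.amazonaws.com/mla-inference@sha256:placeholder",
    "s3://example-bucket/model.tar.gz",
    "arn:aws:iam::111122223333:role/example-sagemaker-role",
)
print(json.dumps(template, indent=2)[:200])
```

Because the template is plain data, the same function can produce dev, test, and prod stacks that differ only in parameters, which is the repeatability the exam scenarios look for.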

VPC isolation changes the network dependency. A model hosted with a VPC configuration must have subnets, security groups, route tables, and VPC endpoints or NAT paths that let the hosting containers reach required resources such as S3, ECR, CloudWatch, and KMS. Auto scaling attaches to the endpoint variant as a scalable target and changes capacity based on selected metrics.
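
The scalable target and scaling policy mentioned above can be sketched as two request dictionaries. The example assumes they would be passed to the boto3 Application Auto Scaling client's `register_scalable_target` and `put_scaling_policy` calls; the helper names, policy name format, cooldown values, and endpoint names are illustrative choices, not recommendations.

```python
def variant_resource_id(endpoint_name: str, variant_name: str) -> str:
    """Resource ID format Application Auto Scaling uses for an endpoint variant."""
    return f"endpoint/{endpoint_name}/variant/{variant_name}"


def scalable_target_request(endpoint_name: str, variant_name: str,
                            min_capacity: int, max_capacity: int) -> dict:
    """Register the variant's instance count as a scalable target."""
    return {
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
    }


def target_tracking_policy_request(endpoint_name: str, variant_name: str,
                                   target_invocations_per_instance: float) -> dict:
    """Target tracking policy keyed on per-instance invocation load."""
    return {
        "PolicyName": f"{endpoint_name}-{variant_name}-invocations",  # hypothetical naming
        "ServiceNamespace": "sagemaker",
        "ResourceId": variant_resource_id(endpoint_name, variant_name),
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": float(target_invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
            "ScaleInCooldown": 300,   # illustrative values in seconds
            "ScaleOutCooldown": 60,
        },
    }


print(scalable_target_request("example-endpoint", "AllTraffic", 1, 4))
```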

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
IaC stack	Resource definition	CloudFormation template or CDK app	Manual drift if unmanaged	Parameter values and stack permissions	Environment mismatch
ECR image	Image digest	SHA256 digest or tag reference	Tag can move unless pinned	Build pipeline and repository policy	Wrong runtime dependencies
VPC endpoint config	Subnets and security groups	Private subnets, controlled ingress/egress	Public path if VPC not configured	Routes to S3/ECR/KMS/CloudWatch	Image pull or data access failure
Scalable target	Min/max capacity	Endpoint variant capacity bounds	Fixed capacity	Application Auto Scaling registration	No automatic response to traffic
Scaling policy	Metric target	InvocationsPerInstance, latency-related custom metric, scheduled policy	None	CloudWatch metrics and target tracking	Over/under scaling

Step-by-Step Execution Path

Identify whether the scenario requires repeatability, isolation, custom dependencies, or automatic capacity. Each requirement maps to a different control object.
Inspect IaC stack status before debugging individual resources.

# Official AWS CLI verification pattern.
aws cloudformation describe-stacks --stack-name mla-prod-endpoint-stack

Expected state: stack is complete and outputs provide model, endpoint, subnet, or role identifiers.

Verify container image identity.

# Official AWS CLI verification pattern.
aws ecr describe-images --repository-name mla-inference --image-ids imageTag=prod

Expected state: image digest matches the approved build.
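
The digest comparison can be automated once the `describe-images` output is parsed. The sketch below assumes the response carries an `imageDetails` list whose entries include `imageDigest`; the function names, registry host, and digest values are placeholders.

```python
def image_matches_approved(describe_images_response: dict, approved_digest: str) -> bool:
    """True if any returned image has exactly the approved digest."""
    details = describe_images_response.get("imageDetails", [])
    return any(d.get("imageDigest") == approved_digest for d in details)


def pinned_image_uri(registry: str, repository: str, digest: str) -> str:
    """Reference the image by digest so a moved tag cannot change the runtime."""
    return f"{registry}/{repository}@{digest}"


# Hypothetical response fragment; real digests are 64 hex characters.
response = {"imageDetails": [{"imageDigest": "sha256:aaa111", "imageTags": ["prod"]}]}
print(image_matches_approved(response, "sha256:aaa111"))
print(pinned_image_uri("111122223333.dkr.ecr.us-east-1.amazonaws.com", "mla-inference", "sha256:aaa111"))
```

Pinning by digest addresses the failure state in the component table: a tag such as `prod` can be moved to a different build, while a digest always identifies the same image content.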

Inspect endpoint VPC and scaling configuration.

# Official AWS CLI verification pattern.
aws sagemaker describe-model --model-name example-model
aws application-autoscaling describe-scalable-targets --service-namespace sagemaker

Expected state: VPC config and scalable target exist for the endpoint variant when required.

Compare CloudWatch endpoint metrics against the scaling policy target to confirm the policy can react to actual pressure.
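
To reason about whether a target value leaves enough headroom, the comparison can be approximated with simple proportional arithmetic. This is a simplified sketch, not the exact algorithm Application Auto Scaling uses: it ignores cooldowns, alarm evaluation periods, and metric aggregation, and the function name and numbers are hypothetical.

```python
import math


def estimate_desired_instances(current_instances: int, observed_per_instance: float,
                               target_per_instance: float,
                               min_capacity: int, max_capacity: int) -> int:
    """Estimate the instance count that brings per-instance load back to target."""
    if target_per_instance <= 0:
        raise ValueError("target must be positive")
    total_load = current_instances * observed_per_instance
    desired = math.ceil(total_load / target_per_instance)
    # Capacity never leaves the bounds registered on the scalable target.
    return max(min_capacity, min(max_capacity, desired))


# Two instances at 900 invocations each, with a target of 600 per instance.
print(estimate_desired_instances(2, 900, 600, min_capacity=1, max_capacity=4))
```

If the estimate keeps hitting `max_capacity` during spikes, the bound rather than the metric is the limiting factor, which is a different fix from changing the policy target.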

Technical Chain

The IaC stack creates roles, networking, repository references, model objects, endpoint configs, and scaling resources. SageMaker pulls the inference image from ECR, loads the model artifact, and attaches the hosting containers to the selected subnets and security groups. Application Auto Scaling reads CloudWatch metrics for the endpoint variant and changes capacity within min/max bounds. If the image digest is wrong, the runtime fails. If VPC dependencies are blocked, the endpoint cannot reach data or logs. If the metric is misaligned, scaling reacts too late or not at all.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Inspect IaC stack	`aws cloudformation describe-stacks --stack-name mla-prod-endpoint-stack`	Stack status is complete and outputs match deployed resources
Verify image digest	`aws ecr describe-images --repository-name mla-inference --image-ids imageTag=prod`	Digest matches approved build artifact
Inspect model VPC config	`aws sagemaker describe-model --model-name example-model`	VPC configuration exists when private hosting is required
Inspect scalable target	`aws application-autoscaling describe-scalable-targets --service-namespace sagemaker`	Endpoint variant is registered with correct min/max capacity

CI/CD and Workflow Orchestration with SageMaker Pipelines, EventBridge, CodePipeline, and Airflow

Exam Radar

Core Priority: ML workflow orchestration combines data processing, training, evaluation, registration, approval, deployment, testing, and retraining triggers.

High Frequency: SageMaker Pipelines, CodePipeline, CodeBuild, CodeDeploy, EventBridge, Step Functions, Amazon MWAA, Git workflows, automated tests, and rollback strategies appear in deployment domain questions.

Confusion Alert: A common trap is using CI/CD tools only for application code while leaving data, model, and evaluation gates manual. Another is retraining without a trigger or approval condition.

Scenario Logic: Determine whether the requirement is ML pipeline reproducibility, source-triggered build, scheduled retraining, event-driven response, or deployment rollback.

Version Delta: Pipeline integrations and service quotas change. Verify current service support for the target region and account.

Failure Trigger: Workflow failures happen from missing source artifacts, broken IAM roles, failed tests, unapproved model packages, incorrect EventBridge rule patterns, or rollback policies that do not target the serving endpoint.

Operational Dependency: CI/CD depends on repository events, buildspec, pipeline stages, IAM role permissions, artifact stores, model registry state, and deployment target health.

How the Exam Asks It: Stems may include a commit that should trigger retraining, a scheduled data refresh, a model approval gate, or a failed canary deployment needing rollback.

How Distractors Are Designed: Wrong options use one service to solve every step, skip automated tests, or deploy directly from a notebook.

Why the Correct Answer Works: The correct answer wires the event, build, ML pipeline, approval, and deployment controls in the right sequence.

High-Value Exam Focus: CI/CD answers should preserve state transitions: source change or schedule, build/test, processing/training, evaluation condition, registry approval, deployment, health check, and rollback. Direct notebook deployment is usually a governance distractor.

Practice Question: A team wants every approved model package to trigger a deployment pipeline, but only after evaluation metrics pass and manual approval is complete. Which design best fits?

A. Use SageMaker Pipelines to evaluate/register the model and EventBridge/CodePipeline to trigger deployment on approved model package state.
B. Deploy directly from a notebook after training completes.
C. Use S3 Transfer Acceleration to approve models faster.
D. Increase the training instance size and skip registry approval.

Correct Answer: A

Explanation: A connects evaluation, registry approval, event trigger, and deployment automation. B is manual and unaudited. C affects S3 transfer, not approval workflow. D changes compute and removes the governance gate.

Exam Takeaway: ML CI/CD questions are about controlled state transitions; distractors often automate one step while skipping evaluation or approval.

Atomic Deconstruction - Operational Level

An ML CI/CD workflow must coordinate code, data, model artifacts, evaluation results, and deployment state. SageMaker Pipelines can express ML-native steps such as processing, training, evaluation, condition checks, model registration, and approval dependencies. CodePipeline coordinates repository commits, build stages, tests, and deployment actions. EventBridge connects state changes, schedules, or service events to automation.

Rollback strategy depends on the deployment pattern. Blue/green, canary, and linear shifts require traffic control and health checks. Automated tests can include unit tests for feature code, integration tests for pipeline components, and endpoint smoke tests after deployment.
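
The relationship between traffic shifting and health checks can be sketched without any AWS calls. The example below models a linear shift as a list of traffic percentages for the new fleet and rolls back to 0 percent on the first failed health check. The function names and the one-check-per-step model are simplifying assumptions.

```python
from typing import Iterable, List


def linear_shift_schedule(step_percent: int) -> List[int]:
    """Traffic percentages for the new fleet, ending at 100."""
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be in (0, 100]")
    steps = list(range(step_percent, 100, step_percent))
    steps.append(100)
    return steps


def run_shift(schedule: List[int], health_checks: Iterable[bool]) -> int:
    """Return the final traffic percent on the new fleet (0 means rolled back)."""
    current = 0
    for percent, healthy in zip(schedule, health_checks):
        if not healthy:
            return 0  # roll back: all traffic returns to the old fleet
        current = percent
    return current


schedule = linear_shift_schedule(25)
print(schedule)
print(run_shift(schedule, [True, True, False, True]))
```

A canary deployment is the special case of a two-step schedule, for example a small first step followed directly by 100 percent.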

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Source repository	Trigger event	Commit, pull request merge, tag, release	No automation unless connected	CodePipeline or EventBridge rule	Pipeline not invoked
Build project	Buildspec phases	Install, pre_build, build, post_build	Undefined until configured	IAM role, artifact store, test commands	Failed or untested artifact
SageMaker pipeline	Step graph	Processing, training, evaluation, condition, register model	No ML lineage if absent	Input data, code, role	Manual non-repeatable workflow
Model registry gate	Approval status	PendingManualApproval, Approved, Rejected	PendingManualApproval	Evaluation metrics and reviewer action	Unapproved model deployed or approved model ignored
EventBridge rule	Event pattern	Schedule, state change, registry event	Disabled or absent	Correct source/detail pattern and target role	Retraining/deployment never starts
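
The registry gate in the table combines two conditions: the evaluation metric must pass and the model package must be approved. A minimal sketch of that check follows; the function name, metric, and threshold are hypothetical, and the status strings are the model package approval values used by the SageMaker Model Registry.

```python
APPROVED = "Approved"
PENDING = "PendingManualApproval"
REJECTED = "Rejected"


def may_deploy(metric_value: float, threshold: float, approval_status: str) -> bool:
    """Deploy only when the metric passes AND the package is approved."""
    return metric_value >= threshold and approval_status == APPROVED


# A model that passes evaluation but is still awaiting review must not deploy.
print(may_deploy(0.93, 0.90, PENDING))
print(may_deploy(0.93, 0.90, APPROVED))
```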

Step-by-Step Execution Path

Map the workflow state transitions: source change, data refresh, training, evaluation, registration, approval, deployment, and rollback.
Inspect CodePipeline or build state when a repository-triggered path fails.

# Official AWS CLI verification pattern.
aws codepipeline get-pipeline-state --name mla-model-deployment
aws codebuild batch-get-builds --ids example-build-id

Expected state: stages and builds reveal the first failed transition.
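
Finding the first failed transition can be scripted against the parsed `get-pipeline-state` output. The sketch below assumes the response contains a `stageStates` list in pipeline order, each entry with `stageName` and `latestExecution.status`; the function name and stage names are hypothetical.

```python
from typing import Optional


def first_failed_stage(pipeline_state: dict) -> Optional[str]:
    """Return the name of the first stage whose latest execution failed."""
    for stage in pipeline_state.get("stageStates", []):
        status = stage.get("latestExecution", {}).get("status")
        if status == "Failed":
            return stage.get("stageName")
    return None


# Hypothetical response fragment for illustration only.
state = {
    "stageStates": [
        {"stageName": "Source", "latestExecution": {"status": "Succeeded"}},
        {"stageName": "BuildAndTest", "latestExecution": {"status": "Failed"}},
        {"stageName": "Deploy"},  # never ran, so there is no latestExecution
    ]
}
print(first_failed_stage(state))
```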

Inspect SageMaker pipeline execution for ML-native failures.

# Official AWS CLI verification pattern.
aws sagemaker describe-pipeline-execution --pipeline-execution-arn example-pipeline-execution-arn

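Expected state: execution status and failure reason are visible. Per-step status, such as a failed step, a condition result, or a completed registration, comes from aws sagemaker list-pipeline-execution-steps with the same execution ARN.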

Verify event trigger rules.

# Official AWS CLI verification pattern.
aws events describe-rule --name approved-model-deployment-trigger

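Expected state: the event pattern or schedule matches the approved model package state change, and the rule is enabled. describe-rule does not return targets; confirm them with aws events list-targets-by-rule --rule approved-model-deployment-trigger.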
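
A rule that reacts to approved model packages might use an event pattern like the one built below. The sketch assumes the SageMaker event source `aws.sagemaker` and the detail type `SageMaker Model Package State Change`; the model package group name is a placeholder, and the `matches` helper is a deliberately simplified stand-in for EventBridge matching that handles only exact-value lists and nested objects.

```python
def approved_model_pattern(group_name: str) -> dict:
    """Event pattern for approval of packages in one model package group."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Model Package State Change"],
        "detail": {
            "ModelPackageGroupName": [group_name],
            "ModelApprovalStatus": ["Approved"],
        },
    }


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: every pattern key must match one listed value."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


pattern = approved_model_pattern("example-model-group")
event = {
    "source": "aws.sagemaker",
    "detail-type": "SageMaker Model Package State Change",
    "detail": {"ModelPackageGroupName": "example-model-group",
               "ModelApprovalStatus": "PendingManualApproval"},
}
print(matches(pattern, event))
```

Testing a pattern against sample events before deployment catches the failure state named in the component table, where a wrong source or detail pattern means the deployment never starts.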

Confirm deployment health and rollback target after the pipeline runs.

Technical Chain

A repository event or scheduled rule triggers an orchestration service. CodeBuild runs tests and packages code. SageMaker Pipelines processes data, trains a model, evaluates metrics, and registers the model if conditions pass. Approval changes model package state, which can emit an event that starts deployment. The deployment system shifts traffic and monitors health. If any state transition is missing or the event pattern is wrong, the automation chain stops even when individual services are healthy.
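
The idea that the chain stops at the first missing transition can be illustrated with a small ordering check. The stage names in `REQUIRED_ORDER` are hypothetical labels for the transitions described above, not identifiers from any AWS service.

```python
from typing import List, Optional

# Hypothetical labels for the state transitions in the chain.
REQUIRED_ORDER = [
    "source", "build_test", "train", "evaluate",
    "register", "approve", "deploy", "health_check",
]


def first_missing_transition(observed: List[str]) -> Optional[str]:
    """Return the first required transition not found in order, or None."""
    position = 0
    for required in REQUIRED_ORDER:
        try:
            # Each transition must appear after the previous one.
            position = observed.index(required, position) + 1
        except ValueError:
            return required
    return None


# A workflow that deploys straight after registration skips the approval gate.
run = ["source", "build_test", "train", "evaluate", "register", "deploy", "health_check"]
print(first_missing_transition(run))
```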

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Inspect deployment pipeline	`aws codepipeline get-pipeline-state --name mla-model-deployment`	Failed or current stage is visible with revision details
Inspect build logs	`aws codebuild batch-get-builds --ids example-build-id`	Build phase status identifies test or packaging failure
Inspect ML pipeline execution	`aws sagemaker describe-pipeline-execution --pipeline-execution-arn example-pipeline-execution-arn`	Execution status and failure reason are visible
Verify event trigger	`aws events describe-rule --name approved-model-deployment-trigger`	Rule pattern and state match intended automation trigger
