Deploy and maintain data pipelines and workloads


Fast review map for this domain:

Exam signal	First object to inspect	Correct-answer pattern
Pipeline has multiple dependent tasks	Lakeflow Jobs task graph or declarative pipeline	Model order, retries, and dependencies before tuning individual notebooks
Workload must run on schedule or trigger	Job trigger, schedule, alert, restart policy	Use job settings, not notebook code, for orchestration behavior
Changes must move through SDLC	Git, branches, pull requests, tests, bundles, CLI, REST	Separate source review, packaging, deployment, and environment variables
Production workload is slow or failing	Runs, Spark UI, DAG, query profile, Delta optimization, Azure Monitor	Use evidence-first troubleshooting before changing code

flowchart LR  
    N1[Source control] --> N2  
    N2[Bundle or pipeline definition] --> N3  
    N3[Lakeflow Job] --> N4  
    N4[Run evidence] --> N5  
    N5[Monitoring and optimization]

Design and implement Lakeflow pipelines and job task logic

Exam Radar

Core Priority: Workload deployment questions test the object that owns orchestration, source review, repeatable deployment, runtime recovery, or production evidence.
High Frequency: Expect Lakeflow Jobs, task dependencies, declarative pipelines, schedules, triggers, alerts, automatic restarts, Git, pull requests, tests, Asset Bundles, CLI, REST, Spark UI, query profile, OPTIMIZE, VACUUM, Log Analytics, and Azure Monitor.
Confusion Alert: Do not rewrite pipeline code before checking the task graph, run history, bundle validation, Spark UI, or monitoring route.
Scenario Logic: Find whether the stem asks for orchestration design, deployment lifecycle, recovery operation, performance diagnosis, or alerting.
Version Delta: This topic remains in the Microsoft DP-750 skills measured from March 11, 2026 under Deploy and maintain data pipelines and workloads; answer choices should use current Azure Databricks, Unity Catalog, Lakeflow, Azure Monitor, and Microsoft Entra terminology.
Failure Trigger: The failure appears as downstream tasks running too early, missed schedules, invisible failures, unsafe reruns, unresolved conflicts, skew, spill, shuffle, or missing logs.
Operational Dependency: Task graph, parameters, idempotence, bundle target variables, compute state, Delta layout, and diagnostic settings must be validated.
How the Exam Asks It: The exam asks for the first operational action that produces evidence, not the most familiar Databricks feature.
How Distractors Are Designed: Wrong answers tune data, grant broad permissions, optimize tables, or change code when the run/deploy/monitor object is still unverified.
Why the Correct Answer Works: The correct answer selects the owner of the run or deployment behavior and confirms it with run, bundle, profile, Delta, or Azure Monitor evidence.

Practice Question: A pipeline must load raw files, validate them, transform curated tables, and refresh a reporting aggregate. The aggregate must not run if validation fails. What should be configured?
A. A Lakeflow Job or pipeline task graph with explicit dependencies and failure behavior.
B. A larger SQL warehouse for the aggregate only.
C. A notebook comment that lists the desired order.
D. A Delta VACUUM command before validation.
Correct Answer: A.
Explanation: A is correct because ordering and failure behavior are orchestration concerns. B changes compute for one query. C is documentation, not execution. D removes old files and is unrelated to task order. Exam Takeaway: Select the object that owns the dependency; the distractor pattern is an adjacent Databricks feature that is technically real but does not satisfy the scenario's first blocking condition.

Atomic Deconstruction - Operational Level

Workload deployment topics begin with run ownership. A Lakeflow Job owns task order, parameters, schedules, triggers, alerts, retries, and repair behavior; source control and bundles own reviewed, repeatable deployment; Spark UI and Azure Monitor own evidence after the run starts.

The exam often places the failure late in the pipeline, but the fix may be early in the graph. If validation fails, the aggregate should not run. If a source API times out, retries must be safe and idempotent. If a stage spills or skews, the Spark UI and query profile should be inspected before code is rewritten.

Operationally, every pipeline answer should leave a trace: a task graph, a run attempt, a repair action, a bundle validation result, a query profile, a Delta history row, or an Azure Monitor alert. Answers without evidence are usually weaker in DP-750 troubleshooting scenarios.

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Pipeline task graph	Execution dependency	Sequential, parallel, conditional, or failed-dependency behavior	No dependency unless configured	Job task definitions and upstream outputs	Task runs before required data exists
Notebook task	Programmable workload unit	Notebook path, parameters, cluster/job compute	Not scheduled alone	Workspace object and compute access	Manual execution differs from job execution
Lakeflow Spark Declarative Pipeline	Declarative data pipeline	Tables, expectations, flow definitions	Not deployed until configured	Source access and target schema	Pipeline loses quality or dependency semantics
Error handling rule	Failure response	Retry, repair, stop, alert, or compensation	Default job behavior	Task criticality and idempotence	Partial load creates inconsistent target state
Precedence constraint	Ordering control	Depends-on relationships	No order across independent tasks	Upstream completion state	Downstream task reads stale or missing data

Step-by-Step Execution Path

Map the data dependency graph before choosing the implementation tool: ingestion, validation, transformation, publication, and notification.
Choose notebooks when custom procedural logic dominates; choose Lakeflow Spark Declarative Pipelines when declarative tables, flows, and expectations own the behavior.
Create explicit task dependencies so downstream steps wait for the correct upstream completion state.
Design error handling based on idempotence: retry transient reads, fail fast for invalid data, and repair only failed tasks when state allows it.
Pass parameters through job task configuration rather than hard-coding environment paths inside notebooks.
Validate the run graph and failed-path behavior using a controlled failure scenario.
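
The controlled failure scenario in the last step can be rehearsed on paper before touching a workspace. The sketch below is a plain-Python model, not Databricks code: the task names and the `simulate_run` helper are hypothetical, and the state labels are simplified stand-ins for the states a real run page reports.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: task name -> set of upstream tasks it depends on.
GRAPH = {
    "ingest": set(),
    "validate": {"ingest"},
    "transform": {"validate"},
    "aggregate": {"transform"},
    "notify": {"aggregate"},
}


def simulate_run(graph, failing_tasks):
    """Return a terminal state per task, skipping tasks whose upstream did not succeed."""
    states = {}
    # static_order() yields every task after all of its upstream tasks.
    for task in TopologicalSorter(graph).static_order():
        if any(states[up] != "SUCCESS" for up in graph[task]):
            states[task] = "SKIPPED"  # an upstream task failed or was skipped
        elif task in failing_tasks:
            states[task] = "FAILED"
        else:
            states[task] = "SUCCESS"
    return states


states = simulate_run(GRAPH, failing_tasks={"validate"})
for task, state in states.items():
    print(f"{task}: {state}")
```

With `validate` forced to fail, every task downstream of it ends up skipped, which is the behavior the practice question expects from explicit dependencies.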

Exam implementation pattern:

When to use: the scenario asks for operation order, notebook versus Lakeflow Spark Declarative Pipelines, task logic, precedence constraints, or error handling.
Minimal syntax: model the task graph first, then define upstream/downstream dependencies and failure behavior in Lakeflow Jobs or declarative pipeline configuration.
What to verify: task graph, parameters, upstream completion state, failed-path behavior, and target table update order.
Common wrong answer: relying on notebook comments or manual run order for a production dependency chain.
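
As a rough illustration of modeling the task graph first, the sketch below writes a job definition as a Python dict and checks its dependencies offline. The field names (`tasks`, `task_key`, `depends_on`, `notebook_task`, `base_parameters`) follow the general shape of a Jobs API payload; the job name, notebook paths, parameter values, and the `check_task_graph` helper are assumptions made for the example.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job definition; paths and parameter values are placeholders.
JOB = {
    "name": "curated-load",
    "tasks": [
        {"task_key": "ingest",
         "notebook_task": {"notebook_path": "/Pipelines/ingest",
                           "base_parameters": {"env": "dev"}}},
        {"task_key": "validate",
         "depends_on": [{"task_key": "ingest"}],
         "notebook_task": {"notebook_path": "/Pipelines/validate"}},
        {"task_key": "transform",
         "depends_on": [{"task_key": "validate"}],
         "notebook_task": {"notebook_path": "/Pipelines/transform"}},
        {"task_key": "aggregate",
         "depends_on": [{"task_key": "transform"}],
         "notebook_task": {"notebook_path": "/Pipelines/aggregate"}},
    ],
}


def check_task_graph(job):
    """Return a valid execution order, or raise ValueError for a broken graph."""
    keys = {task["task_key"] for task in job["tasks"]}
    graph = {}
    for task in job["tasks"]:
        upstream = {dep["task_key"] for dep in task.get("depends_on", [])}
        missing = upstream - keys
        if missing:
            raise ValueError(
                f"{task['task_key']} depends on unknown tasks: {sorted(missing)}"
            )
        graph[task["task_key"]] = upstream
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # The second element of args holds the detected cycle.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc


order = check_task_graph(JOB)
print(" -> ".join(order))
```

A check like this belongs in review or CI; the job UI task graph remains the place to confirm what was actually deployed.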

Command confidence note: Commands shown in this section are verification-oriented examples. Validate exact Databricks CLI syntax against the active CLI and workspace version before using it as an authoritative production procedure.

Technical Chain

The chain starts when source control, an Asset Bundle, a schedule, a trigger, or a manual operator initiates workload execution. Azure Databricks resolves the job definition, task graph, parameters, compute, and permissions before each task runs.

During execution, each task emits run state, Spark stages, query profiles, pipeline event logs, Delta history, and diagnostic events. Repair, restart, retry, or stop actions are safe only when the checkpoint and idempotence model support them.

Optimization decisions should follow evidence: Spark UI for skew and shuffle, query profile for scan and join behavior, Delta history for OPTIMIZE/VACUUM, and Azure Monitor for centralized alerting. This chain prevents blind code rewrites.

Exam Trap Summary: Do not rely on notebook execution order when task dependencies, failure behavior, and pipeline state should be explicit.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate task graph	Azure Databricks Workflows/Lakeflow Jobs > target job > Tasks	Dependencies match required pipeline order
Validate failed-path behavior	Job run details after controlled failure	Downstream dependent tasks are skipped, retried, or stopped as designed
Validate parameters	Job task configuration > Parameters	Environment, source path, and target schema values are externalized
Validate pipeline run state	Pipeline details > Latest update or job run page	Tables update in dependency order and quality rules execute

Implement Lakeflow Jobs schedules, triggers, alerts, and automatic restarts

Practice Question: A production job occasionally fails because a source API times out. The task is idempotent and should retry automatically while notifying the on-call channel if retries fail. What settings matter most?
A. Task retry or automatic restart settings plus job alert notifications.
B. Column masks on the target table.
C. A different table partition scheme before every run.
D. A larger number of catalogs.
Correct Answer: A.
Explanation: A is correct because the scenario describes transient runtime recovery and visibility. B is a security control. C is physical design, not per-failure recovery. D is namespace organization. Exam Takeaway: Select the job setting that owns retry and notification behavior; the distractors are adjacent Databricks features that are real but do not address the scenario's first blocking condition.

Atomic Deconstruction - Operational Level

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Lakeflow Job	Workflow definition	One or more tasks with compute and parameters	Absent until created	Workspace permissions and task assets	No repeatable production run exists
Trigger	Start condition	Scheduled, file-arrival, manual, or supported event pattern	None (manual runs only) until configured	Workspace feature and source signal	Job runs late or not at all
Schedule	Time-based cadence	Cron or UI-supported schedule	Disabled until configured	Timezone and business SLA	Pipeline misses freshness objective
Alert/notification	Operational signal	Failure, duration, success, or skipped event	No recipient unless configured	Email/webhook integration and job state	Failures remain invisible
Automatic restart	Recovery policy	Task or pipeline retry/restart settings	Default retry behavior	Idempotent task design	Transient failures require manual intervention

Step-by-Step Execution Path

Inspect the job task and failure type to confirm the workload is safe to retry.
Configure trigger or schedule according to freshness requirement and timezone, not according to developer convenience.
Set retries, automatic restarts, or repair behavior only where the task is idempotent or has a safe checkpoint.
Configure alerts for final failure, long duration, or skipped runs so operational owners receive actionable signals.
Use run history to compare start time, duration, retry count, and terminal state.
Stop or repair runs from the job UI only after identifying whether state is partial, checkpointed, or safe to resume.
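
The run-history comparison in the steps above comes down to a few numbers per run. The following sketch summarizes them from a list of attempts; the `ATTEMPTS` data and its field names are hypothetical, standing in for whatever the run page or an API export provides.

```python
from datetime import datetime, timezone

# Hypothetical attempts for one task; times are epoch milliseconds.
ATTEMPTS = [
    {"attempt": 0, "start_ms": 1_700_000_000_000, "end_ms": 1_700_000_060_000, "result": "FAILED"},
    {"attempt": 1, "start_ms": 1_700_000_120_000, "end_ms": 1_700_000_150_000, "result": "FAILED"},
    {"attempt": 2, "start_ms": 1_700_000_210_000, "end_ms": 1_700_000_300_000, "result": "SUCCESS"},
]


def summarize(attempts):
    """Collapse a list of attempts into start time, retry count, duration, and final state."""
    ordered = sorted(attempts, key=lambda a: a["attempt"])
    first, last = ordered[0], ordered[-1]
    started = datetime.fromtimestamp(first["start_ms"] / 1000, tz=timezone.utc)
    return {
        "started_utc": started.isoformat(),
        "retry_count": len(ordered) - 1,
        # Wall-clock time from the first start to the last end, including retry gaps.
        "total_seconds": (last["end_ms"] - first["start_ms"]) // 1000,
        "terminal_state": last["result"],
    }


summary = summarize(ATTEMPTS)
print(summary)
```

A run that succeeds only after several retries is still an operational signal: the terminal state looks healthy while the duration drifts toward the SLA.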

Exam implementation pattern:

When to use: the stem names trigger, schedule, notification, alert, retry, restart, or freshness SLA.
Minimal syntax: configure schedule/trigger, retries or restart policy, and notifications in the Lakeflow Job settings.
What to verify: timezone, cadence, trigger condition, alert recipient, retry count, and recent run attempts.
Common wrong answer: putting orchestration behavior inside notebook code instead of job configuration.
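
To make the settings concrete, here is a sketch of a job settings fragment as a Python dict, plus a small review helper. The field names (`schedule`, `quartz_cron_expression`, `timezone_id`, `email_notifications`, `max_retries`, `min_retry_interval_millis`, `retry_on_timeout`, `timeout_seconds`) follow the shape of Jobs API settings and should be checked against the current API reference; the job name, recipient address, values, and the `review` helper are assumptions.

```python
# Hypothetical job settings; all names and values are placeholders.
JOB_SETTINGS = {
    "name": "api-ingest",
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # 02:00 every day
        "timezone_id": "Europe/London",
        "pause_status": "UNPAUSED",
    },
    "email_notifications": {"on_failure": ["oncall@example.com"]},
    "tasks": [
        {
            "task_key": "pull_source_api",
            "notebook_task": {"notebook_path": "/Pipelines/pull_source_api"},
            "max_retries": 3,
            "min_retry_interval_millis": 60_000,
            "retry_on_timeout": True,
            "timeout_seconds": 900,
        }
    ],
}


def review(settings):
    """Return a list of operational gaps found in the settings."""
    gaps = []
    schedule = settings.get("schedule") or {}
    if not schedule.get("timezone_id"):
        gaps.append("schedule has no explicit timezone")
    if not settings.get("email_notifications", {}).get("on_failure"):
        gaps.append("no failure notification recipient")
    for task in settings.get("tasks", []):
        if task.get("max_retries", 0) == 0:
            gaps.append(f"task {task['task_key']} has no retries")
    return gaps


gaps = review(JOB_SETTINGS)
print(gaps or "no gaps found")
```

Retries are configured here only because the scenario states the task is idempotent; the same settings on a non-idempotent task would trade a visible failure for duplicated writes.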

Technical Chain

Exam Trap Summary: Do not hide scheduling, retry, restart, or notification behavior inside notebook code when Lakeflow Job settings own it.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate schedule	Lakeflow Jobs > target job > Schedule & Triggers	Cadence and timezone match the SLA
Validate alert routing	Lakeflow Jobs > target job > Notifications	Failure or duration alert recipients are configured
Validate retry behavior	Job task settings and recent run attempts	Retry count and terminal state match recovery policy
Validate repair option	Job run details > Repair run availability	Only failed tasks are selected when repair is appropriate

Troubleshoot and repair Lakeflow Jobs with repair, restart, stop, and run functions

Exam Radar

Core Priority: Repair, restart, stop, and run-now decisions protect production state after a Lakeflow Job fails.
High Frequency: Expect scenarios with failed middle tasks, successful upstream ingestion, skipped downstream tasks, retries, and safe rerun constraints.
Confusion Alert: Do not restart the whole job when a repairable failed task can rerun safely, and do not repair append-only side effects blindly.
Scenario Logic: Inspect failed run details, task idempotence, checkpoint state, and downstream dependency before choosing repair or restart.
Version Delta: This topic remains in the Microsoft DP-750 skills measured from March 11, 2026 under Deploy and maintain data pipelines and workloads; answer choices should use current Azure Databricks, Unity Catalog, Lakeflow, Azure Monitor, and Microsoft Entra terminology.
Failure Trigger: The failure becomes worse when operators rerun the wrong scope and duplicate records, skip setup tasks, or lose the original error evidence.
Operational Dependency: Run history, task graph, parameters, checkpoint behavior, and target-table validation decide which recovery action is safe.
How the Exam Asks It: The exam asks which operation to use after a failed run, not only how to create or schedule the job.
How Distractors Are Designed: Wrong answers optimize tables, grant job admin broadly, or delete objects before reading failed-run evidence.
Why the Correct Answer Works: The correct answer matches the recovery scope to the failed task and proves the final run state plus target data state.

Practice Question: A Lakeflow Job fails in the transformation task after ingestion succeeded. The transformation task writes idempotently with a merge key, and downstream aggregate tasks did not run. What recovery action best fits?
A. Delete the target catalog and rerun the entire workspace.
B. Repair the failed run from the failed transformation task after confirming upstream outputs and idempotence.
C. Grant every engineer CAN MANAGE on all jobs.
D. Optimize the target table before checking the failed run details.
Correct Answer: B.
Explanation: B is correct because repair can rerun the failed portion when upstream state is valid and the task is safe to repeat. A is destructive and unrelated. C overprivileges. D may help performance later but does not recover the failed task. Exam Takeaway: Match the recovery scope to the failed task; the distractors are adjacent Databricks features that are real but do not address the scenario's first blocking condition.

Atomic Deconstruction - Operational Level

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Job run	Execution instance	Queued, running, failed, canceled, succeeded, or skipped	Created when triggered or manually started	Job definition, task graph, and compute availability	Operator repairs the wrong run or loses failure context
Repair run	Failed-task recovery	Rerun selected failed or downstream tasks where supported	Unavailable until a run fails in a repairable state	Task idempotence and preserved upstream outputs	Duplicate side effects or stale upstream data if repaired blindly
Restart action	Whole workload recovery	Restart job or pipeline from configured start behavior	Manual unless automatic restart configured	Checkpoint and safe reprocessing design	Pipeline reprocesses source data or skips required setup
Stop/cancel action	Interrupt behavior	Graceful or forced stop depending on workload state	No stop unless operator acts	Partial writes, streaming checkpoint, and transactional guarantees	Target tables are left in partial or ambiguous state
Run now	Manual execution	Immediate run with configured or supplied parameters	No execution until invoked	Parameter values and permission to run job	Manual run uses wrong environment or target schema

Step-by-Step Execution Path

Open the failed run details before changing code or rerunning anything; capture the failed task, upstream status, parameters, compute, and error message.
Decide whether the failed task is idempotent. MERGE with a stable key, checkpointed streaming, or replace-table semantics may be repairable; append-only side effects may require cleanup first.
Use repair when upstream successful tasks should be preserved and only failed/downstream tasks need rerun.
Use restart or run-now when the full graph must reinitialize because parameters, source state, or setup tasks changed.
Use stop or cancel when continuing the run would create more bad output, then inspect partial writes and checkpoint state before recovery.
Record final run evidence: terminal state, rerun task list, duration, retry/repair history, and downstream table validation.
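
The decision order in these steps can be written down as a small function. This is only a sketch of the reasoning: the `FailedRun` fields and the returned action strings are hypothetical labels, and a real decision still depends on reading the failed run details.

```python
from dataclasses import dataclass


@dataclass
class FailedRun:
    """Facts an operator collects from the run page before acting (hypothetical)."""
    upstream_outputs_valid: bool
    failed_task_idempotent: bool
    setup_or_parameters_changed: bool
    still_running_with_bad_output: bool


def choose_recovery(run: FailedRun) -> str:
    # Stop first if the run is still producing bad output.
    if run.still_running_with_bad_output:
        return "stop, then inspect partial writes and checkpoint state"
    # A changed setup or parameter set invalidates the old upstream results.
    if run.setup_or_parameters_changed:
        return "restart or run now with corrected parameters"
    # Repair preserves successful upstream tasks and reruns only what failed.
    if run.upstream_outputs_valid and run.failed_task_idempotent:
        return "repair failed and downstream tasks"
    return "clean up side effects before any rerun"


action = choose_recovery(
    FailedRun(
        upstream_outputs_valid=True,
        failed_task_idempotent=True,
        setup_or_parameters_changed=False,
        still_running_with_bad_output=False,
    )
)
print(action)
```

The example input matches the practice question: ingestion succeeded, the transformation writes with a merge key, so repair is the narrowest safe action.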

Exam implementation pattern:

When to use: a failed Lakeflow Job has valid upstream outputs and a failed or downstream task can safely rerun.
Minimal syntax: use the job run page repair action for failed/downstream tasks; use restart or run-now only when the whole graph must reinitialize.
What to verify: failed task, upstream state, idempotence, selected repair scope, terminal run state, and target-table checks.
Common wrong answer: rerunning the whole job or deleting targets before inspecting failed-run evidence.
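
For automation, the same repair can be requested through the Jobs REST API instead of the run page. The sketch below only builds the request and does not send it. The path `/api/2.1/jobs/runs/repair` and the body fields `run_id` and `rerun_tasks` reflect the Jobs API as commonly documented, but the API version and fields should be confirmed against the current reference; the host, token, run ID, and task name are placeholders.

```python
import json
import urllib.request


def build_repair_request(host, token, run_id, failed_tasks):
    """Build (but do not send) a repair request for the given run."""
    body = {"run_id": run_id, "rerun_tasks": sorted(failed_tasks)}
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.1/jobs/runs/repair",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values; none of these identify a real workspace or run.
request = build_repair_request(
    host="https://adb-0000000000000000.0.azuredatabricks.net/",
    token="<token>",
    run_id=123456,
    failed_tasks={"transform"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Keeping the repair scope as an explicit list of task keys makes the selected scope reviewable, which is the same check the Repair run dialog supports in the UI.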

Technical Chain

Exam Trap Summary: Do not rerun the entire job blindly; repair only when upstream state is valid and the failed task is idempotent.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate failed task	Lakeflow Jobs > target job > Runs > failed run > Task details	The failed task, error message, parameters, and upstream status are visible
Validate repair scope	Failed run > Repair run dialog or supported run action	Only failed and required downstream tasks are selected
Validate safe restart	Job run history and checkpoint/table state review	Restart decision is backed by idempotence or cleanup evidence
Validate final recovery	Latest run details plus target table row-count or quality check	Run succeeds and downstream outputs match expected state

Implement Databricks development lifecycle with Git, tests, Asset Bundles, CLI, and REST APIs

Practice Question: A team needs repeatable deployment of notebooks, jobs, variables, and permissions across dev and prod workspaces. What should they package and deploy?
A. A Databricks Asset Bundle with target-specific variables and resources.
B. Manual notebook exports emailed to workspace admins.
C. A one-time SQL MERGE statement.
D. A table comment on each target table.
Correct Answer: A.
Explanation: A is correct because Asset Bundles package resources and environment-specific deployment metadata. B is not repeatable. C is a data operation. D improves discovery but not deployment lifecycle. Exam Takeaway: Select the object that owns repeatable deployment; the distractors are adjacent Databricks features that are real but do not address the scenario's first blocking condition.

Atomic Deconstruction - Operational Level

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Git folder/repo	Source-control binding	Branch, commit, pull request	Uncommitted workspace code	Provider integration and user permission	Production changes are not reviewable
Pull request	Review gate	Changed files, approvals, comments, conflict state	Not created until branch pushed	Branching model and reviewers	Broken changes merge without review
Test suite	Quality gate	Unit, integration, end-to-end, UAT	Absent unless implemented	Test data, fixtures, and environment access	Bundle deploys code that fails at runtime
Databricks Asset Bundle	Deployment package	Resources, variables, targets, permissions	No bundle until configured	CLI version and workspace authentication	Environment-specific settings are hard-coded
REST API deployment	Programmatic deployment call	Supported workspace REST endpoint payload	No action until invoked	Token, host, API version, payload validation	Automation fails with auth or schema errors

Step-by-Step Execution Path

Use Git branches and pull requests to control source review before deployment packaging.
Run unit tests for transformation functions, integration tests for service boundaries, end-to-end tests for pipeline flow, and UAT for business acceptance.
Define Databricks Asset Bundle resources for jobs, pipelines, notebooks, variables, permissions, and target environments.
Validate bundle configuration locally or in CI before deploying to a workspace.
Deploy with the Databricks CLI when the bundle path is supported; use REST APIs for automation scenarios where direct API management is required.
After deployment, inspect workspace job, pipeline, permission, and variable state rather than assuming a successful command proves runtime readiness.
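
The unit-test layer is the cheapest gate in this path because it needs no cluster. A minimal sketch, assuming the transformation logic is kept in a plain function that a notebook imports; `normalize_order` and its fields are hypothetical.

```python
import unittest


def normalize_order(record):
    """Hypothetical transformation: trim the ID, uppercase the currency, round the amount."""
    return {
        "order_id": record["order_id"].strip(),
        "currency": record["currency"].upper(),
        "amount": round(float(record["amount"]), 2),
    }


class NormalizeOrderTest(unittest.TestCase):
    def test_cleans_fields(self):
        raw = {"order_id": " A-100 ", "currency": "usd", "amount": "19.999"}
        self.assertEqual(
            normalize_order(raw),
            {"order_id": "A-100", "currency": "USD", "amount": 20.0},
        )

    def test_rejects_non_numeric_amount(self):
        # float() raises ValueError for text that is not a number.
        with self.assertRaises(ValueError):
            normalize_order({"order_id": "A-101", "currency": "usd", "amount": "n/a"})


# Run the suite programmatically so the result can be inspected by CI code.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeOrderTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Integration and end-to-end tests then cover what this layer cannot: service boundaries, permissions, and the deployed task graph.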

Exam implementation pattern:

When to use: notebooks, jobs, pipelines, variables, and permissions must move through repeatable dev/prod deployment.
Minimal syntax: define databricks.yml with bundle name, targets, variables, and resources; validate with databricks bundle validate.
What to verify: target variables resolve, CI tests pass, deployed resources exist, and permissions match the target environment.
Common wrong answer: exporting notebooks manually or treating a SQL data operation as deployment lifecycle management.
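
To show what "target variables resolve" means, the sketch below mirrors the top-level keys of a `databricks.yml` file (`bundle`, `variables`, `targets`, `resources`) as a Python dict and applies a simplified substitution of `${var.<name>}` references. It is a teaching model only, not the CLI's implementation; the bundle name, catalog names, job definition, and the `resolve` helper are assumptions, and `databricks bundle validate` remains the authoritative check.

```python
import re

# Hypothetical bundle definition written as a dict instead of YAML.
BUNDLE = {
    "bundle": {"name": "curated_pipeline"},
    "variables": {"catalog": {"default": "dev_catalog"}},
    "targets": {
        "dev": {"default": True},
        "prod": {"variables": {"catalog": "prod_catalog"}},
    },
    "resources": {
        "jobs": {
            "curated_load": {
                "name": "curated-load",
                "parameters": [{"name": "catalog", "default": "${var.catalog}"}],
            }
        }
    },
}


def resolve(bundle, target):
    """Return the resources with variable references replaced for one target."""
    values = {name: spec.get("default") for name, spec in bundle["variables"].items()}
    # Target-level values override the bundle-level defaults.
    values.update(bundle["targets"][target].get("variables", {}))

    def substitute(node):
        if isinstance(node, dict):
            return {key: substitute(value) for key, value in node.items()}
        if isinstance(node, list):
            return [substitute(value) for value in node]
        if isinstance(node, str):
            # An undefined variable raises KeyError, which surfaces the gap early.
            return re.sub(r"\$\{var\.(\w+)\}", lambda m: str(values[m.group(1)]), node)
        return node

    return substitute(bundle["resources"])


for target in ("dev", "prod"):
    job = resolve(BUNDLE, target)["jobs"]["curated_load"]
    print(target, job["parameters"][0]["default"])
```

The point of the model is that the job definition is written once and only the target changes, so nothing environment-specific is hard-coded in the resource.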

Technical Chain

Exam Trap Summary: Do not manually export notebooks when repeatable dev/prod deployment, target variables, and resource permissions must be reviewed.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate branch review	Git provider pull request page	PR has reviewed changes and no unresolved conflicts
Validate test coverage gate	CI run summary or local test output	Unit, integration, or end-to-end tests relevant to changed resources pass
Validate bundle configuration	Databricks CLI active-version validation: `databricks bundle validate`	Bundle resolves resources and target variables without errors
Validate deployed resources	Databricks workspace > Jobs/Pipelines/Workspace files after deployment	Expected resources, schedules, and permissions exist in target environment

Monitor, troubleshoot, and optimize Azure Databricks workloads

Practice Question: A Spark job is slow after a join. The team sees one stage taking much longer than others and large shuffle spill. What evidence should be inspected before rewriting the pipeline?
A. Spark UI DAG, stage metrics, and query profile evidence for skew, shuffle, and spill.
B. Only the table comment in Catalog Explorer.
C. The Delta Sharing recipient list.
D. The Git branch name.
Correct Answer: A.
Explanation: A is correct because the symptom is runtime execution imbalance and shuffle spill. B is metadata. C is external sharing. D may identify the code version but not the physical bottleneck. Exam Takeaway: Select the evidence source that matches the symptom; the distractors are adjacent Databricks features that are real but do not address the scenario's first blocking condition.

Atomic Deconstruction - Operational Level

Component Specifications

Object	Attribute	Value Range	Default State	Dependency	Failure State
Cluster metrics	Cost and performance signal	CPU, memory, workers, DBU usage, duration	Collected during run	Monitoring configuration and cluster run	Overprovisioning or bottleneck remains hidden
Spark UI DAG	Execution plan evidence	Stages, tasks, shuffle, spill, skew	Visible for Spark application	Run history and Spark event data	Troubleshooting changes code without knowing bottleneck
Query profile	SQL performance evidence	Scan, join, aggregation, spill, duration	Generated for SQL queries	Warehouse or Spark SQL query execution	Wrong optimization is applied
Delta optimization	File-layout maintenance	OPTIMIZE, ZORDER where applicable, VACUUM retention	Not run unless scheduled or executed	Delta table and retention policy	Small files or stale files increase cost
Azure Monitor Log Analytics	Central log stream	Workspace diagnostic logs and queryable tables	Not configured until diagnostics enabled	Diagnostic settings and Log Analytics workspace	Alerts lack run or cluster evidence

Step-by-Step Execution Path

Start with the failed or slow run history to identify task, duration, retries, cluster, and error message.
For Lakeflow Jobs, decide whether repair, restart, stop, or rerun is appropriate based on checkpoint and task idempotence.
For Spark performance, inspect Spark UI DAG, stages, task distribution, shuffle read/write, spill, and skew indicators before changing code.
For SQL performance, inspect query profile to identify scan size, join strategy, aggregation cost, and data skipping behavior.
Capture the specific metric that justifies the fix: skewed task duration, high shuffle read/write, spill to disk, cache miss, scanned files, or long-running join/aggregate stage.
For Delta tables, use OPTIMIZE for file compaction and supported clustering/ZORDER patterns; use VACUUM only with retention awareness.
Stream diagnostic logs to Log Analytics and configure Azure Monitor alerts when operational detection is required beyond the Databricks UI.
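
One way to capture the metric that justifies a fix is to reduce a stage's task durations to a single skew figure. The sketch below assumes the durations were copied from the Spark UI stage detail page; the sample numbers and the threshold of 5 are arbitrary assumptions, not Databricks guidance.

```python
from statistics import median

# Hypothetical per-task durations (seconds) for one stage.
TASK_SECONDS = [12, 11, 13, 12, 14, 11, 95, 12]

# Assumed cut-off for calling a stage skewed; tune it to the workload.
SKEW_THRESHOLD = 5.0


def skew_ratio(durations):
    """Ratio of the slowest task to the median task in a stage."""
    return max(durations) / median(durations)


ratio = skew_ratio(TASK_SECONDS)
verdict = "skew suspected" if ratio > SKEW_THRESHOLD else "no obvious skew"
print(f"max/median ratio: {ratio:.1f} ({verdict})")
```

A high ratio points at one oversized partition, which is evidence for a skew fix; uniformly slow tasks with heavy spill point at memory or shuffle sizing instead.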

Exam implementation pattern:

When to use: production workload is slow, expensive, failed, or invisible to operations.
Minimal syntax: inspect run history, Spark UI stages/DAG, query profile, Delta history, and Azure Monitor Log Analytics before changing code.
What to verify: skewed task duration, shuffle read/write, spill, scanned files, OPTIMIZE/VACUUM history, and alert firing evidence.
Common wrong answer: rewriting transformations before proving the bottleneck with Spark UI or query profile.
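
Delta history evidence can be reduced the same way. The sketch below filters rows shaped like a subset of `DESCRIBE HISTORY` output for maintenance operations; the `HISTORY` rows are invented sample data, and in a notebook they would come from the query result rather than a literal list.

```python
# Hypothetical rows (newest first), using the version, timestamp, and
# operation columns that DESCRIBE HISTORY returns.
HISTORY = [
    {"version": 12, "timestamp": "2026-01-10T02:05:00", "operation": "MERGE"},
    {"version": 11, "timestamp": "2026-01-09T03:00:00", "operation": "OPTIMIZE"},
    {"version": 10, "timestamp": "2026-01-09T02:05:00", "operation": "MERGE"},
    {"version": 9, "timestamp": "2026-01-02T03:30:00", "operation": "VACUUM END"},
]


def latest_maintenance(history):
    """Return the most recent OPTIMIZE and VACUUM rows; input must be newest first."""
    latest = {}
    for row in history:
        operation = row["operation"]
        if operation == "OPTIMIZE":
            kind = "OPTIMIZE"
        elif operation.startswith("VACUUM"):
            kind = "VACUUM"
        else:
            continue
        # Keep only the first (newest) row seen for each kind.
        latest.setdefault(kind, row)
    return latest


maintenance = latest_maintenance(HISTORY)
for kind, row in maintenance.items():
    print(f"{kind}: version {row['version']} at {row['timestamp']}")
```

If a table shows frequent writes and no recent OPTIMIZE entry, small files become a plausible cause; if OPTIMIZE ran recently, the bottleneck is more likely elsewhere.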

Technical Chain

Exam Trap Summary: Do not rewrite transformations before inspecting Spark UI, query profile, shuffle, spill, skew, and Delta history evidence.

Operational Skills Matrix

Task	Precise Command or Path	Verification Standard
Validate run failure signal	Lakeflow Jobs > target job > Runs > failed run details	Error, failed task, retry count, and repair eligibility are visible
Validate Spark bottleneck	Spark UI > Jobs/Stages/SQL tabs for the run	Skew, shuffle, spill, or resource bottleneck evidence matches the symptom
Validate Delta optimization	SQL verification: `DESCRIBE HISTORY <catalog>.<schema>.<table>;`	OPTIMIZE or VACUUM operations appear only when appropriate
Validate log streaming	Azure Monitor > Log Analytics workspace query for Databricks diagnostic categories	Recent workspace/job/cluster events are queryable and alerts can target them
