This domain focuses on observability, alerting, backup, disaster recovery, and ongoing maintenance of Azure environments.
Azure Monitor is a comprehensive platform for collecting, analyzing, and acting on telemetry data from Azure and non-Azure resources.
Numerical, time-series data.
Examples:
CPU usage
Disk IOPS
Network traffic
Used for:
Real-time monitoring
Triggering alerts
Detailed events and telemetry, collected from:
Azure resources
Virtual machines (via agent)
Applications
Stored in Log Analytics Workspaces
Logs support complex queries using KQL (Kusto Query Language).
Visual dashboards that combine:
Metrics
Logs
Text
Queries
Interactive and shareable for analysis or reporting
Capture resource logs (e.g., from VMs, Storage, App Services).
Allow routing to other systems.
Log Analytics Workspace: for queries and analysis
Event Hubs: for streaming logs to external SIEM tools (like Splunk)
Azure Storage: for archiving long-term
Example:
Send storage account logs to Log Analytics for querying
Send VM logs to Event Hub for off-Azure security analysis
Azure Monitor collects data from:
Azure resources: e.g., metrics from VMs, SQL DBs, storage
Guest OS metrics/logs: via Azure Monitor Agent (AMA)
Subscriptions and tenants: for audit logs, activity logs
Supports both platform and custom telemetry.
Understanding and working with metrics and logs is essential for diagnosing performance issues, identifying trends, and responding to incidents.
Metrics Explorer is a tool within Azure Monitor that allows you to:
Visualize metrics in real-time
Build charts (line, bar, etc.)
Set up alerts based on metrics
Go to a resource (e.g., VM, Storage Account)
Select Monitoring > Metrics
Choose:
Metric namespace (e.g., “Virtual Machine Host”)
Metric (e.g., CPU percentage)
Aggregation: avg, max, min, sum
Time range
(Optional) Apply filters and splitting by instance
Helps identify performance trends and anomalies
Kusto Query Language (KQL) is used to query and analyze logs collected by Azure Monitor.
Similar to SQL, but optimized for time-series and diagnostic data.
Perf
| where ObjectName == "Processor"
| where CounterName == "% Processor Time"
| where InstanceName == "_Total"
| summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 5m)
This shows the average CPU usage every 5 minutes.
In the Log Analytics workspace
From Azure Monitor > Logs
From VM > Logs
You can collect your own application or system log files (e.g., logs from a web server) into Log Analytics.
Install Azure Monitor Agent
In the Log Analytics workspace, configure a Custom Log source
Define:
File path (e.g., C:\logs\myapp.log)
Delimiters or record patterns
Azure ingests the logs and lets you query them via KQL
Once collected, use KQL functions like parse, split, or extract() to analyze them.
Example:
CustomLog_CL
| extend Level = extract("Level=([A-Z]+)", 1, RawData)
| summarize count() by Level
Azure Monitor enables you to create alerts based on metrics, logs, or activity logs, and tie them to automated actions like notifications or remediation steps.
| Alert Type | Triggers on | Example |
|---|---|---|
| Metric Alert | Real-time performance data | CPU > 80% |
| Log Alert | KQL query result | Number of 500 errors > 10 |
| Activity Log Alert | Azure control-plane events | Resource deletion or creation |
Alerts can be scoped to:
A single resource
A resource group
An entire subscription
Thresholds: e.g., CPU > 80%
Frequency: How often the condition is evaluated
Evaluation period: Over what time range data is analyzed
Severity levels:
0 – Critical
1 – Error
2 – Warning
3 – Informational
4 – Verbose
Alerts are managed under:Azure Monitor > Alerts
An action group is a reusable set of response actions that are triggered when an alert fires.
You define:
Who to notify
What action to take
SMS
Push notification (via Azure app)
Voice call
Webhook: Notify a 3rd-party service
Azure Function: Run serverless code in response
Logic App: Trigger a workflow (e.g., post to Teams or Slack)
Go to Azure Monitor > Alerts > Action groups
Click + Create
Add:
Recipients
Actions
(Optional) Tags
When creating an alert rule, select this action group
Reuse the same action group across multiple alerts.
Azure provides built-in backup and restore services to protect your workloads and data. These tools support both Azure-native resources and on-premises systems.
A cloud-based backup solution that eliminates the need for on-prem backup infrastructure. It uses a component called the Recovery Services Vault.
Azure VMs: Full snapshot-based backups
On-premises servers:
Using MARS Agent (Microsoft Azure Recovery Services)
Or Azure Backup Server (for more advanced workloads)
SQL Server in Azure VMs: App-consistent backups using VSS
Create a Recovery Services Vault
Register the Azure VM
Define a backup policy
Initiate backup now or wait for scheduled backup
VMs can be restored to a new instance, or you can restore individual files.
Daily / Weekly / Monthly / Yearly backup points
Retention duration configurable (e.g., keep weekly backups for 12 weeks)
Helps organizations meet data retention requirements
Keeps up to 5 recovery points using snapshot-based backups.
Enables quick recovery without needing to fetch from vault storage.
For Azure VMs:
Mount the backup as a virtual drive
Browse and copy needed files
Options:
Create new VM
Restore disks only
Replace existing VM
Use when recovering from ransomware, accidental deletion, or configuration failure.
Azure Site Recovery (ASR) is a disaster recovery solution that replicates workloads to a secondary location, allowing business continuity in case of failures.
You can replicate an Azure VM from one region to another.
Requires:
Recovery Services Vault
Enabling replication settings per VM
Azure replicates OS disks and data disks asynchronously to the secondary region.
Use case: Protect production VMs from regional outages
ASR supports replication from:
Hyper-V (with or without System Center VMM)
VMware
Physical servers
Steps:
Install ASR agent on-prem
Set up Process Server and Configuration Server
Create replication policy
Enable replication to Azure
Azure becomes your disaster recovery site, reducing physical infrastructure needs.
Recovery plans allow you to orchestrate the failover process.
They support:
Grouping VMs by application tier
Sequencing startup
Custom scripts or manual steps
Useful for multi-VM applications where order of startup matters (e.g., DB → App → Web)
| Type | Description |
|---|---|
| Planned Failover | Controlled, no data loss, used for maintenance or migration |
| Unplanned Failover | Used during unexpected outages |
| Test Failover | Non-disruptive; validates DR setup in isolation |
All types support automated and manual testing via portal or PowerShell.
Once the primary site is restored:
You can replicate data back from Azure to on-prem or original region
Then failback systems safely
These tasks help ensure your Azure environment remains secure, up-to-date, and cost-efficient over time.
A solution in Azure Automation that helps:
Track missing updates
Schedule patch installation
Monitor compliance
Supports:
Windows VMs
Linux VMs
Helps maintain security compliance by ensuring systems are patched.
Enable Update Management from your VM or Automation Account
Link the VM to Log Analytics
Define schedule and maintenance window
Select:
Update types (critical, security)
Reboot options
Use Update Compliance reports to track:
Successful/failed patch attempts
Overall compliance percentage
Time of last scan
Provides personalized alerts and status notifications for:
Azure outages
Planned maintenance
Regional issues
Access via:Azure Portal > Service Health
Health Advisories: Best practices, changes to service behavior
Security Advisories: Threat detections or patches
Maintenance Notifications: Scheduled updates
Service Incidents: Real-time issue tracking
You can subscribe to email/SMS alerts for your services and regions.
Azure Advisor analyzes your environment and provides recommendations for:
| Category | Example |
|---|---|
| High Availability | Enable backup for critical VMs |
| Security | Enable MFA or NSG |
| Performance | Resize underutilized VMs |
| Cost | Remove idle resources or use Reserved Instances |
Each recommendation includes:
Potential impact
Estimated cost savings
Remediation steps
Resize VMs if:
Switch to different VM series based on workload
Upgrade or downgrade App Service Plans
Tune autoscale rules and diagnostic settings
Modify SQL Database DTUs/vCores or Service Tiers
Tune Blob storage access tiers or replication settings for cost-performance balance
Alert suppression allows you to control the frequency of alert notifications to avoid alert storms during high-frequency conditions.
Suppression Interval:
The minimum duration between successive notifications for the same alert condition.
Use case:
For example, if CPU stays above 90% for 2 hours, you may want only one notification per hour, not one per evaluation cycle.
In the alert rule creation wizard, under the Actions section.
Set a "Suppression" time window (e.g., 30 minutes, 1 hour).
Avoids alert fatigue
Helps teams focus on unique or meaningful alerts
Exam Tip: Understand how to reduce alert noise without disabling alerts entirely.
Azure Managed Grafana provides a fully managed Grafana environment integrated with Azure Monitor data sources.
Native integration with:
Azure Monitor
Log Analytics
Application Insights
Supports:
Custom dashboards
Role-based access control
Team collaboration
No need to maintain infrastructure or apply patches
Advanced, custom visualizations (beyond Workbooks)
Ideal for SRE and Ops teams
Supports mixed sources (e.g., Prometheus + Azure Monitor)
Search for Azure Managed Grafana in Azure Marketplace
Assign users with proper roles (e.g., Viewer, Editor)
Note: AZ-104 does not test Grafana directly, but awareness of monitoring extensibility can help in hybrid environments.
Kusto Query Language (KQL) supports powerful analytics features like joins, parsing, and data shaping.
Heartbeat
| where TimeGenerated > ago(1h)
| join kind=inner (
Perf
| where ObjectName == "Processor"
| where CounterName == "% Processor Time"
) on Computer
| project TimeGenerated, Computer, CounterName, CounterValue
Heartbeat shows VM availability.
Perf shows performance metrics.
The query joins both tables on the Computer name, allowing correlation of CPU usage with availability.
extend — Add calculated columns
parse_json — Extract structured data from JSON fields
summarize — Aggregate over time windows or categories
Exam Tip: AZ-104 focuses on basic KQL, but join examples may appear in practical or case-study questions.
Azure Resource Graph is a service for high-performance querying across large-scale Azure environments.
Query across subscriptions or management groups
Inventory reporting (e.g., list all VMs not in a backup vault)
Governance and compliance auditing
Resources
| where type == "microsoft.compute/virtualmachines"
| project name, location, resourceGroup
Instant, read-only access to resource metadata
Uses KQL-like syntax
Supports filtering by tags, properties, policies
Azure Portal → Resource Graph Explorer
Azure CLI: az graph query -q "<query>"
Note: Not directly tested in AZ-104, but helpful for real-world automation, inventory, and governance.
What is the difference between Azure Monitor and Log Analytics?
Azure Monitor is the overall monitoring platform, while Log Analytics is the tool used to query and analyze collected logs.
Azure Monitor collects metrics, logs, and telemetry from Azure resources and applications. Log Analytics is part of Azure Monitor and provides a query interface using Kusto Query Language (KQL) to analyze log data stored in a Log Analytics workspace. Administrators use Log Analytics to investigate performance issues, analyze system logs, and create alerts based on log queries.
Demand Score: 83
Exam Relevance Score: 92
Why might an Azure Monitor alert rule fail to trigger?
The alert condition may not match the metric threshold or evaluation period.
Azure Monitor alerts depend on defined thresholds, evaluation frequency, and aggregation logic. If the metric value does not exceed the configured threshold during the evaluation window, the alert will not fire. Administrators must verify that the correct metric, scope, aggregation method, and evaluation period are configured. Misconfigured thresholds or incorrect resource scopes are common causes of alerts not triggering.
Demand Score: 82
Exam Relevance Score: 90
What Azure feature is used to create automated notifications when resource metrics exceed thresholds?
Azure Monitor Alerts with Action Groups.
Azure Monitor Alerts evaluate metrics or log queries and trigger notifications or automated responses when defined conditions are met. Action Groups define what happens when the alert fires, such as sending email, SMS, webhook calls, or triggering automation runbooks. This mechanism allows organizations to quickly respond to performance or availability issues.
Demand Score: 79
Exam Relevance Score: 91
What service provides backup protection for Azure virtual machines?
Azure Backup.
Azure Backup is a managed service that protects Azure resources by creating recovery points stored in Recovery Services vaults. It supports automated backup schedules and retention policies. In case of data loss or system failure, administrators can restore virtual machines or individual files from the backup. This service simplifies backup management without requiring on-premises backup infrastructure.
Demand Score: 77
Exam Relevance Score: 90