VMware vSphere is the foundational virtualization platform used in modern data centers.
It consists mainly of two core components:
ESXi → the hypervisor running on physical servers
vCenter Server → the management server used to control ESXi hosts and clusters
ESXi is VMware’s enterprise-grade hypervisor.
As a beginner, you can think of ESXi as the “operating system of the data center”—it manages hardware resources and runs virtual machines.
This means:
ESXi is installed directly on the physical server’s hardware
No underlying OS like Windows or Linux
Very efficient and secure
Purpose-built for running virtual machines at high performance
If you log into an ESXi host directly, you see only a minimal console—most operations are done from vCenter.
When you install or manage an ESXi host, these areas matter the most:
vmk0 is the VMkernel interface that provides management access to the host.
Used by:
vCenter Server
SSH (if enabled)
Host Client UI
Must have:
Correct IP address
Correct gateway
Reachable DNS
Secure network placement
These three must always be configured correctly:
Hostname: human-readable name for the host
DNS:
Forward (hostname → IP)
Reverse (IP → hostname)
Critical for joining the host to vCenter
NTP: time synchronization
Accurate time is critical in vSphere clusters: SSO token validation, certificate checks, HA heartbeating, and log correlation all break down when host clocks drift.
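The forward/reverse DNS requirement can be sanity-checked before joining hosts to vCenter. A minimal Python sketch of that check (hostnames and IPs are hypothetical, and records are modeled as plain dicts rather than queried from a real DNS server):

```python
# Illustrative sketch (not a VMware API): verify that forward and
# reverse DNS records for ESXi hosts round-trip correctly.

def check_dns_consistency(forward, reverse):
    """forward: hostname -> IP, reverse: IP -> hostname.
    Returns hostnames whose records do not round-trip."""
    problems = []
    for hostname, ip in forward.items():
        if reverse.get(ip) != hostname:
            problems.append(hostname)
    return problems

forward = {"esxi01.lab.local": "192.168.10.11",
           "esxi02.lab.local": "192.168.10.12"}
reverse = {"192.168.10.11": "esxi01.lab.local",
           "192.168.10.12": "esxi99.lab.local"}  # stale PTR record

print(check_dns_consistency(forward, reverse))  # ['esxi02.lab.local']
```

A stale PTR record like the one above is exactly the kind of mismatch that makes adding a host to vCenter fail in confusing ways.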
ESXi can boot from:
Local disks (SATA, SAS, NVMe)
SD cards (legacy)
USB devices (legacy)
Auto Deploy (network PXE boot)
vSphere 7 Update 3 and later discourage SD/USB boot media due to reliability concerns
Most enterprise environments use:
RAID1 local disks
Or stateless deployment (Auto Deploy)
ESXi includes built-in hardening options:
Lockdown mode
SSH service
Firewall
Security hardening is essential for compliance frameworks and best practices.
vCenter Server is the central management platform for all ESXi hosts and clusters.
You don’t manage large environments host-by-host; instead, you use vCenter to manage:
Hosts
Resource pools
Clusters
VM templates
Storage
Networking
Users & permissions
The vCenter Server Appliance (VCSA) is a pre-packaged virtual machine running on Photon OS
Includes:
vCenter services
vSphere Client (HTML5 UI; the Flash-based Web Client was removed)
vCenter Single Sign-On (SSO)
Inventory and related management services
VMware recommends VCSA for all implementations
No Windows-based vCenter Server anymore (deprecated in 6.7, removed in vSphere 7)
vCenter manages all vSphere inventory objects, including:
Data centers
Clusters
ESXi hosts
Resource pools
VM folders
Virtual machines
Storage and networking objects
vCenter provides the UI and cluster services used to configure:
HA (host failure recovery)
DRS (automated workload balancing)
EVC (Enhanced vMotion Compatibility, which masks CPU feature differences across hosts)
Cluster resource settings
Although HA works even if vCenter fails temporarily, configuration still requires vCenter.
Templates:
Preconfigured “golden images” for quick VM deployment
Ensures consistency
Saves enormous time
Content Library:
Stores templates, ISOs, scripts
Can be shared across multiple vCenters
Useful for multi-site deployments
vCenter includes granular access control:
Built-in roles: Administrator, Read-Only, VM User, etc.
Custom roles: tailor-created for security needs
Permissions assigned at different scopes:
vCenter root
Cluster
Folder
Individual VM
RBAC is crucial for large teams and compliance.
SSO handles authentication.
You can integrate identity sources:
Active Directory
LDAP
Local SSO users
SSO also provides token-based authentication for vCenter components.
Clusters bring hosts together and unlock powerful features like HA, DRS, and vMotion.
vSphere HA is designed to protect against host failures.
If an ESXi host crashes:
HA detects failure
HA restarts VMs from the failed host onto other surviving hosts
No manual intervention needed
Note:
HA does not keep VMs running continuously (for that, see FT later). It only restores them after a failure.
HA elects one host as the master (called the primary since vSphere 7)
All other hosts act as agents (subordinates)
The master monitors host heartbeats and VM states
Two heartbeat channels:
Management network heartbeat
Datastore heartbeat
This dual-heartbeat system prevents false failure detection.
Admission Control ensures that enough cluster resources remain free for failover.
Three policy types:
Slot policy (legacy)
Calculates slot size from worst-case VM CPU/memory reservations
Conservative, often wastes resources
Percentage-based policy (recommended): reserves a configurable percentage of cluster CPU and memory for failover
Dedicated failover hosts: keeps specific hosts idle so their full capacity is available after a failure
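The difference between the slot policy and the percentage policy is easiest to see as arithmetic. A simplified Python sketch (hypothetical numbers; real HA also accounts for per-VM reservations and host overhead):

```python
# Illustrative sketch of the two common admission-control calculations.

def percentage_policy_free(total_mhz, used_mhz, reserve_pct):
    """True if current usage still leaves the configured failover percentage free."""
    return (total_mhz - used_mhz) / total_mhz * 100 >= reserve_pct

def slot_policy_capacity(hosts_mhz, slot_mhz, tolerated_failures):
    """Worst-case slot count: assume the largest host(s) are the ones that fail."""
    surviving = sorted(hosts_mhz)[:len(hosts_mhz) - tolerated_failures]
    return sum(h // slot_mhz for h in surviving)

# 3 identical hosts, 25% CPU reserved for failover:
print(percentage_policy_free(total_mhz=30000, used_mhz=21000, reserve_pct=25))  # True
# Slot policy: 3 hosts, 500 MHz slots, tolerate 1 host failure:
print(slot_policy_capacity([10000, 10000, 10000], 500, 1))  # 40
```

The slot policy's conservatism shows up when one VM has a large reservation: the slot size grows for everyone, shrinking the slot count cluster-wide.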
VM Monitoring watches VM heartbeats (delivered by VMware Tools)
If the guest OS freezes → the VM is automatically restarted
Helps recover from guest-level failures
Host isolation response defines what happens when a host loses its management network but is still running:
Options:
Power off VMs (common when using vSAN)
Shut down VMs (gentler)
Leave VMs powered on (default in some environments)
Correct setting depends on storage connectivity and HA design.
DRS keeps workloads balanced across hosts.
Analyzes CPU and memory usage
Moves VMs between hosts using vMotion
Ensures fair resource allocation
This prevents “hotspots” where one host is overloaded.
Manual: DRS suggests moves, admin approves
Partially Automated:
Initial placement automated
Migration suggestions still require approval
Fully Automated (recommended): both initial placement and migrations run without admin approval
The Migration Threshold controls how aggressive DRS is:
Conservative → moves VMs only under high imbalance
Aggressive → moves VMs proactively to optimize balance
More aggressive threshold settings mean more frequent vMotions
Rules to keep VMs together or apart:
VM-VM affinity: two VMs must run on the same host (e.g., an app server and its cache, for low latency)
VM-VM anti-affinity: two VMs must NOT run on the same host
Useful for domain controllers, clustered apps
VM-Host rules:
Some VMs must stay (or avoid) certain hosts
Often used for licensing or hardware dependency
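Anti-affinity is essentially a constraint check on VM placement. A minimal Python sketch of that check (VM and host names are hypothetical; real DRS evaluates rules continuously, not just at placement time):

```python
# Illustrative sketch: validate a placement against VM-VM anti-affinity rules.

def violates_anti_affinity(placement, rules):
    """placement: vm -> host; rules: list of VM sets that must be kept apart."""
    violations = []
    for group in rules:
        hosts = [placement[vm] for vm in group if vm in placement]
        if len(hosts) != len(set(hosts)):   # two group members share a host
            violations.append(group)
    return violations

placement = {"dc01": "esxi01", "dc02": "esxi01", "app01": "esxi02"}
rules = [{"dc01", "dc02"}]                  # domain controllers kept apart
print(violates_anti_affinity(placement, rules))  # one violation: dc01/dc02 share esxi01
```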
Maintenance Mode uses DRS to evacuate VMs from a host
Proactive DRS integrates with hardware monitoring systems
These features are foundational for cluster operations.
vMotion: live migration of running VMs with zero downtime
Moves VM’s memory + execution state from one host to another
Requirements:
Shared storage (except in Shared-Nothing vMotion)
Compatible CPUs (EVC can help)
vMotion network
Storage vMotion: moves the virtual disks of a running VM between datastores
No downtime
Used for:
Storage rebalancing
Migration from expensive/slow storage
Maintenance on arrays
Fault Tolerance provides true zero-downtime protection.
Creates a Secondary VM that mirrors the Primary VM
Both run in lockstep using synchronous replication
If primary host fails → secondary instantly takes over
Users see no reboot, no interruption, no data loss
vCPU limits:
Older versions: 1 vCPU only
Newer versions: up to 4 vCPUs (but varies by version)
High bandwidth requirement: FT logging traffic needs a dedicated low-latency network (10 GbE recommended)
Doubles resource consumption, since two full copies of the VM run at once
FT is used only for small but critical workloads.
VMFS is VMware’s clustered file system
Built for SAN storage:
Fibre Channel
iSCSI
FCoE
Supports many ESXi hosts accessing the same datastore concurrently
Enables vMotion, HA, DRS
ESXi mounts NFS v3 or v4.1 shares as datastores
Simpler than SAN
Good for:
Less complex deployments
Environments with strong NAS infrastructure
Shared access without VMFS
vSAN is a modern hyperconverged storage solution.
Uses local disks from each ESXi host
SSD/NVMe for cache
HDD or SSD for capacity
Forms a distributed datastore
Very tightly integrated with vSphere
Managed via storage policies
Policies define:
FTT (Failures to Tolerate):
RAID1 mirroring
RAID5/6 erasure coding
Stripe width
Checksum
Deduplication/Compression (all-flash only)
Each VM can have its own policy — this is more flexible than VMFS/NFS.
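The capacity cost of each policy choice can be estimated with simple multipliers. A simplified Python sketch (ignores slack space, witness components, and dedup/compression savings):

```python
# Illustrative sketch of vSAN raw-capacity multipliers per storage policy.

MULTIPLIERS = {
    ("RAID1", 1): 2.0,    # FTT=1 mirroring: 2 full copies
    ("RAID1", 2): 3.0,    # FTT=2 mirroring: 3 full copies
    ("RAID5", 1): 4 / 3,  # FTT=1 erasure coding: 3 data + 1 parity
    ("RAID6", 2): 1.5,    # FTT=2 erasure coding: 4 data + 2 parity
}

def raw_capacity_gb(vm_size_gb, scheme, ftt):
    """Raw datastore capacity consumed by a VM of the given logical size."""
    return vm_size_gb * MULTIPLIERS[(scheme, ftt)]

print(raw_capacity_gb(100, "RAID1", 1))  # 200.0
print(raw_capacity_gb(100, "RAID5", 1))  # ≈ 133.3 (4/3 of the logical size)
```

This is why RAID5/6 erasure coding is attractive on all-flash clusters: the same FTT costs noticeably less raw capacity than mirroring.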
Standard cluster (most common)
2-node ROBO (Remote Office Branch Office)
Light footprint
Requires a witness appliance
Stretched cluster
For site-level resilience
Two active sites + witness
Zero-RPO architecture depending on design
The vSphere Standard Switch (vSS) exists only on a single ESXi host
No central management
Basic networking functionality
Good for small environments or isolated use cases
Supports:
VLAN tagging
NIC teaming
Basic failover
The vSphere Distributed Switch (vDS) is an enterprise-grade virtual switch.
Centralized management via vCenter
Consistent configuration across all hosts in the cluster
Required for advanced features such as NIOC, port mirroring, and NetFlow
Built-in health checks can:
Detect mismatched VLANs
Flag MTU inconsistencies
Surface teaming/failover misconfigurations
vDS is recommended for production clusters.
NSX is VMware’s network virtualization & security platform.
Creates logical switches and routers
Uses VXLAN or GENEVE encapsulation
Allows VM networks independent of physical networks
Security applied at the VM NIC level
Enables micro-segmentation
Fine-grained security policies
Stops “east-west” attacks inside the data center
Advanced networking services
Software-defined, flexible, API-driven
NSX is a huge topic and key for modern VMware/cloud designs.
Aria Operations (formerly vRealize Operations) provides:
Performance monitoring
Capacity planning
Forecasting
Root-cause analysis
Anomaly detection
Very useful for design exams.
Aria Automation (formerly vRealize Automation):
Automates deployment of VMs/services
Enables self-service portals
Enforces policies
Multi-cloud integration
Aria Operations for Logs (formerly vRealize Log Insight):
Central log collection
Search, dashboards
Helps troubleshoot ESXi, vCenter, NSX, vSAN
Site Recovery Manager (SRM):
DR orchestration
Automated failover and failback
Test DR without affecting production
Uses storage or vSphere Replication
SRM plays heavily in DR solution design.
VCHA is a feature that protects the vCenter Server service itself. It does not protect ESXi hosts or virtual machines directly, but it ensures that vCenter remains available when the underlying appliance or its OS fails.
Architecture components:
Active Node
Runs all vCenter services. This is the node you normally connect to with the vSphere Client.
Passive Node
Maintains a synchronized copy of the vCenter application and database. It does not serve client requests while Active is healthy, but it is ready to take over.
Witness Node
A lightweight node used for quorum. It decides which node should be Active in case of communication failures, preventing split-brain situations.
Key characteristics:
State and database are replicated synchronously from Active to Passive.
If the Active node fails, the Passive node is automatically promoted to Active.
Typical failover time is within a few minutes, depending on environment size and health.
VCHA protects only the vCenter service. ESXi hosts and cluster features like HA and DRS continue to function using the last known configuration, even if vCenter is down.
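The Witness node's role is classic majority quorum. A toy Python sketch of the principle (this models the quorum arithmetic only, not actual VCHA behavior):

```python
# Illustrative sketch: with three nodes, only a partition holding a strict
# majority (2 of 3) may run the Active role, which prevents split-brain.

def can_serve(reachable_nodes, total_nodes=3):
    """A partition may host the Active role only with a strict majority."""
    return reachable_nodes > total_nodes // 2

print(can_serve(2))  # True: Active+Witness (or Passive+Witness) keep quorum
print(can_serve(1))  # False: an isolated node must not promote itself
```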
Design considerations:
VCHA requires separate networks for management and replication to avoid interference and to secure state replication.
All three nodes should not share the exact same failure domain (for example, avoid placing all of them on the same datastore or host).
VCHA is primarily useful where vCenter uptime is a strong requirement, for example environments with heavy automation or large operational teams depending on vCenter APIs.
Enhanced Linked Mode connects multiple vCenter Servers that share a Single Sign-On (SSO) domain.
Core capabilities:
Shared SSO domain, so users authenticate once and can access all linked vCenters.
Global inventory view, allowing you to see and manage objects across multiple vCenters from a single vSphere Client session.
Shared roles and permissions across vCenters in the same SSO domain.
Typical use cases:
Large-scale environments where one vCenter is not enough (host/VM scalability, geographic separation, or operational separation).
Multi-region deployments where each site has its own vCenter but operations teams need a unified view.
Segmented environments such as Production, DR, and Test vCenters that still require centralized identity and RBAC.
Important constraints:
All linked vCenters must be joined to the same SSO domain during deployment or reconfiguration; different SSO domains cannot be merged later.
Supported versions must be compatible; mixed major versions are not allowed.
ELM helps unify management of multi-site and hybrid cloud vCenter deployments, but each vCenter still has its own inventory and must be backed up and managed individually.
vSphere Replication is a hypervisor-based, per-VM replication technology.
Core properties:
Replication is configured per virtual machine (or per virtual disk), allowing fine-grained protection.
Asynchronous replication with a configurable Recovery Point Objective (RPO), typically from 5 minutes up to 24 hours.
Storage-agnostic: it works independently of the underlying array type (VMFS, NFS, vSAN, or heterogeneous storage).
Supported replication targets:
Same-site replication (for intra-site protection or local rollback).
Remote sites (typical two-site DR architectures).
Certain cloud-based DR services, such as VMware Cloud DR, depending on solution integration.
Design implications:
Because VR is per-VM and storage-agnostic, it is ideal where array-based replication is not available or where different storage vendors are used across sites.
RPO is bounded by available network bandwidth; insufficient bandwidth will prevent low RPO values from being met.
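That bandwidth bound can be estimated with back-of-the-envelope arithmetic. A simplified Python sketch (assumes a steady change rate and ignores protocol overhead and compression):

```python
# Illustrative sketch: can a WAN link sustain a VM's data change rate?
# If the link cannot keep up with the average change rate, no configured
# RPO value can be met.

def min_bandwidth_mbps(changed_gb_per_hour):
    """Average bandwidth needed just to ship the change rate (GB/h -> Mbit/s)."""
    return changed_gb_per_hour * 8 * 1024 / 3600

def change_rate_sustainable(changed_gb_per_hour, link_mbps):
    return link_mbps >= min_bandwidth_mbps(changed_gb_per_hour)

# 20 GB of changed blocks per hour over a 100 Mbit/s link:
print(round(min_bandwidth_mbps(20), 1))       # 45.5 Mbit/s average
print(change_rate_sustainable(20, 100))       # True
```

In practice you would also budget headroom for bursts: a link running near 100% of the average change rate will fall behind whenever the rate spikes.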
SRM is an orchestration tool for disaster recovery and planned migrations.
vSphere Replication in SRM:
VR can act as the replication engine behind SRM, replacing or complementing array-based replication.
SRM adds orchestration and automation on top of replication:
Recovery Plans defining which VMs to recover, in what order.
Startup sequencing, including delays and dependency groups.
Network mappings between source and recovery site networks.
Non-disruptive DR testing using test networks and bubble environments.
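Startup sequencing boils down to ordering VMs by priority tier. A minimal Python sketch of the idea (VM names and tiers are hypothetical; real SRM recovery plans also handle per-tier delays, scripts, and IP customization):

```python
# Illustrative sketch of SRM-style startup sequencing: VMs grouped into
# priority tiers, lower-numbered tiers recovered first.

def recovery_order(plan):
    """plan: list of (vm, priority); returns VM groups in ascending priority."""
    tiers = {}
    for vm, prio in plan:
        tiers.setdefault(prio, []).append(vm)
    return [sorted(tiers[p]) for p in sorted(tiers)]

plan = [("app01", 2), ("db01", 1), ("web01", 3), ("db02", 1)]
print(recovery_order(plan))  # [['db01', 'db02'], ['app01'], ['web01']]
```

Databases recover first, then the app tier, then web front ends, which mirrors the dependency ordering a typical recovery plan encodes.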
Design implications:
VR plus SRM is a powerful, relatively storage-agnostic DR solution.
Choice between VR and array-based replication depends on bandwidth, storage capabilities, and required RPO/RTO.
Key considerations and limitations:
VM snapshots are not replicated as snapshots; VR replicates the current disk state.
Replication traffic can be significant and is constrained by network bandwidth and latency.
Protecting a large number of VMs with short RPOs requires careful bandwidth planning, traffic shaping, and possibly dedicated replication network paths.
VR is not intended for zero-RPO use cases; those require synchronous replication or specialized solutions.
VMware Cloud Foundation is a full-stack Software-Defined Data Center (SDDC) platform that integrates:
vSphere for compute virtualization.
vSAN for software-defined storage.
NSX for network virtualization and security.
SDDC Manager for lifecycle and configuration automation.
The idea is to provide a standardized, validated SDDC architecture with automated deployment and lifecycle management across the core components.
VCF organizes resources into workload domains.
Domain types:
Management Domain
A dedicated domain hosting core management components such as:
vCenter for the management domain.
NSX managers for management and possibly shared services.
SDDC Manager itself.
This domain is created first and is a prerequisite.
VI Workload Domains
Additional domains created to host tenant or application workloads:
Each VI Workload Domain has its own vCenter and one or more vSphere clusters.
NSX can be deployed per domain as needed.
Domains can be sized and configured independently, following consistent blueprints.
Benefits of workload domains:
Isolation between different sets of workloads or tenants.
Independent lifecycle operations per domain (upgrade one domain at a time).
Clear boundaries for operations, security, and compliance.
VCF provides:
Full-stack lifecycle management:
Patch, upgrade, and configuration consistency managed across vSphere, vSAN, NSX, and associated firmware (where supported).
SDDC Manager orchestrates and validates updates.
Unified management across multiple clusters and domains:
Standardized deployment patterns reduce design and implementation time.
Easier to ensure consistency at scale.
Suitability for large enterprises and multi-site solutions:
Provides a foundation for hybrid cloud by aligning with VMware Cloud offerings.
Reduces configuration drift across sites and environments.
VMware HCX is a platform for application mobility in hybrid and multi-cloud scenarios.
Key capabilities:
Bulk Migration
Move many VMs in batches, often with limited or no downtime depending on migration type.
vMotion Migration
Live migration of running VMs across sites with no downtime, subject to network and latency requirements.
Replication-Assisted vMotion
Combines replication and vMotion to support migration of larger workloads with minimal downtime.
WAN optimization
Compression, deduplication, and traffic optimization for inter-site links.
Network extension (Layer 2 stretching)
Extends L2 networks across sites, allowing VMs to retain their IP addresses when moved.
Typical scenarios:
Data center migration:
Move workloads from an old physical site to a new one with minimal disruption.
Phase migrations over time rather than a single cutover.
On-premises to VMware Cloud on AWS: extend or migrate workloads into the cloud while keeping the same operational model.
Multi-cloud VM mobility:
Move workloads between different VMware-based clouds.
Keep a consistent operational model while placing workloads where they best fit.
HCX provides:
IP address preservation during migration, removing the need for large-scale re-IP of applications.
Near-zero or low-downtime migration methods for critical workloads.
Simplified large-scale migration orchestration and reduced complexity, especially when combined with extended L2 networks and WAN optimization.
NIOC provides a way to prioritize and control bandwidth allocation for different types of network traffic that share the same physical uplinks.
Traffic types commonly controlled:
vMotion
Management
vSAN
VM traffic
Fault Tolerance (FT)
iSCSI/NFS or other storage traffic
NIOC works on vSphere Distributed Switches (vDS) and enables the administrator to enforce policies when contention occurs.
For each predefined or custom traffic class, you can configure:
Shares
A relative priority value used during contention. Higher shares mean higher priority access to bandwidth when links are congested.
Bandwidth limits (optional)
Upper bounds on how much bandwidth a class may consume. This can prevent one traffic class from starving others even when there is no contention.
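Shares-based allocation is proportional division with optional caps. A simplified single-pass Python sketch (real NIOC redistributes bandwidth left unused by capped classes; this version does not):

```python
# Illustrative sketch of shares-based bandwidth division under contention.
# Share values and the 10 Gbit/s link are hypothetical examples.

def allocate(link_mbps, classes):
    """classes: {name: (shares, limit_mbps_or_None)} -> {name: mbps}"""
    total_shares = sum(s for s, _ in classes.values())
    out = {}
    for name, (shares, limit) in classes.items():
        fair = link_mbps * shares / total_shares   # proportional share
        out[name] = min(fair, limit) if limit else fair
    return out

classes = {"mgmt": (20, None), "vmotion": (50, 4000),
           "vsan": (100, None), "vm": (30, None)}
print(allocate(10000, classes))
# {'mgmt': 1000.0, 'vmotion': 2500.0, 'vsan': 5000.0, 'vm': 1500.0}
```

Note that shares only matter during contention; a limit, by contrast, caps the class even when the link is idle, which is why limits deserve more caution.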
Design implications:
Critical traffic (management, vSAN, FT) should have higher shares than non-critical traffic (for example, backup or low-priority VM traffic).
Limits can be used carefully to prevent vMotion or backup jobs from overwhelming uplinks.
Scenario examples:
During mass vMotion events, NIOC prevents vMotion traffic from consuming all available bandwidth and impacting vSAN or management traffic.
Production VM traffic is configured with higher shares than test or development VM traffic, ensuring better performance when links are congested.
Management networks retain enough bandwidth to maintain host and vCenter connectivity, even during heavy data movement.
vLCM is a cluster-centric lifecycle management tool that uses an image-based model.
Core capabilities:
Define a single image per cluster specifying:
ESXi version.
Vendor addons and drivers.
Firmware versions for supported hardware (via vendor integration).
Apply that image to all hosts in the cluster and remediate them to match the defined state.
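The declarative model amounts to diffing each host against the desired cluster image. A minimal Python sketch (component keys and version strings are hypothetical):

```python
# Illustrative sketch of vLCM's image-based idea: one desired state per
# cluster, each host compared against it, drifted components reported.

DESIRED = {"esxi": "8.0 U2", "vendor_addon": "oem-2.1", "nic_fw": "4.50"}

def compliance_report(hosts):
    """hosts: {hostname: state_dict} -> {hostname: sorted drifted keys}"""
    return {h: sorted(k for k, v in DESIRED.items() if state.get(k) != v)
            for h, state in hosts.items()}

hosts = {
    "esxi01": {"esxi": "8.0 U2", "vendor_addon": "oem-2.1", "nic_fw": "4.50"},
    "esxi02": {"esxi": "8.0 U1", "vendor_addon": "oem-2.1", "nic_fw": "4.40"},
}
print(compliance_report(hosts))  # {'esxi01': [], 'esxi02': ['esxi', 'nic_fw']}
```

Remediation then means bringing every non-empty entry back to the image, rather than accumulating individual patches per host.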
Benefits over baseline-based lifecycle:
Modern, declarative model rather than patch-based accumulation.
Easier to maintain consistent host configurations at scale.
Simplifies compliance with hardware compatibility and desired software stacks.
Design implications:
Clusters using vLCM images are easier to keep homogeneous and supported.
Tight integration with OEM vendors simplifies firmware and driver management.
Host Profiles automate consistent host configuration.
Core concepts:
Create a reference host with the desired configuration (networking, storage, security, services).
Extract a Host Profile from that host.
Apply the Host Profile to other hosts and remediate differences.
Use cases:
Auto Deploy environments, where ESXi hosts are stateless and must be configured at boot.
Large environments where manual host configuration is error-prone.
Environments with strict compliance requirements that require configuration drift detection.
Problems addressed:
Inconsistent manual configuration across hosts.
Configuration drift over time as changes are made without proper tracking.
Design considerations:
Certain host-specific values (such as NIC names, storage device identifiers, IP addresses) often need customization within the profile.
Periodic compliance checks should be built into operational processes.
VM encryption protects virtual machine files at rest.
Key characteristics:
Encrypts key VM files (such as VMDK and VMX) on datastores.
Uses an external Key Management Server (KMS) integrated via standard key management protocols.
Encryption and decryption are handled by vSphere and are transparent to the guest OS.
Design implications:
Requires KMS deployment and integration, with proper redundancy and lifecycle.
Backup products must be compatible with VM encryption.
Performance overhead is usually small but must be considered for high-I/O workloads.
vSAN encryption provides datastore-level encryption.
Key characteristics:
Encrypts data at the vSAN storage layer for the entire cluster or for selected policies, depending on mode.
Works independently of whether VMs are individually encrypted.
Also uses a KMS for key management.
Design implications:
Simplifies encryption of all objects stored on vSAN without requiring per-VM configuration.
Must be factored into design for performance (CPU overhead on disk operations) and for key rotation processes.
vTPM and Secure Boot enhance guest OS security.
vTPM
Adds a virtual TPM device to a VM:
Required for some modern OS features and disk encryption schemes.
Stores cryptographic measurements and keys inside the VM context.
VM Secure Boot
Ensures that only signed and trusted boot components inside the VM are loaded.
Design implications:
vTPM requires per-VM configuration and may rely on encryption to protect TPM state.
Essential for compliance with modern OS security baselines.
Must be aligned with backup and cloning procedures, as vTPM-equipped VMs can behave differently when moved or cloned.
These controls govern how CPU and memory are allocated under contention.
Shares
Represent relative priority when multiple VMs compete for the same resource:
Higher shares mean a VM or resource pool gets a larger portion under contention.
Shares are only considered when there is contention.
Reservations
Guarantee a minimum amount of resource:
For CPU, a guaranteed MHz amount.
For memory, a guaranteed amount of RAM.
Reserved capacity cannot be used by others, even if idle.
Limits
Cap the maximum a VM can consume even when spare capacity exists; demand above the limit is throttled (CPU) or reclaimed through ballooning/swapping (memory)
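How the three controls interact under contention can be modeled with simple arithmetic. A Python sketch that is greatly simplified relative to the real scheduler (VM names and numbers are hypothetical):

```python
# Illustrative sketch: reservations are honored first, shares divide the
# remaining capacity proportionally, and limits cap the final result.

def allocate_mhz(capacity, vms):
    """vms: {name: (shares, reservation, limit_or_None)} -> {name: mhz}"""
    alloc = {name: r for name, (_, r, _) in vms.items()}   # reservations first
    remaining = capacity - sum(alloc.values())
    total_shares = sum(s for s, _, _ in vms.values())
    for name, (shares, _, limit) in vms.items():
        alloc[name] += remaining * shares / total_shares   # shares split the rest
        if limit is not None:
            alloc[name] = min(alloc[name], limit)          # limits cap the result
    return alloc

vms = {"prod": (2000, 1000, None), "test": (1000, 0, 1500)}
print(allocate_mhz(6000, vms))  # prod ≈ 4333.3 MHz, test capped at 1500
```

Note the limit bites here: "test" would earn about 1667 MHz from its shares, but its 1500 MHz cap holds it below that even though the cluster has capacity to spare.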
Design considerations:
Avoid the “resource pool anti-pattern,” where resource pools are used only as folders while shares and reservations are misconfigured. This can cause unexpected starvation of child VMs.
For DRS, HA, and capacity planning, resource pools should be used intentionally:
Group workloads by priority or tenant.
Assign shares and, if necessary, reservations at the pool level to enforce service tiers.
Critical workloads may have reservations or higher shares to ensure they receive necessary resources during contention.
Limits should be used very carefully; in most cases, they are not needed and can cause more harm than good.
vVols is a storage architecture where the array exposes per-VM storage objects instead of large LUNs or datastores.
Core ideas:
Each VM’s disks are represented as individual objects on the array.
ESXi interacts with these objects through a VASA provider supplied by the storage vendor.
The traditional VMFS/NFS datastore abstraction is replaced with a more granular, policy-driven model.
Benefits of vVols:
Policy-based management at the VM or VMDK level: storage capabilities such as snapshots, replication, and QoS are assigned per object through storage policies
Native array-level snapshots and clones: offloaded to the array, typically faster and more space-efficient than vSphere snapshots
Reduced need to manage large shared LUNs or datastores:
Avoids the complexity of “datastore sprawl”.
Simplifies space and performance management.
Typical scenarios:
High-end storage arrays that offer advanced features (snapshots, replication, QoS) are best utilized through vVols.
Workloads requiring frequent, fast snapshots or clones, such as test/dev, CI/CD, or database development environments.
Environments requiring fine-grained storage policies per VM, rather than per-datastore.
Design implications:
Requires storage arrays and firmware that support vVols and VASA 2.x or higher.
Changes operational workflows: instead of managing datastores, administrators manage storage policies and rely on the array to enforce them.
Must be integrated with backup and DR strategies, as those tools need to understand vVols semantics.
Should Terraform replace Aria Automation in VMware Cloud Foundation environments?
No, Terraform should complement—not replace—Aria Automation in VCF.
Terraform excels at external IaC and multi-platform provisioning, while Aria Automation provides deep integration with VMware services, governance, and lifecycle management. In VCF, Aria Automation is tightly integrated with NSX, vCenter, and policy enforcement. Terraform can be used alongside it for infrastructure provisioning or integration with non-VMware platforms. Replacing Aria Automation entirely often leads to gaps in governance, approvals, and day-2 operations.
Demand Score: 85
Exam Relevance Score: 88
How does NSX integrate with Aria Automation for network provisioning?
NSX integrates with Aria Automation through native endpoints, enabling automated network and security provisioning within blueprints.
Aria Automation communicates with NSX APIs to create logical switches, routers, and security policies during deployment. This allows network configurations to be embedded directly into blueprints. A common mistake is manually configuring NSX components outside automation workflows, which breaks consistency. Proper integration ensures that network and compute resources are provisioned together, maintaining alignment and reducing configuration drift.
Demand Score: 80
Exam Relevance Score: 85
What is the difference between Aria Automation blueprints and Terraform templates?
Aria Automation blueprints are VMware-native declarative templates, while Terraform templates are provider-based IaC configurations.
Blueprints are tightly integrated with VMware infrastructure and support governance, approvals, and lifecycle actions. Terraform templates are more flexible across platforms but rely on providers and external state management. In VCF, blueprints offer deeper integration with NSX, storage policies, and day-2 actions. A frequent misunderstanding is assuming both are interchangeable—they serve overlapping but distinct roles.
Demand Score: 78
Exam Relevance Score: 84
What role does Aria Operations play in VCF automation?
Aria Operations provides monitoring, analytics, and optimization insights that inform automation decisions.
While not directly provisioning resources, Aria Operations feeds data into automation workflows, enabling actions such as rightsizing and capacity-based placement. It integrates with Aria Automation for policy-driven decisions. A common oversight is ignoring operational data in automation design, leading to inefficient resource usage.
Demand Score: 75
Exam Relevance Score: 80
How does vCenter interact with automation tools in VCF?
vCenter exposes APIs that automation tools use to provision and manage virtual infrastructure.
Automation tools such as Aria Automation and Terraform interact with vCenter to create VMs, manage clusters, and apply policies. vCenter acts as the execution layer for compute operations. A mistake is attempting to bypass vCenter and interact directly with ESXi hosts, which breaks centralized management and automation consistency.
Demand Score: 74
Exam Relevance Score: 82