Design methodology is how you think as an architect.
If you get this thinking right, a lot of exam questions become easier.
These four words appear everywhere in design questions.
You must be able to recognize them in a scenario and use them in your decisions.
Requirements
Requirements are what the solution must deliver.
Business requirements
Come from the business, not from IT
Examples:
“The HR system must be available 99.9% of the time.”
“We must keep customer data for at least 7 years.”
“The system must support 2,000 concurrent users.”
Technical requirements
Derived from business requirements, but more specific and technical
Examples:
“The platform must support vSphere HA.”
“We must provide 4 TB of usable storage for the CRM database.”
“All management traffic must be separated from VM traffic.”
Regulatory / compliance requirements
Come from laws or standards:
“Data must be encrypted at rest.”
“Data must not leave the country.”
Functional vs Non-functional
Functional requirements → what the system does
Non-functional requirements (NFRs) → how well it does it
“Login response time must be < 2 seconds in 95% of cases.”
“The system must be available 99.99% of the time.”
In exam scenarios, requirements are non-negotiable: you must design to meet them.
Constraints
Constraints are limitations you must respect. You cannot change them (or only with great difficulty).
Examples:
Budget:
“The total hardware budget is fixed and cannot be increased.”
Timeline:
“The solution must be in production by a fixed go-live date.”
Technology:
“We must use vendor X storage because it’s already purchased.”
“We must use vSphere 8 because of support policy.”
Organization:
“The existing operations team must be able to manage the platform with current staffing.”
Constraints may restrict your design choices. For example:
Requirement: “We want 99.99% availability.”
Constraint: “We only have one data center.”
→ You cannot design active-active multi-site; you must find the best within that limitation.
Assumptions
Assumptions are things you believe to be true, but they’re not confirmed.
Examples:
“We assume that the network team will provide 10 GbE uplinks.”
“We assume that all workloads can be migrated to vSphere without changes.”
“We assume that backups will be handled by the backup team with their existing solution.”
Good practice:
Document assumptions clearly
Validate them with stakeholders as early as possible
If an assumption turns out to be wrong, the design may need to change
In exams, you may be shown an assumption and asked how to handle it (for example, validate it with stakeholders or track it as a risk).
Risks
Risks are potential bad things that might happen in the future and affect your solution.
Examples:
“There is a risk that the single storage array may fail, causing a major outage.”
“There is a risk that the network team cannot deliver 10 GbE on time.”
“There is a risk that the growth of data will be higher than expected.”
Each risk should have:
Probability → how likely it is (low/medium/high)
Impact → how bad it is if it happens (low/medium/high)
Response:
Mitigate → reduce probability or impact
Avoid → change design to remove the risk
Transfer → e.g., insurance, support contracts
Accept → consciously accept, usually documented and approved
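The probability/impact/response model above can be sketched as a tiny risk-scoring helper. This is an illustrative sketch, not a VMware tool; the function names, levels, and the threshold of 4 are assumptions chosen for the example.

```python
# Illustrative risk-register sketch: qualitative probability x impact matrix.
# LEVELS, risk_score, and the threshold are assumptions for this example.
LEVELS = {"low": 1, "medium": 2, "high": 3}

def risk_score(probability: str, impact: str) -> int:
    """Combine qualitative probability and impact into a 1-9 score."""
    return LEVELS[probability] * LEVELS[impact]

def needs_attention(probability: str, impact: str, threshold: int = 4) -> bool:
    """Risks at or above the threshold should be mitigated or avoided,
    not simply accepted."""
    return risk_score(probability, impact) >= threshold

# Example: single storage array failure -- unlikely, but severe if it happens.
print(risk_score("low", "high"))          # 3
print(needs_attention("medium", "high"))  # True
```

In practice the same idea is usually expressed as a risk matrix in the design document rather than code; the point is that probability and impact together drive the chosen response.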
In the exam, you may see a list of statements and must classify them as Requirement / Constraint / Assumption / Risk or choose which risks need mitigation.
Architects usually think in three “layers” of design.
You start high-level and get more detailed as you go.
Conceptual design
Conceptual design answers:
“What is the solution going to do, for whom, without any vendor-specific details?”
Characteristics:
High-level view
Focus on:
Users
Major systems
Data flows
Business capabilities
Examples of conceptual statements:
“Provide a highly available virtual infrastructure for production workloads.”
“Separate management workloads from business workloads.”
“Allow remote branch offices to run local workloads with central management.”
No mention of:
vSphere version
CPU models
Storage vendors
VLAN IDs
These details come later.
Logical design
Logical design answers:
“How will the solution be structured in terms of components and relationships, still without exact hardware or product SKUs?”
Characteristics:
More detailed than conceptual
Still technology-agnostic in terms of specific part numbers, but you can mention technologies (vSphere, vSAN, etc.)
Focus on:
How many clusters
Host roles
Logical network layout
Logical storage layout
Example logical statements:
“We will have 3 vSphere clusters: Management, Production, DMZ.”
“Each cluster will contain 6 ESXi hosts.”
“A vSAN datastore will be used for management workloads.”
“Separate port groups will be used for management, vMotion, and VM traffic.”
No mention of:
“Dell PowerEdge R750 with 2 × Intel Xeon XYZ”
“VLAN ID 1234 for vMotion”
Those are physical design details.
Physical design
Physical design answers:
“Exactly what will be implemented and how? Which hardware, which settings, which IDs?”
Characteristics:
Concrete and implementable
Includes vendor and model details
Contains configuration values
Examples:
“Use 6 × Dell PowerEdge R750 servers, each with 2 × 16-core CPUs and 512 GB RAM.”
“Use vSphere 8.0 U2 with vCenter Server Appliance.”
“Use VLAN 10 for management, VLAN 20 for vMotion, VLAN 30 for vSAN.”
“Create a vSAN datastore with RAID-1 FTT=1 for management VMs.”
“Use RAID 10 on the storage array with 8 × 1.92 TB SSDs.”
Conceptual → Logical → Physical in practice (simple example)
Imagine the requirement: “Production workloads must be highly available; test workloads can tolerate downtime.”
A possible chain:
Conceptual:
“Provide a highly available platform for production workloads and a cost-effective platform for test workloads.”
Logical:
“Create two clusters: Production (HA enabled) and Test (HA optional).”
“Production cluster will host critical line-of-business VMs; Test cluster will host development/test VMs.”
Physical:
“Production cluster: 6 ESXi hosts, each with 2 × 12-core CPUs, 256 GB RAM.”
“Test cluster: 3 ESXi hosts, each with 2 × 8-core CPUs, 128 GB RAM.”
“vSAN all-flash for Production, NFS datastore for Test.”
In the exam, you may be asked to map items between these layers or identify what type a given statement belongs to.
Capacity planning & sizing answers the question:
“How big does the environment need to be to support all workloads now and in the future?”
You don’t want to design something too small (performance problems) or too big (waste of money).
Before sizing, you must understand the workloads.
Collect existing workload metrics
Typical metrics:
CPU usage
Average usage over time (MHz or %)
Peak usage during busy hours
Helps estimate vCPU requirements
Memory consumption
Active memory, not just configured memory
Some VMs allocate 16 GB but only use 4 GB; that matters
Storage IOPS and latency
IOPS = how many input/output operations per second
Latency = how long each operation takes
Throughput (MB/s) for large sequential workloads
Network throughput and patterns
Average and peak traffic
East–west (between VMs) vs north–south (VM ↔ external)
How to collect these in real life (conceptually):
Performance statistics from existing hypervisors
Monitoring tools (e.g., Aria Operations, other monitoring systems)
OS-level tools (Perfmon, top, sar)
Group workloads
You rarely treat all workloads the same. You group them by:
Criticality
Tier 1 (mission critical)
Tier 2 (important)
Tier 3 (non-critical)
Performance profile
CPU-heavy (analytics, compute jobs)
Memory-heavy (in-memory DBs)
Storage-heavy (databases, file servers)
Network-heavy (proxies, gateways)
Environment
Production
UAT
Development / Test
Why group? Because each group gets different sizing, availability, and placement decisions, so you avoid over-building for non-critical workloads.
Once you understand workloads, you estimate how many hosts and how much resource per host.
Decide: number of hosts per cluster
Factors:
Total CPU and memory needed (from workload analysis)
Overcommit ratios (how much sharing you accept)
HA goals (N+1, N+2)
Maintenance requirements
Example idea (simplified):
Total CPU needed after growth: X GHz
Each host provides: Y GHz
You need H hosts such that:
H × Y ≥ X, even after you lose 1 or 2 hosts
N+1 / N+2 policies
N+1 → enough spare capacity to tolerate the failure of 1 host
N+2 → enough spare capacity to tolerate the failure of 2 hosts
Example:
You decide you need 5 hosts worth of capacity to run all workloads.
For N+1, you design 6 hosts
For N+2, you design 7 hosts
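The host-count reasoning above reduces to simple arithmetic. A minimal sketch, using made-up GHz figures that reproduce the 5-host example:

```python
import math

def hosts_required(total_ghz_needed: float, ghz_per_host: float, n_plus: int = 1) -> int:
    """Hosts needed so the cluster still covers the workload after n_plus host failures."""
    base = math.ceil(total_ghz_needed / ghz_per_host)  # hosts' worth of capacity
    return base + n_plus                               # add failover headroom

# 5 hosts' worth of capacity needed (e.g. 300 GHz workload, 60 GHz per host):
print(hosts_required(300, 60, n_plus=1))  # 6  -> N+1
print(hosts_required(300, 60, n_plus=2))  # 7  -> N+2
```

Real sizing also factors in memory, overcommit targets, and per-host limits, but the N+1/N+2 structure is the same.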
Exam questions often give an SLA such as “the cluster must tolerate one host failure with no loss of performance” and expect you to translate it into an N+1 host count.
Maintenance overhead
Besides failure capacity, you must also support:
Patch windows
Hardware replacement
Upgrades
Often you design so that one host can be taken down for patching or maintenance without losing the failover capacity (which effectively pushes N+1 toward N+2).
Overcommitment ratios
This is about how you share resources in the cluster.
vCPU to pCPU ratio targets
vSphere allows CPU overcommit
Safe ratios depend on workload type
As a rough conceptual idea:
Light workloads: high ratio (e.g., 6:1 or more)
Heavy workloads: low ratio (e.g., 1:1 to 3:1)
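The ratio reasoning translates directly into a core count. A minimal sketch with assumed vCPU totals:

```python
import math

def physical_cores_needed(total_vcpus: int, ratio: float) -> int:
    """Physical cores needed for a target vCPU:pCPU overcommit ratio."""
    return math.ceil(total_vcpus / ratio)

# 600 vCPUs of light workloads at 6:1 vs heavy workloads at 2:1:
print(physical_cores_needed(600, 6.0))  # 100
print(physical_cores_needed(600, 2.0))  # 300
```

Note how the same vCPU demand triples the required hardware when the workload profile forces a conservative ratio; this is why grouping workloads before sizing matters.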
For design questions, you don’t usually calculate exact ratios but should know:
Too much overcommit → CPU contention (high ready time)
Right-sizing VMs is critical
Memory overcommit thresholds
vSphere can overcommit memory using TPS, ballooning, compression, swapping
Swapping must be avoided in design because it kills performance
Designers often target utilization levels like “70% memory usage at steady state” to provide headroom
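The “70% at steady state” headroom target can be applied as a simple sizing rule. A sketch with assumed numbers:

```python
import math

def hosts_for_memory(active_memory_gb: float, ram_per_host_gb: float,
                     target_utilization: float = 0.70) -> int:
    """Hosts needed so steady-state memory use stays at the target utilization."""
    usable_per_host = ram_per_host_gb * target_utilization  # e.g. 512 GB -> 358.4 GB
    return math.ceil(active_memory_gb / usable_per_host)

# 2000 GB of active memory across all VMs, 512 GB hosts, 70% target:
print(hosts_for_memory(2000, 512))  # 6
```

Sizing against active memory (not configured memory) with headroom is what keeps the design out of ballooning and swapping territory.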
Storage capacity and policies
You size datastores based on:
Raw capacity
Usable capacity after RAID, FTT, overhead
Growth expectations
For vSAN specifically:
Storage policies (e.g., FTT=1, RAID-1, RAID-5) increase required raw capacity
Example:
To deliver 1 TB of usable capacity with FTT=1 and RAID-1 (mirroring), you need roughly 2 TB of raw capacity.
Designers must always think:
“How much raw capacity do I need to meet usable capacity under given policies?”
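The usable-to-raw conversion can be expressed as a lookup of policy multipliers. The multipliers below are the standard approximations for vSAN policies; real sizing must also account for slack space, metadata overhead, and any deduplication/compression.

```python
# Approximate raw-capacity multipliers per vSAN storage policy.
POLICY_MULTIPLIER = {
    ("FTT=1", "RAID-1"): 2.0,   # full mirror
    ("FTT=1", "RAID-5"): 1.33,  # 3+1 erasure coding
    ("FTT=2", "RAID-1"): 3.0,   # two extra mirror copies
    ("FTT=2", "RAID-6"): 1.5,   # 4+2 erasure coding
}

def raw_capacity_tb(usable_tb: float, ftt: str, raid: str) -> float:
    """Raw capacity required to deliver the given usable capacity under a policy."""
    return usable_tb * POLICY_MULTIPLIER[(ftt, raid)]

print(raw_capacity_tb(1.0, "FTT=1", "RAID-1"))            # 2.0
print(round(raw_capacity_tb(10.0, "FTT=1", "RAID-5"), 1))  # 13.3
</antml>```

The same usable capacity can require very different raw capacity depending on policy, which is why the storage policy decision belongs in the design, not as an afterthought.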
This part answers:
“How do we keep the system running, even when things break?”
Configure HA clusters to meet SLAs
SLA might say: “The service must be available 99.9% of the time.”
Your design uses:
vSphere HA
Sufficient host redundancy (N+1, N+2)
Redundant networking
Redundant storage paths
You choose HA settings (admission control, isolation response, etc.) that align with business requirements.
Admission control based on failures to tolerate and workload criticality
If the business says, “We must tolerate 1 host failure,” you pick a policy that keeps enough resources free for 1 host failover
If critical workloads are few, you may use:
VM-level reservations or priority for those VMs
Specific clusters just for critical workloads
Decide: FT vs HA vs application clustering
FT (Fault Tolerance):
For highest criticality and small VMs
Zero downtime, zero data loss for supported failure types
HA:
For most workloads
Good compromise: some downtime (reboot time), no manual action
Application-level clustering (e.g., MSCS/WSFC, Oracle RAC):
Protects at the application layer
Often better for complex apps, databases
Rule of thumb:
Use HA as the default
Use FT for small, ultra-critical services where reboot is not acceptable
Use app clustering when application vendors recommend or require it
Business continuity ≈ “How do we keep the business running during major failures?”
Disaster recovery (DR) is the technical part of that.
Define RPO and RTO
RPO (Recovery Point Objective):
How much data can we lose during a disaster?
Example: RPO = 15 minutes → replication at least every 15 minutes
RTO (Recovery Time Objective):
How long can the service be down?
Example: RTO = 4 hours → must bring the service back within 4 hours
These strongly influence:
DR technology choice
Replication frequency
Level of automation
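The RPO/RTO definitions above are easy to check mechanically: worst-case data loss equals the replication interval, and recovery time must fit inside the RTO. A minimal sketch with assumed figures:

```python
def meets_rpo(replication_interval_min: float, rpo_min: float) -> bool:
    """Worst-case data loss equals the replication interval."""
    return replication_interval_min <= rpo_min

def meets_rto(failover_time_hours: float, rto_hours: float) -> bool:
    """Measured or estimated failover time must fit inside the RTO."""
    return failover_time_hours <= rto_hours

print(meets_rpo(replication_interval_min=15, rpo_min=15))  # True
print(meets_rto(failover_time_hours=6, rto_hours=4))       # False -> need more automation
```

A failed RTO check like the second one is what typically drives the move from manual recovery procedures to orchestrated failover (e.g., SRM).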
Decide scope: what is protected?
Options:
Entire sites (all workloads)
Specific business-critical applications
Only some databases
Protecting everything is expensive. Often you prioritize Tier 1 workloads.
Technologies
vSphere Replication:
Per-VM replication
Asynchronous
Array-based replication:
Replication performed by the storage array itself
Can be synchronous or asynchronous, depending on the array and the distance between sites
SRM (Site Recovery Manager):
Orchestrates failover and failback
Uses VR or array replication
Automates:
Boot order
IP changes
Network mappings
Third-party tools may also be used, but for the exam, these three are key.
DR runbooks
Runbooks define step-by-step procedures:
Start order of VMs
Dependencies (DB first, then app servers, then web servers)
Network mappings (old IP vs new IP)
Manual steps if needed
Failback steps after the primary site is restored
In real life, DR that is not documented and tested usually fails.
In design exams, you should always think about documentation and testing as part of the solution.
NUMA awareness
Modern servers have multiple NUMA nodes
Best practice: size VMs so their vCPUs and memory fit within a single NUMA node where possible
Very large VMs that span NUMA nodes may have reduced performance (remote memory access adds latency)
Design implications:
Do not oversize VMs unnecessarily
Understand the physical NUMA layout of hosts
Reservations, limits, shares
These are vSphere resource controls.
Reservation:
Minimum guaranteed resources
Pros: ensures critical VMs get what they need
Cons: reserved resources cannot be used by others
Limit:
Hard maximum cap
Dangerous: if set too low, you artificially throttle a VM even when resources are available
Shares:
Relative priority under contention
High shares = VM gets larger portion of resources if contention occurs
Design guidelines:
Use reservations sparingly and usually only for key VMs
Avoid limits unless you have a strong reason
Use shares to express business priority (Tier 1 > Tier 2 > Tier 3)
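Shares only matter under contention, where resources are divided in proportion to each VM's share value. A sketch using vSphere's default per-vCPU share values (High = 2000, Low = 500); the 10 GHz contended pool is an assumption for the example:

```python
def contention_allocation(shares: dict, total_ghz: float) -> dict:
    """Under contention, CPU is divided in proportion to each VM's shares."""
    total = sum(shares.values())
    return {vm: total_ghz * s / total for vm, s in shares.items()}

# Tier-1 VM with High shares (2000) vs Tier-3 VM with Low shares (500),
# competing for 10 GHz of contended CPU:
print(contention_allocation({"tier1": 2000, "tier3": 500}, 10.0))
# {'tier1': 8.0, 'tier3': 2.0}
```

This is why shares are the right tool for expressing business priority: they cost nothing when resources are plentiful and only bite when the cluster is actually contended.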
IOPS and latency requirements
Some apps (databases) need high IOPS and low latency
Others (archive servers) can live with low performance
You must match application storage requirements (IOPS, latency, throughput) to the design parameters below.
Design parameters
RAID levels:
RAID 10 → good performance, lower usable capacity
RAID 5/6 → more capacity, slower writes
Queue depths:
At the HBA (host bus adapter) and storage array
Too small → bottlenecks
Too big → can overload the array
Multipathing policies:
Round Robin
Fixed
Most Recently Used (MRU)
Choose policies based on storage vendor best practices.
MTU (Maximum Transmission Unit)
Standard MTU: 1500 bytes
Jumbo frames: 9000 bytes (commonly)
Larger MTU reduces CPU overhead and can improve throughput for:
vMotion traffic
vSAN traffic
Storage traffic (iSCSI, NFS)
But:
All devices along the path must support it
MTU mismatches can cause strange problems
NIC teaming policies
Examples:
Route based on originating port ID: each VM vNIC is pinned to one uplink; simple and requires no physical switch configuration
Route based on IP hash: distributes traffic by source/destination IP; requires EtherChannel/LACP on the physical switch
Route based on physical NIC load (LBT): distributed switch only; moves traffic off an uplink when it exceeds roughly 75% utilization
Design must respect:
Switch capabilities
Redundancy needs
Type of traffic
Traffic segmentation
Good practice: separate traffic types:
Management
vMotion
vSAN
Storage (iSCSI/NFS)
VM networks
Methods:
Different VLANs
Different port groups
Possibly different physical NICs or uplinks
This improves:
Performance
Security
Troubleshooting
Security is not an afterthought; it is part of design from the beginning.
vCenter SSO
SSO provides centralized authentication
Connects to identity sources like:
Active Directory
LDAP directories
This allows users to log in to vCenter with domain accounts.
RBAC (Role-Based Access Control)
Define roles (sets of permissions)
Assign roles to groups or users at specific scopes
Principle: least privilege (grant only the permissions required for the role)
Example roles:
VM admin (can manage VMs, but not hosts)
Storage admin (can manage datastores, not VMs)
Read-only (can view, not change)
ESXi host lockdown modes
Restricts direct login to hosts
Forces use of vCenter for management
Reduces attack surface
Disable unnecessary services
SSH
ESXi shell
Any management service not actively needed
Fewer open ports → less risk.
Network isolation
Separate management networks
Separate vMotion, vSAN, storage traffic
Use firewalls and NSX micro-segmentation to enforce security policies
Micro-segmentation:
Apply firewall rules at VM level
Example:
Web tier can talk to app tier
App tier can talk to DB tier
But web tier cannot talk directly to DB tier
This helps contain attacks and lateral movement.
vSphere VM encryption
Encrypts VM files (VMDKs, etc.)
Protects data at rest
Needs a Key Management Server (KMS) integration
vSAN encryption
Encrypts data at the vSAN datastore level
Can be data-at-rest or data-in-transit encryption (depending on version)
Also uses KMS
Compliance requirements
Standards like PCI DSS, GDPR, HIPAA, etc., may require:
Encryption
Access control and audit logs
Data locality (where data is stored)
Retention and deletion policies
Designers must map these requirements to:
vSphere capabilities (encryption, RBAC)
Network controls (NSX, firewalls)
DR and backup designs
In real enterprise architectures, not all requirements can be satisfied fully. Prioritization ensures the architecture focuses on what delivers the highest business value.
A commonly used method is the MoSCoW model:
Must
Mandatory requirements that must be satisfied for the solution to be accepted. Failure to meet a Must requirement means the design is invalid.
Should
Important requirements but not fundamental. If necessary, trade-offs can be made.
Could
Nice-to-have requirements that improve usability or convenience but are not essential.
Won’t
Out-of-scope items, intentionally excluded to avoid scope creep.
Architects use this method for structured decision-making and to guide discussions when trade-offs are required.
Conflicts among requirements are common. Examples include:
High availability versus limited budget
High performance versus limited hardware
Security versus operational simplicity
Scalability versus short timelines
Architects must evaluate relative priority, identify constraints, and choose the option that best aligns with business goals rather than purely technical preferences.
Trade-offs appear throughout design decisions. Common examples include:
Choosing RAID-10 for performance versus RAID-5/6 for cost efficiency
Choosing all-flash vSAN for latency-sensitive workloads versus hybrid vSAN to reduce cost
Applying strict network segmentation to enhance security, even if it increases operational complexity
In VMware design exams, the correct answer usually reflects the best alignment with business priorities (requirements + constraints), not necessarily the technically “best” product or feature.
A design decision defines how a specific architectural aspect is implemented. A well-formed design decision should include:
The option chosen
The alternatives considered
The justification for the choice
The impact (positive or negative) on requirements, cost, operations, and constraints
Strong justification is a key differentiator in VMware design exams.
A best-practice format:
Decision: Use vSAN all-flash as the primary storage for the production cluster.
Alternatives: External SAN, NFS.
Justification: Meets performance SLAs, offers policy-based management, integrates with lifecycle and automation tooling, and simplifies operations.
Impact: Higher initial cost, requires NVMe-based cache devices.
A complete design must show the architect has critically evaluated the options rather than choosing arbitrarily.
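The decision format above maps naturally onto a structured record. This is an illustrative sketch (the class and field names are ours, not a VMware artifact) showing how each decision carries its alternatives, justification, and impact together:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DesignDecision:
    """Illustrative record mirroring the decision format in the text."""
    decision: str
    alternatives: List[str]
    justification: str
    impact: str

d = DesignDecision(
    decision="Use vSAN all-flash as primary storage for the production cluster",
    alternatives=["External SAN", "NFS"],
    justification="Meets performance SLAs; policy-based management; simpler operations",
    impact="Higher initial cost; requires NVMe-based cache devices",
)
print(d.decision)
```

Keeping alternatives and impact as mandatory fields forces the architect to show the options were actually evaluated, which is exactly what reviewers (and exam graders) look for.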
Design-oriented questions often ask:
Which option represents the best design decision?
Which justification is most aligned with the given requirement set?
Correct answers follow requirements, constraints, and operational realities—not personal preference.
A complete design document typically includes:
Executive summary
Detailed RCAR (requirements, constraints, assumptions, risks)
Conceptual, logical, and physical designs
Security and compliance considerations
Operational considerations (patching, monitoring, backup)
Migration/integration plans
Diagrams, IP plans, VLAN matrices, cluster architecture details
Good documentation increases maintainability and ensures clear communication across teams.
Validation confirms that a design meets business and technical requirements. Methods may include:
Proof of Concept testing
Pilot deployments in controlled environments
Performance benchmarking
HA/DR failover testing
Stakeholder review and sign-off
User acceptance testing
Validation is mandatory because assumptions and theoretical sizing must be proven under real conditions.
Typical exam-style questions include:
Which activity validates that the design meets the performance requirement?
When a new risk is discovered, which design artifact must be updated?
Understanding documentation structure and validation steps is essential for correct answers.
A good design includes a full migration plan to transition from the current-state environment to the future-state architecture.
Common strategies include:
Cold migration (VM powered off)
vMotion or Storage vMotion for live migrations
Rolling cluster upgrades or replacement
Swing host / swing cluster techniques
HCX-assisted migration, including live bulk movement across sites
SRM-assisted planned migration
Choosing the correct method depends on workload criticality, downtime tolerance, and network/storage compatibility.
A structured migration plan typically includes:
Assessment and discovery
Pilot migration to validate tooling and cutover steps
Phased batch migrations
Final cutover window
Tested rollback plan
Post-migration validation and acceptance
Each step reduces operational risk and ensures predictable outcomes.
Typical risks include:
Application incompatibility with new hypervisor versions
Network/IP changes impacting communication
Cross-version or cross-storage compatibility issues
Storage format changes (VMFS version, vSAN policy differences)
Performance degradation during migration windows
Exam questions often ask which migration option best satisfies downtime, network, or compatibility constraints.
The design must reflect what is required to support and run the environment after implementation. Key operational areas include:
Monitoring and alerting policies
Log retention and analysis requirements
Capacity forecasting and reporting
Backup and recovery requirements
Patch and firmware management cycles
Vulnerability and configuration compliance cycles
Operational requirements influence cluster layout, networking choices, and lifecycle tooling.
Designers must plan for operational tooling, even if specific vendors are not named.
Categories include:
Performance monitoring and anomaly detection platforms
Centralized log aggregation or SIEM systems
Configuration or drift management tooling
Lifecycle patching/orchestration tools
The goal is operational consistency and reduced manual intervention.
Common exam questions:
Which design supports the monitoring and alerting requirement?
Which operational constraint impacts the host-count or cluster-size decision?
Correct answers depend heavily on understanding operational maturity and organizational constraints.
Designs must satisfy future-state growth, not only current-state requirements. Projections typically evaluate:
3- or 5-year capacity needs
Data center rack space, power, and cooling
Cluster scalability limits (hosts, VMs, memory)
vCenter and SSO domain scalability
Storage growth, IOPS growth, network bandwidth needs
Ignoring growth results in short-lived architectures.
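Growth projections are usually simple compound-growth arithmetic. A sketch with an assumed 20% annual data growth rate over a 5-year horizon:

```python
def projected_capacity(current: float, annual_growth: float, years: int) -> float:
    """Compound growth: capacity needed after `years` at `annual_growth` (e.g. 0.20 = 20%)."""
    return current * (1 + annual_growth) ** years

# 100 TB today, 20% annual data growth, 5-year horizon:
print(round(projected_capacity(100, 0.20, 5), 1))  # 248.8
```

Roughly 2.5x growth over five years at 20% per year is why a design sized only for current-state demand becomes a short-lived architecture.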
Architects must plan for the full lifecycle of hardware and software:
ESXi and vCenter upgrade paths
Deprecation/removal of older features (for example, SD cards for ESXi boot)
Hardware lifecycle and warranty cycles
VMware Compatibility Guide (VCG) dependencies
Interoperability between versions of vSphere, NSX, vSAN, and other components
This includes ensuring that all layers of the stack have valid upgrade routes over the next several years.
Common lifecycle risks include:
Hardware nearing end-of-support or end-of-life
CPU generations no longer supported by new ESXi versions
Firmware incompatibility with desired vSphere releases
vCenter and ESXi version mismatches in multi-site architectures
Legacy storage or network protocols approaching deprecation
A robust design identifies these risks early and incorporates mitigation strategies.
What is a key design principle for scalable VCF automation?
Use modular and reusable automation components.
Designing reusable blueprints and workflows ensures scalability and reduces duplication. Modular designs allow updates without impacting the entire system. A common mistake is creating monolithic automation workflows that are difficult to maintain and scale.
How should workload domains influence automation design?
Automation should be aligned with workload domain boundaries to maintain isolation and policy control.
Each workload domain in VCF represents a logical separation of resources. Automation must respect these boundaries to avoid policy conflicts and ensure security. Ignoring domain separation can lead to misconfigurations and governance issues.
What is a common mistake in VCF automation design?
Over-centralizing automation without considering domain-specific requirements.
Centralized automation can simplify management but may ignore unique needs of different domains. This leads to inflexible designs and operational bottlenecks. Proper balance between central governance and domain autonomy is critical.
Why is policy-based design important in VCF automation?
It ensures consistent governance and automated enforcement of standards.
Policies define how resources are provisioned and managed. Integrating them into automation prevents drift and enforces compliance. A common oversight is applying policies manually instead of embedding them into automation workflows.