IT Architectures, Technologies, Standards: Detailed Explanation
1. Definition & mental model
When you design storage for VMware Cloud Foundation (VCF), you’re really choosing where data lives and how hosts talk to it. A useful mental model is:
- HCI (Hyper-Converged Infrastructure): compute and storage live in the same ESXi hosts (for VCF, think “vSAN-backed clusters”). Each host contributes disks; the cluster becomes the “array.”
- Traditional / external storage: compute lives on ESXi hosts, but storage is delivered by a separate SAN/NAS system (iSCSI/NFS/FC/NVMe-oF), exposed as datastores.
In exam terms, you’ll often be given a requirement (“need low operational overhead,” “need massive capacity,” “need strict separation of duties,” “need ultra-low latency”) and you must map it to the right architecture and storage protocol.
2. Key concepts & data flows
HCI vs Traditional (what moves where)
- In HCI, writes typically replicate across hosts (data + metadata), so east-west traffic inside the cluster matters a lot. You “scale” by adding hosts that bring both CPU/RAM and storage devices.
- In traditional storage, hosts send IO to an external target (array/controller). Your data path depends on protocol:
- Block (iSCSI, FC, NVMe-oF): ESXi sends block IO to a LUN/namespace.
- File (NFS): ESXi sends file operations to an NFS server/export.
Certificates / authentication / trust at a basic level (storage-focused)
- Storage access almost always has a “trust gate,” even if it’s not an X.509 certificate:
- iSCSI commonly uses initiator/target identity plus optional CHAP (credentials) to prevent “random hosts” from mounting LUNs.
- FC relies on fabric zoning and host identifiers (WWPNs) to control which initiators can see which targets/LUNs.
- NFS relies on export permissions and (in stronger configurations) identity-based controls; the key idea is still “who is allowed to mount what.”
- Why this matters: a misaligned identity/trust control shows up as “datastore can’t be mounted,” “paths are down,” or “only some hosts see the LUN.”
Basic sizing & placement decisions (first-pass)
- HCI sizing starts with “how many hosts do I need for compute” and then checks “do those hosts also provide enough capacity + performance + resiliency.”
- Traditional sizing often separates concerns: hosts are sized for compute; the array is sized for IO/capacity; the network/fabric is sized to carry the storage traffic reliably.
- Placement difference you’ll see in scenarios: HCI is simpler to place (inside the cluster), while traditional storage demands correct network/fabric design and consistent host configuration (VMkernel, HBA settings, multipathing).
3. Typical deployment and operations scenarios
When HCI (vSAN-style) is a good fit
- You want a consistent operational model: add a host → gain compute and storage together.
- You want simpler procurement/standardization (ReadyNode-style thinking).
- You need predictable cluster-level resilience behaviors (policy-based mirroring/erasure coding concepts show up later).
When traditional storage is a good fit
- You already operate a SAN/NAS platform and need to integrate it into VCF.
- You need very large capacity growth without adding ESXi compute.
- You have workloads that are better served by array features or centralized storage teams.
Choosing supported storage types (use-case mapping)
- NFS: simple to present and manage, easy datastore consumption; often chosen for operational simplicity in NAS environments.
- iSCSI: widely supported block storage over IP; common in midrange SAN or converged storage setups.
- Fibre Channel (FC): performance and isolation via dedicated fabric; common in enterprises with established FC practices.
- NVMe-oF: designed for high performance/low latency (depends on environment maturity and support).
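The use-case mapping above can be sketched as a simple signal-matching table. This is an illustrative study aid only: the signal names, the `PROTOCOL_SIGNALS` table, and `suggest_protocols` are invented for this sketch and do not come from any VMware tool.

```python
# Hypothetical first-pass mapping of scenario signals to a storage protocol.
# Signal names are invented for illustration; real selection also depends on
# VCF support matrices and environment maturity.
PROTOCOL_SIGNALS = {
    "NFS": {"nas_environment", "operational_simplicity"},
    "iSCSI": {"ip_network_only", "midrange_san"},
    "FC": {"dedicated_fabric", "established_fc_practice", "strong_isolation"},
    "NVMe-oF": {"ultra_low_latency", "mature_ops_team"},
}

def suggest_protocols(signals):
    """Return the protocol(s) that match the most scenario signals."""
    scores = {p: len(wants & set(signals)) for p, wants in PROTOCOL_SIGNALS.items()}
    best = max(scores.values())
    return sorted(p for p, s in scores.items() if s == best and s > 0)
```

The point is not the code itself but the habit: list the scenario's explicit signals first, then see which protocol they actually point to, rather than defaulting to the "fastest" option.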
4. Common mistakes, risks, and troubleshooting hints
- Confusing “architecture” with “protocol”: HCI vs traditional is the big picture; NFS/iSCSI/FC/NVMe-oF is the transport. You can’t “fix” an architectural mismatch by swapping protocols.
- Underestimating the network/fabric:
- HCI problems often look like “unexpected latency spikes” because replication traffic competes with other east-west flows if not designed cleanly.
- Traditional storage problems often look like “some hosts see the datastore, others don’t” due to zoning, CHAP/ACL mismatch, or multipathing misconfiguration.
- Inconsistent host configuration: storage is unforgiving; a single host with a missing VMkernel adapter or incorrect zoning can break cluster-wide expectations (especially visible during maintenance or failover).
- Wrong mental model for scaling:
- HCI scaling = add hosts (and plan for rebalancing/repair traffic).
- Traditional scaling = add array capacity/perf (and ensure host pathing and fabric headroom).
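The two scaling models above can be contrasted in a minimal sketch. The functions, field names, and the `rebalance_pending` flag are invented for illustration; the point is only that HCI growth couples compute and storage while traditional growth does not.

```python
def scale_hci(cluster, hosts, cpu_per_host, tb_per_host):
    # HCI: a new host brings compute and storage together, and adding it
    # typically triggers rebalancing of existing data onto the new capacity.
    cluster["cpu_cores"] += hosts * cpu_per_host
    cluster["capacity_tb"] += hosts * tb_per_host
    cluster["rebalance_pending"] = True  # plan for this traffic
    return cluster

def scale_traditional(cluster, extra_tb):
    # Traditional: array capacity grows without touching ESXi compute;
    # host pathing and fabric headroom must be validated separately.
    cluster["capacity_tb"] += extra_tb
    return cluster
```

Notice that only the HCI path changes compute: that coupling is exactly what exam stems like "need more storage but no new compute" are probing.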
5. Exam relevance & study checkpoints
What you should be able to do after this chapter:
- Explain (in your own words) the practical differences between HCI and traditional storage, including how scaling and failure domains feel operationally.
- Given a short requirement set, pick:
- the right architecture (HCI vs traditional), and
- a plausible storage type/protocol (NFS/iSCSI/FC/NVMe-oF) with a one-sentence justification.
- When given a symptom (“hosts can’t mount datastore,” “only half the cluster sees the LUN,” “performance is inconsistent”), name the most likely category of root cause (identity/trust controls, network/fabric, multipathing, host consistency).
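The symptom-to-category checkpoint above can be practiced as a tiny triage table. The rule strings and categories here simply restate the chapter's examples; `SYMPTOM_RULES` and `triage` are invented names for this sketch.

```python
# Illustrative symptom-to-root-cause-category triage, mirroring the
# study checkpoints above. Matching is first-hit, most specific first.
SYMPTOM_RULES = [
    ("only half", "identity/trust controls (zoning/masking/CHAP/export)"),
    ("only some hosts", "identity/trust controls (zoning/masking/CHAP/export)"),
    ("can't mount", "identity/trust controls or host configuration"),
    ("inconsistent", "network/fabric or multipathing"),
]

def triage(symptom):
    s = symptom.lower()
    for needle, category in SYMPTOM_RULES:
        if needle in s:
            return category
    return "unclassified: gather more evidence"
```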
6. Summary and suggested next steps
Storage questions become easier when you keep two layers separate:
- Architecture choice (HCI vs traditional) determines where storage lives and how you scale.
- Storage type/protocol (NFS/iSCSI/FC/NVMe-oF) determines how IO travels and what operational controls matter.
Next, we’ll move from “storage fundamentals” to “VCF storage specifics,” where these ideas connect directly to vSAN architectures, principal vs supplemental storage, and supported storage integration.
IT Architectures, Technologies, Standards (Additional Content)
Failure domains, blast radius, and “what breaks first”
Context & why it matters
Most exam stems are really asking: “If something fails, what stops working—and how widely?” HCI and traditional storage fail differently, so the fastest way to the right answer is to reason about blast radius and dependency chains.
Advanced explanation
- HCI (vSAN-style) failure domains are inside the cluster: device → host → cluster (and in multi-site designs, site).
- Typical first breakpoints: a host goes down, a device fails, or the storage network becomes inconsistent → objects/components become degraded → compliance/resync behavior kicks in.
- “Capacity pressure” is a special failure domain: once headroom is low, routine repairs/resyncs become slow or impossible, amplifying the impact of unrelated events.
- Traditional storage failure domains are usually outside the cluster: a fabric path, storage port/controller, LUN/export access control, or the array itself.
- Typical first breakpoints: paths go down, one host has different zoning/masking, or an export/CHAP change affects visibility → datastores become partially visible or inaccessible.
- Performance issues often concentrate around queueing or path failover behavior rather than “object health.”
Troubleshooting & decision patterns
- If you see cluster-wide compliance/degraded/resync language, you are in HCI-land: focus on component availability, network consistency, and repair pressure.
- If you see “some hosts can, some hosts can’t”, you are usually in traditional storage-land: focus on access controls and host configuration drift (zoning/masking/CHAP/export + pathing).
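The two decision patterns above reduce to one branch. A minimal sketch, with keyword sets and the `which_land` name invented for illustration:

```python
def which_land(keywords):
    # Cluster-wide object-health language points at HCI (vSAN-style);
    # per-host visibility differences point at traditional/external storage.
    hci_markers = {"compliance", "degraded", "resync"}
    if hci_markers & set(keywords):
        return "HCI: component availability, network consistency, repair pressure"
    if "partial_visibility" in keywords:
        return "traditional: access controls and host configuration drift"
    return "unclassified"
```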
Exam relevance
A common trap is choosing the “fastest protocol” answer when the question is really about failure domain isolation and operational ownership.
Scaling, growth, and lifecycle: the hidden cost model
Context & why it matters
The exam often encodes this as a requirement like “rapid growth,” “no extra compute,” “limited maintenance windows,” or “separate storage team.” These map more strongly to scaling model than to protocol.
Advanced explanation
- HCI scaling couples compute and storage:
- Pros: predictable repeatable nodes, cluster-native operations, policy-driven placement.
- Cons: to add capacity, you usually add hosts or trigger significant data movement; repair/resync traffic becomes a planning factor as you scale.
- Traditional scaling decouples compute and storage:
- Pros: you can scale storage without touching ESXi compute; central array features and storage-team workflows often fit enterprises.
- Cons: more moving parts (fabric, pathing, array changes) and a bigger surface for “drift” across hosts.
Troubleshooting & decision patterns
- “Need more storage but compute is fine” → traditional is often the cleanest operational match.
- “Need a standardized platform with simple expansion and consistent ops across clusters” → HCI often matches better, assuming network/device readiness.
Exam relevance
When two options both “work,” the exam often prefers the option that minimizes lifecycle risk (drift, incompatible changes, or maintenance impacts).
Protocol decision matrix: NFS vs iSCSI vs FC vs NVMe-oF (signals, controls, and failure signatures)
Context & why it matters
“Supported storage types” questions rarely want trivia. They want you to recognize what each protocol implies about: (1) identity/access control, (2) host constructs, (3) common failure signatures.
Advanced explanation
- NFS (file)
- Access control: export permissions (and sometimes stronger identity modes).
- Common signature: mounts fail or become stale; symptoms often appear as "datastore inaccessible," with fewer moving parts in the data path than a SAN.
- iSCSI (block over IP)
- Access control: initiator/target identity + optional CHAP; discovery and session state matter.
- Common signature: targets not discovered, sessions down, or LUNs visible to only some hosts due to mismatched initiator config/CHAP/network.
- FC (block over fabric)
- Access control: zoning + LUN masking; WWPN identity is central.
- Common signature: a host sees no targets (zoning) or sees targets but no LUNs (masking), often “partial visibility” if zoning differs by host.
- NVMe-oF (block, performance-oriented)
- Access control and topology are environment-specific; operational maturity is the key requirement.
- Common signature: looks like a fabric/pathing issue first; performance expectations are high, so small misconfigs show up as “why isn’t it fast?”
Troubleshooting & decision patterns
- “Datastore missing on exactly one host” → start with identity/access control alignment (export/CHAP/zoning/masking), then host configuration consistency, then pathing.
- “Performance degraded after a link event” → suspect path failover and queueing; validate multipathing behavior and backend saturation.
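The "link event" pattern above hinges on path state. A small sketch of the reasoning (the path tuples and `path_status` are invented; real environments would inspect multipathing state through the hypervisor's own tooling):

```python
def path_status(paths):
    # paths: list of (runtime_name, state) pairs, state "active" or "dead".
    active = [name for name, state in paths if state == "active"]
    if not active:
        return "all paths down: datastore inaccessible"
    if len(active) < len(paths):
        return "degraded: failover occurred, expect queueing on surviving paths"
    return "healthy: all paths up"
```

This mirrors the exam logic: a datastore can be reachable yet degraded, and the performance complaint is the queueing on the surviving paths, not the failover itself.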
Exam relevance
Protocol questions often include an option that is “technically valid” but operationally mismatched (for example, choosing a fabric-heavy protocol in a scenario that demands minimal specialized ops).
Requirement translation checklist (the fastest way to the right architecture answer)
Context & why it matters
This is the “do I pick HCI or traditional?” core, but now as a repeatable decision checklist.
Advanced explanation
Use a quick scoring approach (you don’t need numbers—just directionally consistent reasoning):
- Ops simplicity / standardization / fast rollout → lean HCI.
- Independent storage scaling / established storage team / centralized governance → lean traditional.
- Strict failure-domain isolation from compute → lean traditional.
- Need strong policy-driven behavior at the cluster level → lean HCI.
- Tight maintenance windows + low tolerance for data movement → often lean traditional (unless the HCI design explicitly budgets for repair pressure and resync windows).
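The checklist above is directional scoring, which can be sketched literally. The signal names, `HCI_SIGNALS`/`TRADITIONAL_SIGNALS`, and `lean` are invented labels for the bullets above, not an official scoring method:

```python
# Unweighted, directional scoring of the requirement checklist above.
HCI_SIGNALS = {"ops_simplicity", "standardization", "fast_rollout",
               "cluster_level_policy"}
TRADITIONAL_SIGNALS = {"independent_storage_scaling", "storage_team",
                       "centralized_governance", "failure_domain_isolation",
                       "tight_maintenance_windows"}

def lean(requirements):
    reqs = set(requirements)
    hci = len(reqs & HCI_SIGNALS)
    trad = len(reqs & TRADITIONAL_SIGNALS)
    if hci > trad:
        return "lean HCI"
    if trad > hci:
        return "lean traditional"
    return "tie: weigh lifecycle risk"
```

In a tie, the chapter's earlier guidance applies: prefer the option that minimizes lifecycle risk for the scenario at hand.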
Troubleshooting & decision patterns
If the stem includes “already have an array,” “storage team controls access,” or “no new compute,” don’t overthink it—those are strong traditional signals.
Exam relevance
This checklist helps you eliminate “almost right” answers that focus on protocol preference instead of architectural fit.