When you design any IT solution, you must first understand:
Business requirements – what the company cares about in business terms (money, customers, regulations, growth).
Technical requirements – what the IT systems must do to support those business goals (performance, availability, security, etc.).
Both are equally important. If you only think about technology and ignore business, you may build a “cool” solution that nobody actually needs or can afford.
Business requirements answer:
“Why are we doing this project at all?”
Examples:
Revenue – increase sales or open new revenue streams.
Cost reduction – lower infrastructure, energy, or operating costs.
Customer experience – faster, more reliable services for customers.
Time-to-market – launch new products and services sooner.
As an HPE solution designer, you always ask:
Which of these are the top 2–3 objectives?
Your design must support those, even if it means trade-offs in other areas.
Not all applications are equally important.
Critical business processes are the ones whose interruption causes serious harm to the business. For example:
An online banking system.
A hospital’s patient record system.
An e-commerce checkout system.
You should:
Identify which systems are mission-critical, which are important, and which are nice-to-have.
Give higher availability, performance, and protection to the critical ones.
Example:
Payroll system (pays employees) vs. an internal reporting system: a payroll outage hurts immediately, while a delayed report is usually tolerable, so payroll gets the stronger protection.
Many industries must follow laws and standards. Common examples:
GDPR – European data protection law (privacy, data subject rights).
HIPAA – Healthcare data protection in the US.
PCI-DSS – Security standard for credit card data.
Data residency – Data must physically stay in certain regions/countries.
Why this matters for HPE solutions:
You may need encryption at rest and encryption in transit.
You may need strong access control (RBAC) so only authorized people can access certain data.
You may need to design where data is stored (e.g., in a specific country or data center).
As a beginner, remember:
Compliance can force you to choose certain architectures (e.g., on-prem instead of public cloud, or certain security features enabled).
IT solutions must support not just today’s needs but also future growth.
You discuss with the customer:
“How many users do you have now?”
“How many do you expect in 3–5 years?”
“Will you open new branches or go into new countries?”
“Will new applications (like analytics, AI) be added soon?”
Typical impact:
You may design extra capacity (CPU, memory, storage) from day one.
Or design something that is easy to scale later (add more nodes, more disks, etc.).
If you ignore growth, the system can run out of capacity within months, forcing disruptive and expensive upgrades.
Even the best design is useless if the customer cannot afford it.
Key ideas:
CAPEX (Capital Expenditure) – big upfront purchase of hardware and software.
OPEX (Operational Expenditure) – pay-as-you-go or subscription (for example, HPE GreenLake models).
Questions to clarify:
Does the customer prefer buying hardware outright (CAPEX)?
Or do they prefer service-based models (OPEX), to pay monthly/annually?
What is the budget limit for this project?
What is their usual refresh cycle (e.g., replacing hardware every 3–5 years)?
As a solution designer, you may:
Propose a simpler architecture that fits the budget.
Or propose a GreenLake-like model that spreads costs over time.
Once you know the business direction, you translate it into technical requirements.
Think of this as answering:
“What must the IT system actually do, and how well must it do it?”
Different application types have very different patterns of CPU, memory, and storage usage.
Transactional (databases, ERP)
Many small, random I/O operations.
Very sensitive to latency (they need quick responses).
Example: An order entry system where each click reads/writes small pieces of data.
Virtualization (VMware, Hyper-V, KVM)
Many virtual machines (VMs) share the same physical servers and storage.
Workload mix: web servers, app servers, databases, etc.
Need good consolidation ratio (many VMs per host) and reliable storage.
VDI (Virtual Desktop Infrastructure)
Lots of desktops running as VMs.
Users log in in the morning → big login storm (many I/O operations at once).
Requires high performance early in the day; may need SSD tiers.
Big data/analytics
May do large sequential reads/writes over huge datasets.
Require high throughput (MB/s or GB/s) rather than ultra-low latency.
Backup
Typically large sequential writes (writing backup data to storage).
Heavy use during backup windows (e.g., night).
File services
File shares for users or applications (SMB, NFS).
Access patterns can be mixed: small random I/O, some sequential for media files.
Containers
Many microservices, often short-lived workloads.
Need fast provisioning and often shared storage or persistent volumes.
Why do we care?
The type of application drives decisions like: SSD vs HDD, bandwidth, latency targets, CPU and memory sizing, etc.
Performance requirements tell you how fast the system needs to respond.
Key metrics:
Latency
Latency is the time delay for a single operation.
Example: For a database, you might want storage latency below 1 millisecond (sub-ms) so queries feel fast.
If latency is too high, users experience “slowness”.
Throughput (MB/s or GB/s)
Throughput is the amount of data per second that can be transferred.
Important for workloads like backups, streaming, big data.
Example: “We need to back up 10 TB of data within a 4-hour window.”
IOPS (Input/Output Operations Per Second)
IOPS measures how many read/write operations per second the storage can handle.
OLTP databases may need very high IOPS.
You also consider:
Read/write ratio – e.g., 70% reads, 30% writes.
Block size – e.g., 4 KB vs 64 KB; affects performance design.
As a beginner, remember:
Latency → “how fast each operation is”.
IOPS → “how many operations per second”.
Throughput → “how much data per second”.
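The three metrics above are linked: throughput is roughly IOPS multiplied by block size. A minimal sketch, with hypothetical workload numbers (not vendor figures), makes the relationship concrete:

```python
# Illustrative: throughput (MB/s) ≈ IOPS × block size.
# The IOPS and block sizes below are hypothetical examples.

def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Approximate throughput in MB/s for a given IOPS and block size."""
    return iops * block_size_kb / 1024

# An OLTP-style workload: many small 4 KB operations.
print(throughput_mb_s(20_000, 4))    # 20,000 IOPS × 4 KB ≈ 78 MB/s

# A backup-style workload: fewer, larger 256 KB operations.
print(throughput_mb_s(2_000, 256))   # 2,000 IOPS × 256 KB = 500 MB/s
```

This is why a storage array quoted at "high IOPS" with small blocks can still deliver modest MB/s, and vice versa.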
Availability is about how often the system is up and running.
SLA uptime (e.g., 99.9%, 99.99%)
SLA = Service Level Agreement.
99.9% uptime ≈ 8.8 hours of downtime per year.
99.99% uptime ≈ 52 minutes of downtime per year.
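The downtime figures above follow directly from the uptime percentage; a quick helper (illustrative, not part of any SLA tooling) shows the arithmetic:

```python
# Convert an SLA uptime percentage into allowed downtime per year.
def downtime_per_year(sla_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given SLA uptime %."""
    return (1 - sla_percent / 100) * 365 * 24 * 60

print(downtime_per_year(99.9) / 60)   # ≈ 8.76 hours per year
print(downtime_per_year(99.99))       # ≈ 52.6 minutes per year
```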
Higher availability usually means:
More redundant components (extra nodes, controllers, power supplies, paths, sites).
More complex design and higher cost.
Maintenance windows and failover requirements
Maintenance window: planned time when you can perform upgrades or changes.
Some businesses allow night-time maintenance; some (e.g., 24x7 banks) allow almost none.
Failover:
If a server or storage controller fails, what happens?
Do we have automatic failover to another node with no noticeable outage?
Availability requirements directly influence:
Cluster design (e.g., N+1 nodes).
Storage redundancy (controllers, RAID, etc.).
Network redundancy (multiple switches, multiple NICs).
Security requirements answer:
“How do we protect the data and control who can access what?”
Main elements:
Encryption
At rest: data stored on disks is encrypted.
In transit: data traveling over the network is encrypted (e.g., TLS).
Purpose: if someone steals disks or intercepts network traffic, they cannot read the data.
RBAC (Role-Based Access Control)
Users are given roles, such as:
Storage admin
Server admin
Read-only auditor
Each role has specific permissions.
Principle of least privilege: users only get the access they actually need.
Multi-tenancy
One physical infrastructure serves multiple groups or customers.
Example: a service provider hosting several companies’ workloads.
Need strong isolation between tenants (for example, separate networks, separate volumes, and separate access controls).
Network segmentation
Different types of traffic are separated into different networks or VLANs:
Management network
Storage network
User/application network
Backup network
This improves security (harder for an attacker to move around) and performance.
All of this influences the design of:
Network layout
Storage access configuration
Management access and authentication
Now that we know what the business and technical requirements are, we need to measure the current environment and plan for capacity.
This is like weighing and measuring the patient before choosing the correct “dose” of compute and storage.
Workload profiling means:
“Let’s observe what the system is doing now, in real life.”
You usually use monitoring tools to collect data over some time (days or weeks).
These are your starting numbers:
CPU utilization
Average and peak CPU usage on servers (e.g., 30% average, 80% peak).
Helps you estimate how many and what size CPUs you need.
Memory usage
How much RAM is used on average and peak.
If servers are constantly at 90–100% RAM, you know you need more memory in the new design.
IOPS
Current storage operations per second.
Helps you size the new storage array or tier configuration.
Latency
Measures how quickly storage responds.
If current latency is high and users complain, new design must improve it.
Network throughput
How much data per second flows on key network links.
Helps you decide 1GbE vs 10GbE vs 25GbE, etc.
A system might look fine on average but still be overloaded at peak times.
You identify:
When do peaks happen? (e.g., mornings, month-end, Black Friday).
How high are the peaks compared to the average? (peak factor).
Example:
Average CPU: 25%
Peak CPU: 80%
If you size only for 25%, your system may collapse during peaks.
For critical systems, you design for peaks, not just averages.
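The average-vs-peak numbers above can be turned into a simple sizing sketch. All figures here are hypothetical, and the 60% target utilization is an assumption chosen for illustration:

```python
# Sizing for peaks, not averages (numbers taken from the example above).
avg_cpu_pct = 25
peak_cpu_pct = 80
peak_factor = peak_cpu_pct / avg_cpu_pct   # peaks are 3.2× the average

# If today's peak already consumes 80% of current capacity, size the new
# system so the same peak lands at a safer target utilization (assumed 60%).
current_capacity_units = 100
target_utilization = 0.60
required_capacity = current_capacity_units * (peak_cpu_pct / 100) / target_utilization

print(peak_factor, required_capacity)   # 3.2, ≈133 capacity units
```

Sizing to the 25% average would leave the system roughly three times short at peak.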
Storage performance depends strongly on the ratio of reads vs writes.
Read-heavy workloads (e.g., reporting) behave differently from write-heavy (e.g., logging).
RAID levels behave differently for reads vs writes (e.g., RAID 5 is slower for random writes).
Knowing the read/write ratio helps you:
Choose appropriate RAID levels or erasure coding.
Decide on SSD vs HDD distribution.
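The read/write ratio and RAID level combine into a back-end IOPS estimate. The write penalties below are commonly cited rules of thumb (an assumption for illustration, not a product specification):

```python
# Rough back-end IOPS estimate including RAID write penalty.
# Rule-of-thumb penalties: RAID 10 ≈ 2, RAID 5 ≈ 4, RAID 6 ≈ 6.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(frontend_iops: int, read_ratio: float, raid: str) -> float:
    """Back-end IOPS the disks must deliver for a given front-end load."""
    reads = frontend_iops * read_ratio
    writes = frontend_iops * (1 - read_ratio)
    return reads + writes * WRITE_PENALTY[raid]

# 10,000 front-end IOPS at a 70/30 read/write ratio:
print(backend_iops(10_000, 0.70, "RAID10"))  # 7,000 + 3,000×2 = 13,000
print(backend_iops(10_000, 0.70, "RAID5"))   # 7,000 + 3,000×4 = 19,000
```

The same front-end load costs nearly 50% more back-end IOPS on RAID 5 than on RAID 10, which is why write-heavy workloads often justify mirroring.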
Different workloads have different I/O patterns:
Random I/O:
Accessing small blocks scattered across the disk (like OLTP databases).
Needs fast response and often benefits from SSDs.
Sequential I/O:
Reading/writing large continuous chunks (like backups or video streaming).
They like high throughput; HDDs might be acceptable if bandwidth is sufficient.
Your profiling should identify which workloads are random vs sequential to design the storage tiering correctly.
The working set is the amount of data that is actively used during a period.
Example: a database holds 5 TB in total, but only about 500 GB of it is read or written on a typical day.
Why this matters:
Maybe you don’t need all 5 TB on SSD.
You can put the 500 GB hot data on fast SSD tier and the rest on cheaper, slower disks.
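The tier split above can be sketched numerically. The 30% SSD headroom is an assumption for illustration:

```python
# Tier sizing from a working-set estimate (hypothetical data from the example).
total_data_gb = 5 * 1024      # 5 TB total
working_set_gb = 500          # ~500 GB touched daily

headroom = 1.3                # keep ~30% SSD headroom (assumption)
ssd_tier_gb = working_set_gb * headroom
hdd_tier_gb = total_data_gb - working_set_gb

print(ssd_tier_gb, hdd_tier_gb)   # ≈650 GB fast tier; the rest on cheaper disks
```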
Capacity planning is about answering:
“How much compute, storage, and network capacity do we need now and in the future?”
We look at compute, storage, and network separately.
CPU sizing
Count current and future total vCPUs or application needs.
Consider cores, threads, and GHz of available processors.
For virtualization:
Account for headroom so that if one host fails, the others can take over (N+1 design).
Memory sizing
Add up memory used by all workloads, plus overhead.
Consider:
Per VM memory (e.g., each VM needs 8 GB).
Per node memory capacity (e.g., 512 GB per physical host).
NUMA awareness:
Modern CPUs are split into NUMA nodes; it’s more efficient if a VM’s memory stays inside one NUMA node.
You avoid making a single VM so large that it spans multiple NUMA nodes unnecessarily.
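The CPU and memory sizing steps above, including the N+1 headroom, can be sketched as a small calculation. All VM counts, host specs, and the 4:1 vCPU overcommit ratio are hypothetical assumptions:

```python
# N+1 compute sizing sketch (hypothetical VM counts and host specs).
import math

vm_count = 200
vcpu_per_vm, ram_per_vm_gb = 4, 8
cores_per_host, ram_per_host_gb = 64, 512
vcpu_to_core_ratio = 4          # overcommit assumption for general workloads

hosts_for_cpu = math.ceil(vm_count * vcpu_per_vm / (cores_per_host * vcpu_to_core_ratio))
hosts_for_ram = math.ceil(vm_count * ram_per_vm_gb / ram_per_host_gb)

# Take the larger of the two, then add one host for failover headroom (N+1).
hosts_needed = max(hosts_for_cpu, hosts_for_ram) + 1
print(hosts_for_cpu, hosts_for_ram, hosts_needed)   # 4, 4, 5 hosts
```

Note that memory, not CPU, is often the binding constraint in virtualized clusters, which is why both are computed before taking the maximum.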
Usable vs raw capacity
Raw capacity = sum of all disk sizes.
Usable capacity = what is left after RAID/erasure coding overhead and system reserves.
Example:
10 disks × 2 TB = 20 TB raw.
With RAID and overhead, maybe only ~14 TB usable.
You must always calculate usable capacity, not just raw.
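The 10-disk example above works out as follows; the RAID 6 parity and ~10% system reserve are assumptions chosen to match the "~14 TB" figure:

```python
# Raw vs usable capacity sketch (mirrors the 10 × 2 TB example above).
disks, disk_tb = 10, 2
raw_tb = disks * disk_tb                  # 20 TB raw

# Assumptions: RAID 6 (two disks' worth of parity) plus ~10% system reserve.
parity_tb = 2 * disk_tb
usable_tb = (raw_tb - parity_tb) * 0.90

print(raw_tb, usable_tb)   # 20 TB raw → ≈14.4 TB usable
```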
Thin vs thick provisioning
Thick provisioning: the full logical capacity is allocated physically up front. Simple and predictable, but unused space is wasted.
Thin provisioning:
You allocate “logical” capacity but only consume physical space when data is written.
More flexible but requires careful monitoring to avoid running out of space.
Growth projections
Use historical growth (e.g., 20–30% per year) to estimate future capacity needs.
You might design:
Enough capacity for 3 years of growth.
Or design an easy expansion path (add disk shelves, add nodes) when needed.
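Growth projections compound year over year; a 25% rate (the midpoint of the 20–30% range above, used here as an assumption) over three years looks like this:

```python
# Compound growth projection (25%/year assumed; starting size hypothetical).
current_tb = 14.4
growth_rate = 0.25
years = 3

needed_tb = current_tb * (1 + growth_rate) ** years
print(needed_tb)   # ≈ 28.1 TB after 3 years of 25% annual growth
```

Note that compounding nearly doubles the requirement in three years; linear estimates ("25% × 3 = 75% more") undersize the target.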
You must ensure the network is not a bottleneck.
Required bandwidth for front-end access
This is the traffic between users/applications and the servers.
If hundreds/thousands of users access the system, you may need higher-speed links and possibly load balancers.
Back-end storage networks (FC/iSCSI), replication links, backup traffic
Storage access may use dedicated networks (Fibre Channel, iSCSI).
Replication between sites can generate large amounts of traffic.
Backups also generate heavy network load during backup windows.
You determine:
How many and what speed links you need (e.g., multiple 10GbE ports, 32G FC, etc.).
Whether some traffic needs separate VLANs or even separate physical networks for reliability and performance.
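Link counts follow from the peak traffic and a utilization ceiling. The peak load and the ~70% ceiling below are assumptions for illustration:

```python
# Link sizing sketch: how many 10GbE links for a given peak load?
import math

peak_gbit_s = 18            # measured or projected peak (hypothetical)
link_gbit_s = 10
utilization_ceiling = 0.70  # avoid planning links above ~70% busy (assumption)

links = math.ceil(peak_gbit_s / (link_gbit_s * utilization_ceiling))
print(links)   # 3 × 10GbE links, plus redundant paths on separate switches
```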
Even if you know the ideal design, real-world limitations will influence what you can actually do.
Constraints are things you cannot change easily.
Physical constraints
Power: the data center can only supply a limited amount of power per rack.
Cooling: dense equipment generates heat the facility must be able to remove.
Rack space: only so many rack units (U) are available for new equipment.
Floor loading: the floor can only bear a certain weight per square meter.
These factors limit how many and what type of devices you can install.
Legacy dependencies
Old servers, storage arrays, or network devices that must stay in use.
Maybe a critical application only runs on a legacy OS on old hardware.
Your new design must integrate with or migrate away from these systems carefully.
Software licensing
Hypervisor or database licenses may be based on cores or sockets.
If you add more cores, licensing costs might rise sharply.
Sometimes it’s cheaper to buy fewer, larger servers, or vice versa, depending on the licensing.
Operational maturity
Skill level of the operations team:
Are they comfortable with advanced automation?
Or do they mostly work with basic GUIs and manual processes?
Existing tools and processes:
You shouldn’t design something the team cannot realistically operate.
Risk analysis is about asking:
“What could go wrong, and how do we reduce the impact?”
Single points of failure (SPOF)
A component whose failure alone will bring down the service. Examples:
Single switch.
Single storage controller.
Single site.
Goal: design so that failure of one component does not stop the service.
Vendor lock-in risks
If the solution depends too heavily on a specific proprietary feature, it may be hard to switch vendors later.
Sometimes this is acceptable; sometimes customers want more flexibility.
Migration risk
Data migration from old systems can be:
Time-consuming
Risky (data corruption or downtime if done poorly)
You must:
Estimate how long migration takes.
Plan downtime or live migration options.
Create a rollback plan in case something goes wrong.
Security risk
Data leakage (accidental or malicious).
Ransomware (encrypting data and demanding payment).
Misconfiguration (e.g., open shares, weak passwords).
Your design should include:
Proper access controls.
Regular backups and DR plans.
Security best practices for management access and patching.
Even as a technical person, you must be able to talk about costs and value.
TCO = all costs over the life of the solution, not just purchase price.
Acquisition costs
Hardware (servers, storage, switches).
Software (OS, hypervisor, backup software, management tools).
Professional services (consulting, installation, migration).
Operating costs
Power and cooling.
Support contracts and warranties.
Staff time: how many admins are needed to run the environment.
Refresh cycles
Hardware typically has a useful life (e.g., 3–5 years).
After that, you usually replace or upgrade.
Your design should consider these cycles; for example, avoid a design that will be obsolete quickly.
Support models
24x7 support vs business hours only.
Next Business Day (NBD) or 4-hour response.
Higher support levels cost more but reduce downtime risks.
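A TCO comparison puts the CAPEX-vs-OPEX discussion above into numbers. All figures here are hypothetical, purely to show the shape of the calculation:

```python
# Simplified TCO comparison over a 5-year life (all figures hypothetical).
def tco(acquisition: float, annual_opex: float, years: int = 5) -> float:
    """Total cost of ownership = upfront CAPEX + cumulative OPEX."""
    return acquisition + annual_opex * years

capex_option = tco(acquisition=500_000, annual_opex=60_000)   # buy outright
opex_option = tco(acquisition=0, annual_opex=140_000)         # subscription

print(capex_option, opex_option)   # 800,000 vs 700,000 over 5 years
```

A real comparison would also discount future cash flows and include migration and staff costs, but even this sketch shows why the purchase price alone is misleading.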
ROI is about how much benefit the business gets from the money spent.
Performance improvements and productivity
Faster applications → employees work faster → more output.
Better user experience → more customer satisfaction and sales.
Consolidation benefits
Fewer systems to manage.
Lower power, cooling, and support costs.
Simplified management and less admin time.
Reduced outages and incidents
Downtime can be extremely expensive (lost revenue, damaged reputation).
A more reliable design can save money by preventing outages.
As a solution designer, you should be able to explain:
“Yes, this design costs more, but here’s how it saves or earns money over time.”
Even the best technical work fails if badly documented or poorly communicated.
Current state documentation
You should create a clear picture of the existing environment:
Asset inventory
List of servers, storage systems, network devices.
Model, serial number, firmware/OS version, location, role.
Network diagrams & data flow diagrams
Show how systems are connected.
Show which paths data takes (e.g., user → web server → app server → database → storage).
Application dependency mapping
Which applications depend on which databases, services, and systems.
Helps you understand impact if something fails or needs migration.
Requirements summary
Functional requirements – what the system must do (e.g., support 500 VMs, provide file shares, etc.).
Non-functional requirements – performance, availability, security, compliance, growth.
Constraints and assumptions – clearly written, so everyone knows the boundaries.
Good documentation avoids misunderstandings and helps during implementation and operations.
Stakeholder interviews
Talk to:
Business owners (care about business outcomes).
IT operations (care about manageability).
Security team (care about compliance and threats).
Networking team (care about network impact).
You must listen and translate their needs into technical design.
Validation workshops
Present your understanding of requirements and proposed high-level design.
Ask: “Is this correct? Did we miss anything important?”
Adjust the design based on feedback.
Sign-off
A formal agreement on:
Scope (what is included, what is not).
SLAs (availability, performance, support).
Success criteria (how we will know the project is successful).
Sign-off protects both you and the customer: it sets clear expectations before you start building.
In any assessment phase, one of the first tasks is to translate business expectations into measurable technical requirements. Three key metrics define how resilient and recoverable the environment must be.
RPO defines the maximum acceptable amount of data loss measured in time.
Examples:
RPO = 15 minutes
RPO = 1 hour
RPO = 0 seconds (requires synchronous replication)
Its meaning:
If a disaster occurs, how far back in time can the restored system be compared to the moment of failure?
Architectural impact:
Low RPO often requires frequent snapshots or replication.
RPO approaching zero requires synchronous replication and low-latency links.
Larger RPOs allow simpler and cheaper backup-based strategies.
RTO defines the maximum acceptable service outage duration after a failure.
Examples:
RTO = 5 minutes (requires automated failover)
RTO = 1 hour
RTO = 24 hours (backup restore acceptable)
Architectural impact:
Low RTO drives the need for HA clusters, stretched clusters, or orchestrated failover.
Higher RTO tolerates manual recovery, cold standby sites, or rebuild-and-restore methods.
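The mapping from RPO/RTO targets to protection styles can be sketched as a rule of thumb. The thresholds below are assumptions for discussion, not HPE guidance:

```python
# Illustrative rule-of-thumb mapping from RPO/RTO to a protection style.
# Thresholds are assumed values chosen for this sketch.
def protection_strategy(rpo_minutes: float, rto_minutes: float) -> str:
    if rpo_minutes == 0:
        return "synchronous replication + automated failover"
    if rpo_minutes <= 15 or rto_minutes <= 15:
        return "asynchronous replication + orchestrated failover"
    if rto_minutes <= 240:
        return "frequent snapshots + warm standby"
    return "backup/restore (cold recovery)"

print(protection_strategy(0, 5))       # synchronous replication + automated failover
print(protection_strategy(60, 1440))   # backup/restore (cold recovery)
```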
Backup Window defines the time period during which backup operations are allowed to run without impacting production workloads.
Typical considerations:
Off-peak hours are often selected to avoid performance degradation.
Short backup windows may require incremental-forever approaches or storage-integrated snapshots.
Long backup windows allow slower or capacity-oriented methods (tape, cloud uploads).
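The backup window translates directly into a required sustained throughput, as in the earlier "10 TB in 4 hours" example:

```python
# Required backup throughput to fit a window (10 TB in 4 hours, as earlier).
data_tb = 10
window_hours = 4

required_mb_s = data_tb * 1024 * 1024 / (window_hours * 3600)
print(required_mb_s)   # ≈ 728 MB/s sustained for the whole window
```

If the network, source storage, or backup target cannot sustain that rate, the window must grow or the method must change (e.g., incremental-forever).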
These metrics directly influence:
Replication style: synchronous vs asynchronous
Snapshot schedules
Storage system selection and tiering
Multi-site design choices
Required network bandwidth and latency
Licensing and data protection strategies
They act as the foundation for all subsequent design and sizing decisions.
A complete assessment requires a detailed understanding of the existing infrastructure and workloads. This step ensures the new design addresses real conditions, not assumptions.
Typical collected attributes:
vCPU count, pCPU utilization patterns
Memory allocation and consumption
Operating system build and version
Application roles and business criticality
VM density and overcommit ratios
Growth trends
Purpose:
Determine which workloads can be consolidated, scaled, or require special hardware considerations.
Include:
Current capacity usage
IOPS and throughput patterns
Latency distribution
Tier usage (SSD vs HDD)
Snapshot and replication footprint
RAID/erasure coding overhead
Hot volumes or contention areas
Purpose:
Identify bottlenecks, predict required performance, and select correct data services.
Key elements:
Physical and logical diagrams
VLAN structure and segmentation
Link speeds, utilization, and oversubscription
East-west vs north-south traffic balance
Inter-site bandwidth and latency
Firewall and routing constraints
Purpose:
Ensure the network can sustain storage traffic, vMotion/Live Migration, replication, and backup flows.
A workload rarely stands alone. Identifying dependencies is crucial.
Examples:
Web tier depends on application tier
Application tier depends on database
Database depends on storage latency
Authentication depends on AD
Reporting depends on shared services
Purpose:
Avoid isolating or moving a workload without understanding interdependencies that affect latency or availability.
During discovery, you should highlight:
Overloaded hosts or datastores
Underutilized hardware
Misconfigured networks
Orphaned VMs or unused LUNs
Legacy systems needing replacement
Licensing constraints
Purpose:
The assessment is not only about documenting the environment; it is also about identifying improvement areas before design begins.
Modern environments require a strategy for placing each workload in the most suitable location.
Typically kept local when:
There are strict data residency or sovereignty requirements
Latency-sensitive applications require direct hardware access
Workloads are highly integrated with other on-prem systems
The business prefers CAPEX models
Regulatory or industry constraints restrict external hosting
Candidates for cloud placement often include:
Bursty or elastic workloads
Test and development systems
Stateless services
Web front-ends with global user bases
Workloads with predictable OPEX models
A combination of on-prem and cloud workloads is common when:
Some systems require low latency local resources
Others benefit from cloud elasticity and scale
DR or backup can benefit from cloud storage
Applications can span environments using APIs or service layers
Workload placement must consider:
Data sovereignty and legal compliance
Latency tolerance
Inter-system dependencies
Cost model preference (CAPEX vs OPEX)
Application architecture (monolithic vs distributed)
Required network egress and bandwidth patterns
Security posture and risk profile
Purpose:
Ensure that each workload executes where it performs best while meeting compliance and cost objectives.
HPE provides tools and methodologies that support accurate planning and sizing. These are usually applied during the assessment stage to validate assumptions and build a defensible architecture.
Used to:
Predict CPU, memory, and storage needs
Validate performance requirements
Model hardware configurations
Compare different architecture scenarios
Primarily used for:
Collecting performance telemetry from HPE systems
Detecting anomalies and predicting failures
Providing predictive analytics for sizing and optimization
Offering recommendations based on global fleet intelligence
These documents provide:
Validated configurations for common workloads
Prescriptive design recommendations
Supported limits and constraints
Best practices for SAN, networking, compute, and storage
Used to:
Capture existing infrastructure configuration
Export server profiles, firmware baselines, and network settings
Validate consistency across nodes
Assist migration or redesign planning
Purpose:
Standardize the assessment process and ensure design decisions are aligned with HPE-validated practices.
How do you estimate storage capacity requirements when planning an HPE SAN solution for a virtualized environment?
Estimate capacity by analyzing current storage usage, expected growth, performance requirements, and redundancy overhead.
Capacity planning begins with collecting workload data such as used storage, growth trends, and IOPS requirements. For virtualized environments (e.g., VMware clusters), architects typically include additional overhead for snapshots, RAID protection, replication copies, and VM growth. Best practice is to maintain 20–30% free capacity to ensure performance stability and accommodate future expansion. Administrators also review workload patterns such as random vs sequential IO to choose appropriate storage tiers. Ignoring growth trends or snapshot overhead often leads to under-sized storage systems and early capacity exhaustion. Effective planning balances raw capacity, usable capacity after RAID, and operational headroom to maintain performance and scalability.
Demand Score: 66
Exam Relevance Score: 78
What factors should be evaluated during the assessment phase before deploying an HPE storage solution?
Key factors include workload characteristics, performance requirements, capacity growth, connectivity requirements, and availability needs.
During the assessment phase, architects gather information about the environment to design the correct storage architecture. This includes measuring IOPS, latency expectations, and throughput demands of workloads such as databases, virtual machines, or analytics systems. Capacity growth projections help determine scalability requirements. Connectivity choices such as Fibre Channel, iSCSI, or NVMe-based fabrics also affect design decisions. Additionally, availability requirements determine RAID levels, controller redundancy, and replication needs. Neglecting workload profiling often results in misconfigured storage systems that either over-provision expensive resources or fail to meet performance requirements.
Demand Score: 61
Exam Relevance Score: 76
Why is workload analysis important before designing an HPE storage architecture?
Workload analysis ensures the storage system is sized and configured to meet performance and capacity demands.
Different workloads generate very different IO patterns. Databases produce random reads and writes, while backup workloads are mostly sequential. Virtual environments create mixed IO patterns and unpredictable spikes. By analyzing workload characteristics such as read/write ratio, block size, and peak IOPS, architects can determine appropriate RAID levels, disk types (SSD vs hybrid), and caching strategies. Without workload analysis, systems may experience latency issues, inefficient disk utilization, or excessive controller load. Proper assessment enables the selection of the right storage platform and configuration to meet service-level requirements.
Demand Score: 59
Exam Relevance Score: 72