When you design any IT solution, you must first understand:
Business requirements – what the company cares about in business terms (money, customers, regulations, growth).
Technical requirements – what the IT systems must do to support those business goals (performance, availability, security, etc.).
Both are equally important. If you only think about technology and ignore business, you may build a “cool” solution that nobody actually needs or can afford.
Business requirements answer:
“Why are we doing this project at all?”
Examples:
Revenue – increase sales or open new revenue streams.
Cost reduction – lower infrastructure, energy, or operating costs.
Customer experience – faster, more reliable services for customers.
Time-to-market – launch new products and services sooner.
As an HPE solution designer, you always ask:
Which of these are the top 2–3 objectives?
Your design must support those, even if it means trade-offs in other areas.
Not all applications are equally important.
Critical business processes are the ones whose interruption causes serious harm to the business. For example:
An online banking system.
A hospital’s patient record system.
An e-commerce checkout system.
You should:
Identify which systems are mission-critical, which are important, and which are nice-to-have.
Give higher availability, performance, and protection to the critical ones.
Example:
Payroll system (pays employees) vs. an internal reporting system: a payroll outage hurts immediately, while a delayed report is usually tolerable, so payroll gets the stronger protection.
Many industries must follow laws and standards. Common examples:
GDPR – European data protection law (privacy, data subject rights).
HIPAA – Healthcare data protection in the US.
PCI-DSS – Security standard for credit card data.
Data residency – Data must physically stay in certain regions/countries.
Why this matters for HPE solutions:
You may need encryption at rest and encryption in transit.
You may need strong access control (RBAC) so only authorized people can access certain data.
You may need to design where data is stored (e.g., in a specific country or data center).
As a beginner, remember:
Compliance can force you to choose certain architectures (e.g., on-prem instead of public cloud, or certain security features enabled).
IT solutions must support not just today’s needs but also future growth.
You discuss with the customer:
“How many users do you have now?”
“How many do you expect in 3–5 years?”
“Will you open new branches or go into new countries?”
“Will new applications (like analytics, AI) be added soon?”
Typical impact:
You may design extra capacity (CPU, memory, storage) from day one.
Or design something that is easy to scale later (add more nodes, more disks, etc.).
If you ignore growth, the system can run out of capacity within months, forcing disruptive and expensive upgrades.
Even the best design is useless if the customer cannot afford it.
Key ideas:
CAPEX (Capital Expenditure) – big upfront purchase of hardware and software.
OPEX (Operational Expenditure) – pay-as-you-go or subscription (for example, HPE GreenLake models).
Questions to clarify:
Does the customer prefer buying hardware outright (CAPEX)?
Or do they prefer service-based models (OPEX), to pay monthly/annually?
What is the budget limit for this project?
What is their usual refresh cycle (e.g., replacing hardware every 3–5 years)?
As a solution designer, you may:
Propose a simpler architecture that fits the budget.
Or propose a GreenLake-like model that spreads costs over time.
Once you know the business direction, you translate it into technical requirements.
Think of this as answering:
“What must the IT system actually do, and how well must it do it?”
Different application types have very different patterns of CPU, memory, and storage usage.
Transactional (databases, ERP)
Many small, random I/O operations.
Very sensitive to latency (they need quick responses).
Example: An order entry system where each click reads/writes small pieces of data.
Virtualization (VMware, Hyper-V, KVM)
Many virtual machines (VMs) share the same physical servers and storage.
Workload mix: web servers, app servers, databases, etc.
Need good consolidation ratio (many VMs per host) and reliable storage.
VDI (Virtual Desktop Infrastructure)
Lots of desktops running as VMs.
Users log in in the morning → big login storm (many I/O operations at once).
Requires high performance early in the day; may need SSD tiers.
Big data/analytics
May do large sequential reads/writes over huge datasets.
Require high throughput (MB/s or GB/s) rather than ultra-low latency.
Backup
Typically large sequential writes (writing backup data to storage).
Heavy use during backup windows (e.g., night).
File services
File shares for users or applications (SMB, NFS).
Access patterns can be mixed: small random I/O, some sequential for media files.
Containers
Many microservices, often short-lived workloads.
Need fast provisioning and often shared storage or persistent volumes.
Why do we care?
The type of application drives decisions like: SSD vs HDD, bandwidth, latency targets, CPU and memory sizing, etc.
Performance requirements tell you how fast the system needs to respond.
Key metrics:
Latency
Latency is the time delay for a single operation.
Example: For a database, you might want storage latency below 1 millisecond (sub-ms) so queries feel fast.
If latency is too high, users experience “slowness”.
Throughput (MB/s or GB/s)
Throughput is the amount of data per second that can be transferred.
Important for workloads like backups, streaming, big data.
Example: “We need to back up 10 TB of data within a 4-hour window.”
IOPS (Input/Output Operations Per Second)
IOPS measures how many read/write operations per second the storage can handle.
OLTP databases may need very high IOPS.
You also consider:
Read/write ratio – e.g., 70% reads, 30% writes.
Block size – e.g., 4 KB vs 64 KB; affects performance design.
As a beginner, remember:
Latency → “how fast each operation is”.
IOPS → “how many operations per second”.
Throughput → “how much data per second”.
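The three metrics above are linked: throughput is roughly IOPS multiplied by block size. A minimal sketch, with hypothetical workload numbers (not vendor figures), makes the relationship concrete:

```python
# Illustrative: throughput (MB/s) ≈ IOPS × block size.
# The IOPS and block sizes below are hypothetical examples.

def throughput_mb_s(iops: float, block_size_kb: float) -> float:
    """Approximate throughput in MB/s for a given IOPS and block size."""
    return iops * block_size_kb / 1024

# An OLTP-style workload: many small 4 KB operations.
print(throughput_mb_s(20_000, 4))    # 20,000 IOPS × 4 KB ≈ 78 MB/s

# A backup-style workload: fewer, larger 256 KB operations.
print(throughput_mb_s(2_000, 256))   # 2,000 IOPS × 256 KB = 500 MB/s
```

This is why a storage array quoted at "high IOPS" with small blocks can still deliver modest MB/s, and vice versa.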
Availability is about how often the system is up and running.
SLA uptime (e.g., 99.9%, 99.99%)
SLA = Service Level Agreement.
99.9% uptime ≈ 8.8 hours of downtime per year.
99.99% uptime ≈ 52 minutes of downtime per year.
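The downtime figures above follow directly from the uptime percentage; a quick helper (illustrative, not part of any SLA tooling) shows the arithmetic:

```python
# Convert an SLA uptime percentage into allowed downtime per year.
def downtime_per_year(sla_percent: float) -> float:
    """Allowed downtime per year, in minutes, for a given SLA uptime %."""
    return (1 - sla_percent / 100) * 365 * 24 * 60

print(downtime_per_year(99.9) / 60)   # ≈ 8.76 hours per year
print(downtime_per_year(99.99))       # ≈ 52.6 minutes per year
```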
Higher availability usually means:
More redundant components (extra nodes, controllers, power supplies, paths, sites).
More complex design and higher cost.
Maintenance windows and failover requirements
Maintenance window: planned time when you can perform upgrades or changes.
Some businesses allow night-time maintenance; some (e.g., 24x7 banks) allow almost none.
Failover:
If a server or storage controller fails, what happens?
Do we have automatic failover to another node with no noticeable outage?
Availability requirements directly influence:
Cluster design (e.g., N+1 nodes).
Storage redundancy (controllers, RAID, etc.).
Network redundancy (multiple switches, multiple NICs).
Security requirements answer:
“How do we protect the data and control who can access what?”
Main elements:
Encryption
At rest: data stored on disks is encrypted.
In transit: data traveling over the network is encrypted (e.g., TLS).
Purpose: if someone steals disks or intercepts network traffic, they cannot read the data.
RBAC (Role-Based Access Control)
Users are given roles, such as:
Storage admin
Server admin
Read-only auditor
Each role has specific permissions.
Principle of least privilege: users only get the access they actually need.
Multi-tenancy
One physical infrastructure serves multiple groups or customers.
Example: a service provider hosting several companies’ workloads.
Need strong isolation between tenants (for example, separate networks, separate volumes, and separate access controls).
Network segmentation
Different types of traffic are separated into different networks or VLANs:
Management network
Storage network
User/application network
Backup network
This improves security (harder for an attacker to move around) and performance.
All of this influences the design of:
Network layout
Storage access configuration
Management access and authentication
Now that we know what the business and technical requirements are, we need to measure the current environment and plan for capacity.
This is like weighing and measuring the patient before choosing the correct “dose” of compute and storage.
Workload profiling means:
“Let’s observe what the system is doing now, in real life.”
You usually use monitoring tools to collect data over some time (days or weeks).
These are your starting numbers:
CPU utilization
Average and peak CPU usage on servers (e.g., 30% average, 80% peak).
Helps you estimate how many and what size CPUs you need.
Memory usage
How much RAM is used on average and peak.
If servers are constantly at 90–100% RAM, you know you need more memory in the new design.
IOPS
Current storage operations per second.
Helps you size the new storage array or tier configuration.
Latency
Measures how quickly storage responds.
If current latency is high and users complain, new design must improve it.
Network throughput
How much data per second flows on key network links.
Helps you decide 1GbE vs 10GbE vs 25GbE, etc.
A system might look fine on average but still be overloaded at peak times.
You identify:
When do peaks happen? (e.g., mornings, month-end, Black Friday).
How high are the peaks compared to the average? (peak factor).
Example:
Average CPU: 25%
Peak CPU: 80%
If you size only for 25%, your system may collapse during peaks.
For critical systems, you design for peaks, not just averages.
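The average-vs-peak numbers above can be turned into a simple sizing sketch. All figures here are hypothetical, and the 60% target utilization is an assumption chosen for illustration:

```python
# Sizing for peaks, not averages (numbers taken from the example above).
avg_cpu_pct = 25
peak_cpu_pct = 80
peak_factor = peak_cpu_pct / avg_cpu_pct   # peaks are 3.2× the average

# If today's peak already consumes 80% of current capacity, size the new
# system so the same peak lands at a safer target utilization (assumed 60%).
current_capacity_units = 100
target_utilization = 0.60
required_capacity = current_capacity_units * (peak_cpu_pct / 100) / target_utilization

print(peak_factor, required_capacity)   # 3.2, ≈133 capacity units
```

Sizing to the 25% average would leave the system roughly three times short at peak.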
Storage performance depends strongly on the ratio of reads vs writes.
Read-heavy workloads (e.g., reporting) behave differently from write-heavy (e.g., logging).
RAID levels behave differently for reads vs writes (e.g., RAID 5 is slower for random writes).
Knowing the read/write ratio helps you:
Choose appropriate RAID levels or erasure coding.
Decide on SSD vs HDD distribution.
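The read/write ratio and RAID level combine into a back-end IOPS estimate. The write penalties below are commonly cited rules of thumb (an assumption for illustration, not a product specification):

```python
# Rough back-end IOPS estimate including RAID write penalty.
# Rule-of-thumb penalties: RAID 10 ≈ 2, RAID 5 ≈ 4, RAID 6 ≈ 6.
WRITE_PENALTY = {"RAID10": 2, "RAID5": 4, "RAID6": 6}

def backend_iops(frontend_iops: int, read_ratio: float, raid: str) -> float:
    """Back-end IOPS the disks must deliver for a given front-end load."""
    reads = frontend_iops * read_ratio
    writes = frontend_iops * (1 - read_ratio)
    return reads + writes * WRITE_PENALTY[raid]

# 10,000 front-end IOPS at a 70/30 read/write ratio:
print(backend_iops(10_000, 0.70, "RAID10"))  # 7,000 + 3,000×2 = 13,000
print(backend_iops(10_000, 0.70, "RAID5"))   # 7,000 + 3,000×4 = 19,000
```

The same front-end load costs nearly 50% more back-end IOPS on RAID 5 than on RAID 10, which is why write-heavy workloads often justify mirroring.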
Different workloads have different I/O patterns:
Random I/O:
Accessing small blocks scattered across the disk (like OLTP databases).
Needs fast response and often benefits from SSDs.
Sequential I/O:
Reading/writing large continuous chunks (like backups or video streaming).
They like high throughput; HDDs might be acceptable if bandwidth is sufficient.
Your profiling should identify which workloads are random vs sequential to design the storage tiering correctly.
The working set is the amount of data that is actively used during a period.
Example: a database holds 5 TB in total, but only about 500 GB of it is read or written on a typical day.
Why this matters:
Maybe you don’t need all 5 TB on SSD.
You can put the 500 GB hot data on fast SSD tier and the rest on cheaper, slower disks.
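The tier split above can be sketched numerically. The 30% SSD headroom is an assumption for illustration:

```python
# Tier sizing from a working-set estimate (hypothetical data from the example).
total_data_gb = 5 * 1024      # 5 TB total
working_set_gb = 500          # ~500 GB touched daily

headroom = 1.3                # keep ~30% SSD headroom (assumption)
ssd_tier_gb = working_set_gb * headroom
hdd_tier_gb = total_data_gb - working_set_gb

print(ssd_tier_gb, hdd_tier_gb)   # ≈650 GB fast tier; the rest on cheaper disks
```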
Capacity planning is about answering:
“How much compute, storage, and network capacity do we need now and in the future?”
We look at compute, storage, and network separately.
CPU sizing
Count current and future total vCPUs or application needs.
Consider cores, threads, and GHz of available processors.
For virtualization:
Account for headroom so that if one host fails, the others can take over (N+1 design).
Memory sizing
Add up memory used by all workloads, plus overhead.
Consider:
Per VM memory (e.g., each VM needs 8 GB).
Per node memory capacity (e.g., 512 GB per physical host).
NUMA awareness:
Modern CPUs are split into NUMA nodes; it’s more efficient if a VM’s memory stays inside one NUMA node.
You avoid making a single VM so large that it spans multiple NUMA nodes unnecessarily.
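The CPU and memory sizing steps above, including the N+1 headroom, can be sketched as a small calculation. All VM counts, host specs, and the 4:1 vCPU overcommit ratio are hypothetical assumptions:

```python
# N+1 compute sizing sketch (hypothetical VM counts and host specs).
import math

vm_count = 200
vcpu_per_vm, ram_per_vm_gb = 4, 8
cores_per_host, ram_per_host_gb = 64, 512
vcpu_to_core_ratio = 4          # overcommit assumption for general workloads

hosts_for_cpu = math.ceil(vm_count * vcpu_per_vm / (cores_per_host * vcpu_to_core_ratio))
hosts_for_ram = math.ceil(vm_count * ram_per_vm_gb / ram_per_host_gb)

# Take the larger of the two, then add one host for failover headroom (N+1).
hosts_needed = max(hosts_for_cpu, hosts_for_ram) + 1
print(hosts_for_cpu, hosts_for_ram, hosts_needed)   # 4, 4, 5 hosts
```

Note that memory, not CPU, is often the binding constraint in virtualized clusters, which is why both are computed before taking the maximum.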
Usable vs raw capacity
Raw capacity = sum of all disk sizes.
Usable capacity = what is left after RAID/erasure coding overhead and system reserves.
Example:
10 disks × 2 TB = 20 TB raw.
With RAID and overhead, maybe only ~14 TB usable.
You must always calculate usable capacity, not just raw.
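The 10-disk example above works out as follows; the RAID 6 parity and ~10% system reserve are assumptions chosen to match the "~14 TB" figure:

```python
# Raw vs usable capacity sketch (mirrors the 10 × 2 TB example above).
disks, disk_tb = 10, 2
raw_tb = disks * disk_tb                  # 20 TB raw

# Assumptions: RAID 6 (two disks' worth of parity) plus ~10% system reserve.
parity_tb = 2 * disk_tb
usable_tb = (raw_tb - parity_tb) * 0.90

print(raw_tb, usable_tb)   # 20 TB raw → ≈14.4 TB usable
```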
Thin vs thick provisioning
Thick provisioning: the full logical capacity is allocated physically up front. Simple and predictable, but unused space is wasted.
Thin provisioning:
You allocate “logical” capacity but only consume physical space when data is written.
More flexible but requires careful monitoring to avoid running out of space.
Growth projections
Use historical growth (e.g., 20–30% per year) to estimate future capacity needs.
You might design:
Enough capacity for 3 years of growth.
Or design an easy expansion path (add disk shelves, add nodes) when needed.
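Growth projections compound year over year; a 25% rate (the midpoint of the 20–30% range above, used here as an assumption) over three years looks like this:

```python
# Compound growth projection (25%/year assumed; starting size hypothetical).
current_tb = 14.4
growth_rate = 0.25
years = 3

needed_tb = current_tb * (1 + growth_rate) ** years
print(needed_tb)   # ≈ 28.1 TB after 3 years of 25% annual growth
```

Note that compounding nearly doubles the requirement in three years; linear estimates ("25% × 3 = 75% more") undersize the target.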
You must ensure the network is not a bottleneck.
Required bandwidth for front-end access
This is the traffic between users/applications and the servers.
If hundreds/thousands of users access the system, you may need higher-speed links and possibly load balancers.
Back-end storage networks (FC/iSCSI), replication links, backup traffic
Storage access may use dedicated networks (Fibre Channel, iSCSI).
Replication between sites can generate large amounts of traffic.
Backups also generate heavy network load during backup windows.
You determine:
How many and what speed links you need (e.g., multiple 10GbE ports, 32G FC, etc.).
Whether some traffic needs separate VLANs or even separate physical networks for reliability and performance.
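Link counts follow from the peak traffic and a utilization ceiling. The peak load and the ~70% ceiling below are assumptions for illustration:

```python
# Link sizing sketch: how many 10GbE links for a given peak load?
import math

peak_gbit_s = 18            # measured or projected peak (hypothetical)
link_gbit_s = 10
utilization_ceiling = 0.70  # avoid planning links above ~70% busy (assumption)

links = math.ceil(peak_gbit_s / (link_gbit_s * utilization_ceiling))
print(links)   # 3 × 10GbE links, plus redundant paths on separate switches
```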
Even if you know the ideal design, real-world limitations will influence what you can actually do.
Constraints are things you cannot change easily.
Physical constraints
Power: the data center can only supply a limited amount of power per rack.
Cooling: dense equipment generates heat the facility must be able to remove.
Rack space: only so many rack units (U) are available for new equipment.
Floor loading: the floor can only bear a certain weight per square meter.
These factors limit how many and what type of devices you can install.
Legacy dependencies
Old servers, storage arrays, or network devices that must stay in use.
Maybe a critical application only runs on a legacy OS on old hardware.
Your new design must integrate with or migrate away from these systems carefully.
Software licensing
Hypervisor or database licenses may be based on cores or sockets.
If you add more cores, licensing costs might rise sharply.
Sometimes it’s cheaper to buy fewer, larger servers, or vice versa, depending on the licensing.
Operational maturity
Skill level of the operations team:
Are they comfortable with advanced automation?
Or do they mostly work with basic GUIs and manual processes?
Existing tools and processes:
You shouldn’t design something the team cannot realistically operate.
Risk analysis is about asking:
“What could go wrong, and how do we reduce the impact?”
Single points of failure (SPOF)
A component whose failure alone will bring down the service. Examples:
Single switch.
Single storage controller.
Single site.
Goal: design so that failure of one component does not stop the service.
Vendor lock-in risks
If the solution depends too heavily on a specific proprietary feature, it may be hard to switch vendors later.
Sometimes this is acceptable; sometimes customers want more flexibility.
Migration risk
Data migration from old systems can be:
Time-consuming
Risky (data corruption or downtime if done poorly)
You must:
Estimate how long migration takes.
Plan downtime or live migration options.
Create a rollback plan in case something goes wrong.
Security risk
Data leakage (accidental or malicious).
Ransomware (encrypting data and demanding payment).
Misconfiguration (e.g., open shares, weak passwords).
Your design should include:
Proper access controls.
Regular backups and DR plans.
Security best practices for management access and patching.
Even as a technical person, you must be able to talk about costs and value.
TCO = all costs over the life of the solution, not just purchase price.
Acquisition costs
Hardware (servers, storage, switches).
Software (OS, hypervisor, backup software, management tools).
Professional services (consulting, installation, migration).
Operating costs
Power and cooling.
Support contracts and warranties.
Staff time: how many admins are needed to run the environment.
Refresh cycles
Hardware typically has a useful life (e.g., 3–5 years).
After that, you usually replace or upgrade.
Your design should consider these cycles; for example, avoid a design that will be obsolete quickly.
Support models
24x7 support vs business hours only.
Next Business Day (NBD) or 4-hour response.
Higher support levels cost more but reduce downtime risks.
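A TCO comparison puts the CAPEX-vs-OPEX discussion above into numbers. All figures here are hypothetical, purely to show the shape of the calculation:

```python
# Simplified TCO comparison over a 5-year life (all figures hypothetical).
def tco(acquisition: float, annual_opex: float, years: int = 5) -> float:
    """Total cost of ownership = upfront CAPEX + cumulative OPEX."""
    return acquisition + annual_opex * years

capex_option = tco(acquisition=500_000, annual_opex=60_000)   # buy outright
opex_option = tco(acquisition=0, annual_opex=140_000)         # subscription

print(capex_option, opex_option)   # 800,000 vs 700,000 over 5 years
```

A real comparison would also discount future cash flows and include migration and staff costs, but even this sketch shows why the purchase price alone is misleading.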
ROI is about how much benefit the business gets from the money spent.
Performance improvements and productivity
Faster applications → employees work faster → more output.
Better user experience → more customer satisfaction and sales.
Consolidation benefits
Fewer systems to manage.
Lower power, cooling, and support costs.
Simplified management and less admin time.
Reduced outages and incidents
Downtime can be extremely expensive (lost revenue, damaged reputation).
A more reliable design can save money by preventing outages.
As a solution designer, you should be able to explain:
“Yes, this design costs more, but here’s how it saves or earns money over time.”
Even the best technical work fails if badly documented or poorly communicated.
Current state documentation
You should create a clear picture of the existing environment:
Asset inventory
List of servers, storage systems, network devices.
Model, serial number, firmware/OS version, location, role.
Network diagrams & data flow diagrams
Show how systems are connected.
Show which paths data takes (e.g., user → web server → app server → database → storage).
Application dependency mapping
Which applications depend on which databases, services, and systems.
Helps you understand impact if something fails or needs migration.
Requirements summary
Functional requirements – what the system must do (e.g., support 500 VMs, provide file shares, etc.).
Non-functional requirements – performance, availability, security, compliance, growth.
Constraints and assumptions – clearly written, so everyone knows the boundaries.
Good documentation avoids misunderstandings and helps during implementation and operations.
Stakeholder interviews
Talk to:
Business owners (care about business outcomes).
IT operations (care about manageability).
Security team (care about compliance and threats).
Networking team (care about network impact).
You must listen and translate their needs into technical design.
Validation workshops
Present your understanding of requirements and proposed high-level design.
Ask: “Is this correct? Did we miss anything important?”
Adjust the design based on feedback.
Sign-off
A formal agreement on:
Scope (what is included, what is not).
SLAs (availability, performance, support).
Success criteria (how we will know the project is successful).
Sign-off protects both you and the customer: it sets clear expectations before you start building.
In any assessment phase, one of the first tasks is to translate business expectations into measurable technical requirements. Three key metrics define how resilient and recoverable the environment must be.
RPO defines the maximum acceptable amount of data loss measured in time.
Examples:
RPO = 15 minutes
RPO = 1 hour
RPO = 0 seconds (requires synchronous replication)
Its meaning:
If a disaster occurs, how far back in time can the restored system be compared to the moment of failure?
Architectural impact:
Low RPO often requires frequent snapshots or replication.
RPO approaching zero requires synchronous replication and low-latency links.
Larger RPOs allow simpler and cheaper backup-based strategies.
RTO defines the maximum acceptable service outage duration after a failure.
Examples:
RTO = 5 minutes (requires automated failover)
RTO = 1 hour
RTO = 24 hours (backup restore acceptable)
Architectural impact:
Low RTO drives the need for HA clusters, stretched clusters, or orchestrated failover.
Higher RTO tolerates manual recovery, cold standby sites, or rebuild-and-restore methods.
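The mapping from RPO/RTO targets to protection styles can be sketched as a rule of thumb. The thresholds below are assumptions for discussion, not HPE guidance:

```python
# Illustrative rule-of-thumb mapping from RPO/RTO to a protection style.
# Thresholds are assumed values chosen for this sketch.
def protection_strategy(rpo_minutes: float, rto_minutes: float) -> str:
    if rpo_minutes == 0:
        return "synchronous replication + automated failover"
    if rpo_minutes <= 15 or rto_minutes <= 15:
        return "asynchronous replication + orchestrated failover"
    if rto_minutes <= 240:
        return "frequent snapshots + warm standby"
    return "backup/restore (cold recovery)"

print(protection_strategy(0, 5))       # synchronous replication + automated failover
print(protection_strategy(60, 1440))   # backup/restore (cold recovery)
```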
Backup Window defines the time period during which backup operations are allowed to run without impacting production workloads.
Typical considerations:
Off-peak hours are often selected to avoid performance degradation.
Short backup windows may require incremental-forever approaches or storage-integrated snapshots.
Long backup windows allow slower or capacity-oriented methods (tape, cloud uploads).
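The backup window translates directly into a required sustained throughput, as in the earlier "10 TB in 4 hours" example:

```python
# Required backup throughput to fit a window (10 TB in 4 hours, as earlier).
data_tb = 10
window_hours = 4

required_mb_s = data_tb * 1024 * 1024 / (window_hours * 3600)
print(required_mb_s)   # ≈ 728 MB/s sustained for the whole window
```

If the network, source storage, or backup target cannot sustain that rate, the window must grow or the method must change (e.g., incremental-forever).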
These metrics directly influence:
Replication style: synchronous vs asynchronous
Snapshot schedules
Storage system selection and tiering
Multi-site design choices
Required network bandwidth and latency
Licensing and data protection strategies
They act as the foundation for all subsequent design and sizing decisions.
A complete assessment requires a detailed understanding of the existing infrastructure and workloads. This step ensures the new design addresses real conditions, not assumptions.
Typical collected attributes:
vCPU count, pCPU utilization patterns
Memory allocation and consumption
Operating system build and version
Application roles and business criticality
VM density and overcommit ratios
Growth trends
Purpose:
Determine which workloads can be consolidated, scaled, or require special hardware considerations.
Include:
Current capacity usage
IOPS and throughput patterns
Latency distribution
Tier usage (SSD vs HDD)
Snapshot and replication footprint
RAID/erasure coding overhead
Hot volumes or contention areas
Purpose:
Identify bottlenecks, predict required performance, and select correct data services.
Key elements:
Physical and logical diagrams
VLAN structure and segmentation
Link speeds, utilization, and oversubscription
East-west vs north-south traffic balance
Inter-site bandwidth and latency
Firewall and routing constraints
Purpose:
Ensure the network can sustain storage traffic, vMotion/Live Migration, replication, and backup flows.
A workload rarely stands alone. Identifying dependencies is crucial.
Examples:
Web tier depends on application tier
Application tier depends on database
Database depends on storage latency
Authentication depends on AD
Reporting depends on shared services
Purpose:
Avoid isolating or moving a workload without understanding interdependencies that affect latency or availability.
During discovery, you should highlight:
Overloaded hosts or datastores
Underutilized hardware
Misconfigured networks
Orphaned VMs or unused LUNs
Legacy systems needing replacement
Licensing constraints
Purpose:
The assessment is not only about documenting the environment; it is also about identifying improvement areas before design begins.
Modern environments require a strategy for placing each workload in the most suitable location.
Typically kept local when:
There are strict data residency or sovereignty requirements
Latency-sensitive applications require direct hardware access
Workloads are highly integrated with other on-prem systems
The business prefers CAPEX models
Regulatory or industry constraints restrict external hosting
Candidates for cloud placement often include:
Bursty or elastic workloads
Test and development systems
Stateless services
Web front-ends with global user bases
Workloads with predictable OPEX models
A combination of on-prem and cloud workloads is common when:
Some systems require low latency local resources
Others benefit from cloud elasticity and scale
DR or backup can benefit from cloud storage
Applications can span environments using APIs or service layers
Workload placement must consider:
Data sovereignty and legal compliance
Latency tolerance
Inter-system dependencies
Cost model preference (CAPEX vs OPEX)
Application architecture (monolithic vs distributed)
Required network egress and bandwidth patterns
Security posture and risk profile
Purpose:
Ensure that each workload executes where it performs best while meeting compliance and cost objectives.
HPE provides tools and methodologies that support accurate planning and sizing. These are usually applied during the assessment stage to validate assumptions and build a defensible architecture.
Used to:
Predict CPU, memory, and storage needs
Validate performance requirements
Model hardware configurations
Compare different architecture scenarios
Primarily used for:
Collecting performance telemetry from HPE systems
Detecting anomalies and predicting failures
Providing predictive analytics for sizing and optimization
Offering recommendations based on global fleet intelligence
These documents provide:
Validated configurations for common workloads
Prescriptive design recommendations
Supported limits and constraints
Best practices for SAN, networking, compute, and storage
Used to:
Capture existing infrastructure configuration
Export server profiles, firmware baselines, and network settings
Validate consistency across nodes
Assist migration or redesign planning
Purpose:
Standardize the assessment process and ensure design decisions are aligned with HPE-validated practices.
How do you estimate storage capacity requirements when planning an HPE SAN solution for a virtualized environment?
Estimate capacity by analyzing current storage usage, expected growth, performance requirements, and redundancy overhead.
Capacity planning begins with collecting workload data such as used storage, growth trends, and IOPS requirements. For virtualized environments (e.g., VMware clusters), architects typically include additional overhead for snapshots, RAID protection, replication copies, and VM growth. Best practice is to maintain 20–30% free capacity to ensure performance stability and accommodate future expansion. Administrators also review workload patterns such as random vs sequential IO to choose appropriate storage tiers. Ignoring growth trends or snapshot overhead often leads to under-sized storage systems and early capacity exhaustion. Effective planning balances raw capacity, usable capacity after RAID, and operational headroom to maintain performance and scalability.
Demand Score: 66
Exam Relevance Score: 78
What factors should be evaluated during the assessment phase before deploying an HPE storage solution?
Key factors include workload characteristics, performance requirements, capacity growth, connectivity requirements, and availability needs.
During the assessment phase, architects gather information about the environment to design the correct storage architecture. This includes measuring IOPS, latency expectations, and throughput demands of workloads such as databases, virtual machines, or analytics systems. Capacity growth projections help determine scalability requirements. Connectivity choices such as Fibre Channel, iSCSI, or NVMe-based fabrics also affect design decisions. Additionally, availability requirements determine RAID levels, controller redundancy, and replication needs. Neglecting workload profiling often results in misconfigured storage systems that either over-provision expensive resources or fail to meet performance requirements.
Demand Score: 61
Exam Relevance Score: 76
Why is workload analysis important before designing an HPE storage architecture?
Workload analysis ensures the storage system is sized and configured to meet performance and capacity demands.
Different workloads generate very different IO patterns. Databases produce random reads and writes, while backup workloads are mostly sequential. Virtual environments create mixed IO patterns and unpredictable spikes. By analyzing workload characteristics such as read/write ratio, block size, and peak IOPS, architects can determine appropriate RAID levels, disk types (SSD vs hybrid), and caching strategies. Without workload analysis, systems may experience latency issues, inefficient disk utilization, or excessive controller load. Proper assessment enables the selection of the right storage platform and configuration to meet service-level requirements.
Demand Score: 59
Exam Relevance Score: 72