The Compute Layer is the “brain” of the AI system. It processes data and trains the model by performing massive calculations.
CPU (Central Processing Unit)
Purpose: General-purpose computing tasks.
Best for: Data preprocessing, small-scale inference (e.g., running a model on a personal computer).
Pros: Flexible, available in nearly all machines.
Cons: Slow for training deep learning models due to limited parallelism.
Example: A CPU might be used to clean and organize training data before it’s sent to a GPU.
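The kind of CPU-side cleanup described above can be sketched in a few lines. This is a minimal, illustrative example with hypothetical field names (`label`, `value`); real pipelines would use pandas or similar, but the idea — drop incomplete rows, normalize numeric features — is the same.

```python
# A minimal sketch of CPU-side preprocessing: clean records before they
# are handed to a GPU for training. Field names are hypothetical.

def clean_records(records):
    """Drop incomplete rows and min-max normalize the 'value' field to [0, 1]."""
    complete = [r for r in records
                if r.get("label") is not None and r.get("value") is not None]
    if not complete:
        return []
    lo = min(r["value"] for r in complete)
    hi = max(r["value"] for r in complete)
    span = (hi - lo) or 1.0  # avoid division by zero when all values match
    return [{"label": r["label"], "value": (r["value"] - lo) / span}
            for r in complete]

raw = [
    {"label": "cat", "value": 12.0},
    {"label": None,  "value": 3.0},   # incomplete row: dropped
    {"label": "dog", "value": 2.0},
]
cleaned = clean_records(raw)  # two rows, values scaled to [0, 1]
```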
GPU (Graphics Processing Unit)
Purpose: Designed for parallel processing of large data sets.
Best for: Training deep learning models, handling complex mathematical operations.
Pros: Thousands of cores; much faster than CPUs for AI training.
Cons: Expensive; requires careful memory management.
Example: Most large AI models (like image recognition or natural language processing) are trained on GPUs.
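The parallelism GPUs exploit comes from the fact that matrix multiplication decomposes into many independent dot products. As a stdlib-only stand-in for that idea, the sketch below computes each output row of a matrix product concurrently with a thread pool; a GPU does the same thing across thousands of cores instead of a handful of threads.

```python
# Matrix multiplication decomposes into independent dot products --
# the kind of work a GPU spreads across thousands of cores. Here a
# thread pool stands in for that parallelism (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def dot(row, col):
    return sum(a * b for a, b in zip(row, col))

def matmul_parallel(A, B):
    cols = list(zip(*B))  # columns of B
    with ThreadPoolExecutor() as pool:
        # Each output row is independent, so rows can be computed concurrently.
        rows = pool.map(lambda row: [dot(row, c) for c in cols], A)
    return list(rows)

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
result = matmul_parallel(A, B)  # [[19, 22], [43, 50]]
```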
TPU (Tensor Processing Unit)
Purpose: Custom-designed chip by Google for AI workloads.
Best for: Training models built with TensorFlow.
Pros: Extremely fast for matrix-heavy operations, like neural network layers.
Cons: Only available via Google Cloud.
Example: Google’s own AI services, such as Translate and Search ranking models, are trained on TPUs.
ASIC (Application-Specific Integrated Circuit)
Purpose: Hardware chips tailored for specific tasks.
Best for: Low-power, specialized AI inference at the edge (e.g., in IoT devices or wearables).
Pros: High performance, low power consumption.
Cons: Not flexible — harder to update or retrain.
Example: A smart security camera using facial recognition might use an ASIC to run the model locally without internet.
AI models require access to large amounts of data, and they need to read/write that data quickly. That’s where the Storage Layer comes in.
File Storage
What it is: Traditional way of saving files in folders/directories.
Common tool: NFS (Network File System)
Best for: Structured data like CSVs or small image sets
Pros: Easy to set up and access
Cons: Slower and harder to scale for very large datasets
Example: A research lab storing 100,000 images for training might use file storage in the early development stage.
Object Storage
What it is: Stores data as “objects” — each with its own metadata and unique ID.
Common tools: Amazon S3, NetApp ONTAP S3
Best for: Unstructured, large-scale AI data (videos, logs, sensor data)
Pros: Highly scalable and cost-efficient
Cons: Slightly higher access latency than file systems
Example: A video surveillance system stores hundreds of hours of footage for model training — object storage handles this more efficiently.
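The object model — payload plus metadata plus a unique ID — can be captured in a toy in-memory sketch. This is purely illustrative: it mirrors the shape of an S3-style put/get API while leaving out everything that makes real object storage useful (durability, replication, scale).

```python
# A toy in-memory model of object storage: each object carries a unique
# key, the payload, and arbitrary metadata. Illustrative only -- real
# systems (Amazon S3, ONTAP S3) add durability, replication, and scale.
import uuid

class ObjectStore:
    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        key = str(uuid.uuid4())          # unique object ID
        self._objects[key] = {"data": data, "metadata": metadata}
        return key

    def get(self, key: str):
        obj = self._objects[key]
        return obj["data"], obj["metadata"]

store = ObjectStore()
key = store.put(b"frame-0001", {"camera": "lobby", "fps": 30})
data, meta = store.get(key)
```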
Parallel File Systems
What it is: Distributes files across multiple servers for fast, parallel access.
Common tools: Lustre, BeeGFS
Best for: Large AI training jobs that need high data throughput
Pros: High performance, supports thousands of files accessed simultaneously
Cons: Complex to set up and manage
Example: Training a massive language model (like GPT) may require reading petabytes of data quickly — parallel file systems are essential.
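The access pattern a parallel file system enables — many clients reading disjoint stripes of one file at once — can be mimicked on a single machine. The sketch below reads byte ranges of a file concurrently; in Lustre or BeeGFS the same pattern runs against stripes spread across storage servers.

```python
# Parallel file systems stripe a file across servers so clients can read
# chunks concurrently. Single-machine sketch of the access pattern:
# several workers read disjoint byte ranges of one file at the same time.
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_chunk(path, offset, size):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)

# Write a 64-byte sample file, then read it back as four parallel 16-byte chunks.
payload = bytes(range(64))
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(payload)

chunk = 16
offsets = range(0, len(payload), chunk)
with ThreadPoolExecutor(max_workers=4) as pool:
    parts = list(pool.map(lambda off: read_chunk(path, off, chunk), offsets))
reassembled = b"".join(parts)   # order is preserved by pool.map
os.remove(path)
```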
The Network Layer is how all the hardware components — like CPUs, GPUs, and storage systems — talk to each other.
AI workloads often require massive data movement, especially during model training on multiple GPUs or nodes.
InfiniBand
Use: High-performance computing (HPC) and AI clusters
Benefits: Low latency, high bandwidth
Why it matters: Prevents bottlenecks during large-scale training
RDMA (Remote Direct Memory Access)
Use: Allows fast memory-to-memory transfers without involving the CPU
Benefits: Faster GPU-to-GPU communication, reduced system load
Why it matters: Critical for GPU clusters and model parallelism
Example: In a GPU cluster training an AI model, InfiniBand moves data between nodes with microsecond-scale latency, so gradient exchanges don’t become the bottleneck.
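Why both latency and bandwidth matter can be seen in a back-of-the-envelope model: transfer time is roughly latency plus size divided by bandwidth. The numbers below are illustrative (a simplified ~100 Gb/s InfiniBand link versus 10 Gb/s Ethernet), not measured values.

```python
# A rough model of data-movement cost: time ~= latency + size / bandwidth.
# Figures are illustrative assumptions, not benchmarks.
def transfer_time_s(size_bytes, latency_s, bandwidth_bytes_per_s):
    return latency_s + size_bytes / bandwidth_bytes_per_s

GiB = 1024 ** 3
# 1 GiB gradient exchange over ~100 Gb/s InfiniBand (~12.5 GB/s, simplified)
ib = transfer_time_s(1 * GiB, latency_s=2e-6, bandwidth_bytes_per_s=12.5e9)
# Same payload over 10 Gb/s Ethernet (~1.25 GB/s) with higher latency
eth = transfer_time_s(1 * GiB, latency_s=50e-6, bandwidth_bytes_per_s=1.25e9)
# For bulk transfers, bandwidth dominates; for small, frequent gradient
# synchronizations, the per-message latency term dominates instead.
```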
Even powerful hardware can be wasted without proper usage. These techniques ensure efficient use of compute resources:
Batching
What it is: Grouping multiple input samples together before sending them to the model
Why it helps: Makes better use of GPU memory and reduces idle time
Example: Instead of processing one image at a time, process 64 images together
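The batching step above is simple to express in code. A minimal stdlib sketch, using integers as stand-ins for decoded images and the batch size of 64 from the example:

```python
# Group samples into fixed-size batches before sending them to the model.
# Integers stand in for decoded images; batch_size=64 mirrors the example.
def batches(samples, batch_size=64):
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]

images = list(range(150))                        # 150 stand-in samples
batch_sizes = [len(b) for b in batches(images)]  # [64, 64, 22]
```

The final batch is smaller than the rest; real training loops either pad it or simply accept a short last batch, as here.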
Off-Peak Job Scheduling
What it is: Running AI training jobs during low-demand times (e.g., nights or weekends)
Why it helps: Reduces costs and avoids competing with daytime tasks
Resource Quotas
What it is: Setting boundaries on how much CPU/GPU a task can use
Why it helps: Prevents one task from hogging all resources in a shared environment
Example: In a shared GPU cluster, each user may be limited to two GPUs at a time
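The two-GPU-per-user limit can be modeled with a counting semaphore. This is a process-local sketch of the quota idea; real clusters enforce it in the scheduler (e.g., Kubernetes resource quotas), not in user code.

```python
# Enforce a per-user limit of two concurrent "GPU slots" with a counting
# semaphore -- a local sketch of the quota idea from the example above.
import threading

MAX_GPUS_PER_USER = 2
gpu_quota = threading.BoundedSemaphore(MAX_GPUS_PER_USER)
in_use = 0
peak_in_use = 0
lock = threading.Lock()

def run_job(job_id):
    global in_use, peak_in_use
    with gpu_quota:                      # blocks while 2 slots are already held
        with lock:
            in_use += 1
            peak_in_use = max(peak_in_use, in_use)
        # ... training work would happen here ...
        with lock:
            in_use -= 1

# Five jobs compete, but at most two ever hold a slot at the same time.
threads = [threading.Thread(target=run_job, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```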
Modern enterprise-grade AI infrastructure combines high-performance compute, ultra-fast storage, and low-latency networking into reference architectures. Two commonly cited examples in the NS0-901 context are:
NetApp ONTAP AI
Components:
NVIDIA DGX A100 servers (GPU-accelerated training nodes)
NetApp AFF storage arrays (All Flash FAS, for high-speed I/O)
InfiniBand network fabric (for low-latency, high-throughput interconnect)
Use Case: Large-scale deep learning training in enterprise data centers.
Benefits:
Unified AI training fabric
Streamlined data access and replication
Scalable, modular architecture with end-to-end integration
FlexPod AI
Components:
Cisco UCS Servers with NVIDIA GPUs
NetApp AFF or hybrid storage
NVIDIA GPU Operator for resource scheduling
Optional Kubernetes for container orchestration
Use Case: Enterprise AI/ML workloads on validated converged infrastructure.
Benefits:
Validated architecture with simplified deployment
Predictable performance and SLAs
Integration with MLOps pipelines (e.g., MLflow, Airflow)
These integrated architectures illustrate how GPU compute, NVMe-based flash storage, and low-latency networking (InfiniBand or 100G Ethernet) are brought together to form production-grade AI clusters.
AI systems require massive amounts of diverse data, and the way this data is stored, queried, and managed plays a central role in performance and scalability.
Data Warehouse
Purpose: Centralized storage of structured data for analytics and reporting.
Strengths: Schema-enforced, optimized for SQL queries.
Weaknesses: Not ideal for unstructured data or AI workloads.
Data Lake
Purpose: Stores raw, unstructured, and semi-structured data at scale.
Strengths:
Stores everything (logs, images, documents)
Flexible schema-on-read
Weaknesses:
Slower query performance
Harder data governance
Data Lakehouse
Purpose: Hybrid model combining the flexibility of lakes with the performance of warehouses.
Platforms: Delta Lake, Apache Iceberg, Databricks Lakehouse.
Use in AI:
One-stop location for training, feature engineering, and serving AI models.
Enables streaming + batch + ML access from the same system.
These structures form the data substrate layer that AI pipelines interact with—particularly in data preparation and online feature lookup.
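The warehouse/lake distinction above hinges on schema-on-write versus schema-on-read: a warehouse validates rows at load time, while a lake lands raw records and applies a schema only when a consumer reads them. A minimal stdlib sketch of the schema-on-read side, using hypothetical fields (`user`, `clicks`):

```python
# Schema-on-read: raw, heterogeneous records land as-is; a schema is
# projected onto them only at read time. Field names are hypothetical.
import json

raw_lake = [                             # raw JSON lines, as landed
    '{"user": "a", "clicks": 3, "device": "ios"}',
    '{"user": "b", "clicks": "7"}',      # clicks arrived as a string
    '{"user": "c"}',                     # clicks missing entirely
]

def read_with_schema(lines):
    """Project each raw record onto (user: str, clicks: int) at read time."""
    for line in lines:
        rec = json.loads(line)
        yield {"user": str(rec["user"]), "clicks": int(rec.get("clicks", 0))}

rows = list(read_with_schema(raw_lake))
```

A schema-on-write system would have rejected or coerced the second and third records at load time instead; the lake defers that cost (and risk) to each reader.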
Understanding core performance metrics is essential for evaluating AI hardware systems, especially storage subsystems.
IOPS (Input/Output Operations per Second)
Definition: Number of read/write operations a storage system can perform per second.
Relevance: High IOPS matters most for workloads with many small, random reads, such as sampling millions of small files during training.
Throughput
Definition: Total volume of data transferred per second, typically measured in MBps or GBps.
Relevance: Sustained throughput keeps GPUs fed when streaming large files such as video or model checkpoints.
Latency
Definition: Delay between a data request and its delivery (typically measured in milliseconds or microseconds).
Relevance: Low latency is critical for real-time inference and metadata-heavy access patterns.
Example Use Case:
A training pipeline retrieving 4K images from object storage might prioritize throughput, whereas a microservice performing image classification would prioritize latency.
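The three metrics are also arithmetically linked: throughput is roughly IOPS times I/O size. The sketch below makes that relationship concrete with illustrative (not benchmarked) numbers.

```python
# Throughput ~= IOPS * I/O size. Two illustrative workloads show why a
# system can have high IOPS yet modest throughput, or the reverse.
def throughput_MBps(iops, io_size_bytes):
    return iops * io_size_bytes / 1e6

# Many small 4 KiB random reads: high IOPS, modest throughput.
small_io = throughput_MBps(iops=200_000, io_size_bytes=4096)      # 819.2 MB/s
# Fewer large 1 MiB sequential reads: low IOPS, high throughput.
large_io = throughput_MBps(iops=2_000, io_size_bytes=1_048_576)   # ~2097 MB/s
```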
In modern AI systems, resource efficiency depends heavily on intelligent scheduling mechanisms, especially for GPUs and high-throughput storage.
Kubernetes: Orchestrates containerized AI workloads.
NVIDIA GPU Operator:
Automates driver installation, GPU discovery, and monitoring.
Exposes GPU as a schedulable resource to Kubernetes.
Ensures GPU resource isolation across training jobs.
Node affinity rules: Ensure GPU-bound tasks are placed on GPU-equipped nodes.
Resource quotas: Control how much GPU/CPU/memory each pod or user consumes.
Priority classes: Schedule high-priority training over background tasks.
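The priority-class idea can be reduced to a heap-based queue that always dispatches the highest-priority pending job first. This is a toy sketch of the scheduling concept, not Kubernetes itself (note that Kubernetes PriorityClass treats *higher* values as higher priority, while this sketch, following `heapq` convention, dispatches lower numbers first; job names are hypothetical).

```python
# Priority scheduling in miniature: a heap-backed queue where lower
# numbers are dispatched first. A tie-breaking counter preserves
# submission order among equal priorities.
import heapq
import itertools

class PriorityQueue:
    def __init__(self):
        self._heap = []
        self._order = itertools.count()

    def submit(self, priority, job):
        heapq.heappush(self._heap, (priority, next(self._order), job))

    def next_job(self):
        return heapq.heappop(self._heap)[2]

q = PriorityQueue()
q.submit(10, "nightly-batch-report")     # background task
q.submit(1,  "bert-training")            # high-priority training job
q.submit(5,  "data-prep")
dispatch_order = [q.next_job() for _ in range(3)]
# High-priority training is dispatched ahead of background work.
```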
A team runs a BERT model training job in a Kubernetes cluster. The GPU Operator ensures:
The job is routed to a DGX server with available A100 GPUs.
GPU metrics (temperature, memory) are monitored.
The training container mounts high-throughput NetApp storage via Trident.
This orchestration ensures hardware utilization is optimized while maintaining job reproducibility and fairness.
Why are GPUs commonly used for AI model training instead of CPUs?
GPUs are preferred for AI training because they can process thousands of parallel mathematical operations simultaneously, which significantly accelerates neural network computations.
Deep learning models rely heavily on matrix multiplications and vector calculations. CPUs are designed for sequential processing and typically have fewer cores optimized for general-purpose tasks. GPUs contain thousands of smaller cores that allow large numbers of operations to be executed concurrently. This architecture dramatically improves performance when training large neural networks that require billions of calculations. In AI infrastructure environments, GPU clusters enable faster training cycles and reduce the time required to iterate on model development.
Demand Score: 74
Exam Relevance Score: 86
What role does high-performance storage play in AI training environments?
High-performance storage ensures that large datasets can be delivered to compute resources quickly enough to keep GPUs fully utilized during training.
AI training requires continuous streaming of large volumes of data to compute nodes. If storage throughput or latency is insufficient, GPUs may remain idle while waiting for data. High-performance storage systems provide fast read/write speeds and scalable capacity to support data-intensive workloads. They often include parallel file systems or distributed storage architectures that allow multiple compute nodes to access training datasets simultaneously. In enterprise AI environments, storage performance directly affects model training efficiency and overall system utilization.
Demand Score: 72
Exam Relevance Score: 88
What is the purpose of high-speed networking in AI training clusters?
High-speed networking enables rapid data exchange between compute nodes and storage systems, allowing distributed AI training workloads to scale efficiently.
Large AI models are often trained across multiple GPUs or servers using distributed training frameworks. During training, nodes must exchange gradients, model parameters, and training data continuously. If network bandwidth is limited or latency is high, communication overhead can slow down the entire training process. High-speed networking technologies help ensure that distributed training remains efficient by enabling rapid synchronization between compute resources and data storage systems.
Demand Score: 70
Exam Relevance Score: 84
Why is scalability an important characteristic of AI hardware architecture?
Scalability allows AI infrastructure to expand compute, storage, and networking resources as model sizes and dataset volumes increase.
AI workloads often grow rapidly as organizations adopt larger models and collect more data. A scalable architecture allows additional GPUs, storage nodes, and networking capacity to be added without redesigning the entire infrastructure. This flexibility ensures that organizations can handle evolving workloads while maintaining system performance. Scalable systems also allow distributed training and parallel processing, which are essential for modern deep learning models that require massive computational resources.
Demand Score: 69
Exam Relevance Score: 80