Metrics are numeric measurements that represent the performance, health, or behavior of a system or application over time.
Metrics help answer questions like:
How busy is the CPU right now?
How much memory is being used?
How many errors occurred in the last 5 minutes?
Metrics are simple but extremely powerful. They allow us to monitor the state of systems and detect problems early.
Each metric typically includes four important components:
Name:
The identifier of what is being measured.
Example: cpu.utilization
The name should be clear and descriptive.
Value:
The actual numeric measurement at a given time.
Example: CPU utilization could be 45.7 percent.
Timestamp:
The exact moment when the measurement was recorded.
Timestamps allow metrics to be plotted over time.
Dimensions (also called Labels):
Extra key-value pairs that provide more context about the metric.
Example:
host: server1
region: us-west-2
Dimensions allow the same metric to be grouped and filtered based on different attributes.
Without dimensions, all CPU metrics would be mixed together. With dimensions, you can see CPU utilization separately for each server, each region, or each application.
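The four components above can be sketched as a plain record. The field names and values here are illustrative, not any vendor's wire format:

```python
import time

# A single metric datapoint: name, value, timestamp, and dimensions.
# Field names and values are illustrative, not a specific wire format.
datapoint = {
    "name": "cpu.utilization",
    "value": 45.7,                # percent
    "timestamp": time.time(),     # seconds since the Unix epoch
    "dimensions": {
        "host": "server1",
        "region": "us-west-2",
    },
}

print(datapoint["name"], datapoint["dimensions"]["host"])
```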
Different types of metrics are designed for different measurement needs. Here are the most important types:
A Gauge is a metric that can go up or down.
It represents a value at a specific point in time.
Examples:
CPU usage percentage
Temperature reading
Memory usage
Gauges are good for things that fluctuate frequently.
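A minimal gauge can be sketched as a class that simply stores the latest sample. This is an illustration, not any particular metrics library's API:

```python
class Gauge:
    """A value that can move up or down; reads report the latest sample."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

cpu = Gauge()
cpu.set(45.7)
cpu.set(12.0)     # gauges can decrease
print(cpu.value)  # 12.0
```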
A Counter is a metric that only increases.
It represents something that keeps adding up over time.
Examples:
Number of HTTP requests received
Number of errors encountered
Counters reset only when the system restarts; otherwise they keep growing.
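A counter differs from a gauge only in that it refuses to decrease. A minimal sketch, again not any particular library's API:

```python
class Counter:
    """A monotonically increasing count; resets only on process restart."""

    def __init__(self):
        self.value = 0

    def inc(self, amount=1):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

requests = Counter()
for _ in range(3):
    requests.inc()        # one increment per HTTP request received
print(requests.value)     # 3
```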
A Histogram measures the distribution of values.
It places measurements into buckets based on ranges.
Example: request response times grouped into buckets such as 0-100 ms, 100-300 ms, and over 300 ms.
Histograms are useful for understanding not just average values, but the spread of the data.
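Bucketed counting can be sketched in a few lines; the bucket boundaries and latency samples below are illustrative:

```python
# A minimal histogram sketch: count observations into fixed latency buckets.
# Boundaries (in ms) are illustrative.
bounds = [100, 300]              # buckets: <=100, <=300, and an overflow bucket
counts = [0] * (len(bounds) + 1)

def observe(value_ms):
    for i, bound in enumerate(bounds):
        if value_ms <= bound:
            counts[i] += 1
            return
    counts[-1] += 1              # value exceeded every boundary

for latency in [42, 250, 90, 800, 120]:
    observe(latency)

print(counts)  # [2, 2, 1]
```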
A Summary is similar to a histogram, but it focuses on percentiles.
Examples:
95th percentile response time
99th percentile latency
Summaries help answer questions like:
"What response time are 95 percent of users experiencing?"
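One simple way to compute a percentile is the nearest-rank method, sketched below on made-up latency samples. Real summary metrics typically use streaming approximations rather than sorting all samples:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least
    p percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

latencies_ms = [120, 85, 95, 110, 500, 90, 105, 100, 115, 130]
print(percentile(latencies_ms, 95))  # 500
print(percentile(latencies_ms, 50))  # 105
```

The single 500 ms outlier dominates the 95th percentile while barely moving the median, which is why percentile-based summaries surface tail latency that averages hide.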
A time series is a sequence of metric measurements taken over time, for a specific set of dimensions.
Important points:
The combination of a metric name and its dimensions defines a unique time series.
If any dimension changes, it creates a new time series.
Example:
Metric Name: cpu.utilization
Dimensions:
host: server1
region: us-west-2
This is one time series.
If you change the host dimension to server2, it becomes a different time series.
In real systems, there can be millions of time series being collected and stored.
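The identity rule above can be sketched as a function that builds a series key from the metric name plus its sorted dimension pairs:

```python
# Time series identity sketch: metric name plus sorted dimension key-value
# pairs. Changing any dimension value yields a different identity.
def series_key(name, dimensions):
    return (name, tuple(sorted(dimensions.items())))

a = series_key("cpu.utilization", {"host": "server1", "region": "us-west-2"})
b = series_key("cpu.utilization", {"host": "server2", "region": "us-west-2"})

print(a == b)  # False: a different host is a different time series
```

Sorting the dimension pairs means the key is stable no matter what order the dimensions arrive in.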
Dimensions are extremely important because they allow for fine-grained analysis.
With dimensions, you can:
Filter metrics to view only those from a specific server, region, or environment.
Aggregate metrics to see the average CPU usage across all servers in a data center.
Build customized dashboards showing specific groups of systems.
Good use of dimensions leads to:
Easier troubleshooting
More powerful visualizations
Better scalability of monitoring systems
Poorly managed dimensions can make analysis confusing and messy.
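Filtering and aggregation by dimension can be sketched over a handful of made-up datapoints:

```python
# Illustrative datapoints; hosts, regions, and values are made up.
points = [
    {"name": "cpu.utilization", "value": 40.0,
     "dims": {"host": "server1", "region": "us-west-2"}},
    {"name": "cpu.utilization", "value": 60.0,
     "dims": {"host": "server2", "region": "us-west-2"}},
    {"name": "cpu.utilization", "value": 20.0,
     "dims": {"host": "server3", "region": "eu-west-1"}},
]

# Filter: keep only datapoints from one region.
us_west = [p for p in points if p["dims"]["region"] == "us-west-2"]

# Aggregate: average CPU across the filtered hosts.
avg = sum(p["value"] for p in us_west) / len(us_west)
print(avg)  # 50.0
```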
Cardinality means the number of unique time series you are managing.
High cardinality occurs when you have:
Many different metric names
Many different combinations of dimensions
For example, if you have:
1000 servers
10 services per server
5 dimensions per metric
You could quickly end up with millions of unique time series.
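A back-of-envelope version of that multiplication, with illustrative per-dimension value counts (the specific factors here are assumptions, not figures from a real deployment):

```python
# Every distinct combination of dimension values creates its own time series,
# so the counts multiply. All numbers below are illustrative.
servers = 1000
services_per_server = 10
statuses = 5    # e.g. a status dimension with 5 possible values
metrics = 20    # metric names emitted by each service

series = servers * services_per_server * statuses * metrics
print(series)   # 1000000
```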
Problems caused by high cardinality:
Increased storage costs
Slower query performance
Difficulty managing and visualizing metrics
Best practice:
Only add dimensions that are truly necessary.
Avoid using high-cardinality fields (like user IDs or session IDs) as dimensions unless absolutely needed.
Careful design of metrics and dimensions keeps your system efficient and manageable.
Metrics go through a full lifecycle, from their creation to their final use. The stages are:
Applications, servers, or infrastructure generate (emit) metrics.
Example: A web server records how many requests it handles per second.
The OpenTelemetry Collector gathers these emitted metrics.
It uses configured receivers to pull metrics from systems.
Metrics are transmitted from the Collector to a backend system like Splunk Observability Cloud.
This is typically done over secure network connections.
The backend stores metrics in a time series database.
Users can query the database to retrieve specific metrics over time.
Metrics are visualized through:
Charts
Dashboards
Navigators
Alerting is set up using detectors to notify teams when metrics cross important thresholds.
Each stage is critical for ensuring metrics are useful, accurate, and actionable.
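The lifecycle can be sketched end to end as a toy pipeline. The in-memory store, the detector logic, and the threshold are all simplified illustrations, not how any real backend is implemented:

```python
# Toy metric lifecycle: emit -> collect -> store -> query/alert.
store = {}  # series key -> list of (timestamp, value)

def collect(point):
    """Collection + storage: index the datapoint by its series identity."""
    key = (point["name"], tuple(sorted(point["dims"].items())))
    store.setdefault(key, []).append((point["ts"], point["value"]))

def detect(name, dims, threshold):
    """Alerting: fire when the latest stored value crosses a threshold."""
    key = (name, tuple(sorted(dims.items())))
    latest_ts, latest_value = store[key][-1]
    return latest_value > threshold

# Emission: the application records two measurements over time.
collect({"name": "cpu.utilization", "dims": {"host": "server1"},
         "ts": 1, "value": 40.0})
collect({"name": "cpu.utilization", "dims": {"host": "server1"},
         "ts": 2, "value": 95.0})

print(detect("cpu.utilization", {"host": "server1"}, threshold=90))  # True
```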
You have now learned:
What metrics are and why they matter.
The structure of a metric (name, value, timestamp, dimensions).
The four main types of metrics: gauge, counter, histogram, summary.
The concept of time series and how changing dimensions creates new series.
Why dimensions are powerful tools for filtering and aggregation.
What metric cardinality is and why managing it is important.
The complete lifecycle of a metric from creation to visualization.
In Splunk Observability Cloud, dimensions are critical attributes attached to metrics that provide additional context for data analysis.
They allow users to group, filter, and aggregate metrics based on specific characteristics, such as host name, region, service name, or environment type.
An important point to remember:
Dimensions are also referred to as "tags" within the Splunk Observability Cloud interface.
This synonym is frequently used in different parts of the system, such as:
Dashboard filter configurations
SignalFlow programs
Detector conditions
You may encounter questions phrased like:
"Which tag should you filter by to view only production systems?"
Even though the term "tag" is used, the question is actually referring to dimension-based filtering.
Therefore, always remember that "tags" and "dimensions" are interchangeable terms in the context of Splunk Observability.
Cardinality refers to the number of unique time series produced by combining metric names with dimension values.
High cardinality can severely impact system performance by increasing:
Storage costs
Query complexity
Dashboard and alert evaluation times
Common examples of high-cardinality dimensions include:
User ID
Each unique user generates a different value, potentially leading to millions of distinct time series.
Session ID
A new session ID is generated every time a user opens a new session, making it extremely dynamic and numerous.
Transaction ID
Each business transaction (such as an e-commerce order or banking transfer) typically has a unique ID, dramatically increasing the number of unique dimension combinations.
Dynamic Container IDs (Kubernetes Pods and Containers)
In dynamic environments like Kubernetes, each pod or container often has a unique identifier that changes frequently with scaling and redeployment, leading to an explosion in cardinality.
You may encounter questions such as:
"Which dimension is most likely to create high cardinality problems in a metrics system?"
The correct answer would likely be something like:
"Session ID" or "Transaction ID".
In contrast, dimensions like region, environment, or availability_zone usually have low cardinality because their possible values are relatively fixed and limited.
| Topic | Key Points |
|---|---|
| Dimensions and Tags | In Splunk, dimensions and tags are the same concept and used interchangeably. |
| High Cardinality Sources | Watch out for dynamic or user-specific dimensions like User ID, Session ID, Transaction ID, and dynamic container IDs. |
What defines a unique Metric Time Series (MTS) in Splunk Observability Cloud?
A unique MTS is defined by a metric name combined with a unique set of dimension key-value pairs.
Each datapoint includes a metric name, timestamp, value, and dimensions. When the same metric name is paired with a unique combination of dimensions such as host, region, or container ID, it forms a distinct Metric Time Series. If any dimension value differs, a new MTS is created. This structure allows observability platforms to track metrics across multiple infrastructure entities while maintaining separation between sources. Understanding MTS identity is important for query results, aggregation behavior, and analytics operations in charts and detectors.
What components make up a datapoint in the Splunk Infrastructure Monitoring data model?
A datapoint consists of a metric name, timestamp, value, and associated dimensions.
The metric name identifies the measurement being collected, such as CPU utilization. The timestamp records when the measurement occurred. The value represents the numeric observation at that moment. Dimensions provide contextual metadata, such as host, service, or container identifiers. These dimensions allow observability systems to distinguish between different sources of the same metric. Together, these components enable accurate aggregation, filtering, and analysis of infrastructure telemetry data.
How does data resolution affect metrics analysis in observability platforms?
Data resolution determines the granularity of datapoints stored and displayed, which influences chart accuracy and aggregation behavior.
Higher resolution means datapoints are collected more frequently, providing detailed insight into system behavior. Lower resolution aggregates datapoints into larger time intervals, which can smooth spikes and reduce storage requirements. Observability platforms often roll up older high-resolution data into lower-resolution summaries over time. When viewing charts, the resolution affects how analytic functions operate and how accurately short-lived events appear. Understanding resolution helps prevent misinterpretation of metrics trends.
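Rolling high-resolution datapoints up into coarser averages can be sketched as follows; the 10-second interval and the sample data are illustrative:

```python
# Rollup sketch: aggregate fine-grained datapoints into fixed-interval
# averages, trading resolution for storage. Data and interval are made up.
def rollup(points, interval):
    """points: list of (timestamp, value); returns {bucket_start: mean}."""
    buckets = {}
    for ts, value in points:
        buckets.setdefault(ts - ts % interval, []).append(value)
    return {start: sum(vals) / len(vals)
            for start, vals in sorted(buckets.items())}

raw = [(0, 10.0), (1, 20.0), (9, 30.0), (10, 40.0), (12, 60.0)]
print(rollup(raw, interval=10))  # {0: 20.0, 10: 50.0}
```

Note how the brief spike to 60.0 at t=12 is smoothed into the 10-second average, which is exactly the short-lived-event effect described above.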