OpenTelemetry (often abbreviated as OTel) is a free, open-source, and vendor-neutral standard.
Its main purpose is to collect telemetry data, which includes:
- Metrics: Numeric measurements like CPU usage or memory consumption.
- Logs: Text-based records of system events, such as errors or status updates.
- Traces: Records showing the path and performance of a request through various parts of a distributed system.
OpenTelemetry acts as a common language.
No matter which technology stack or system you are using, OpenTelemetry can collect monitoring data in a unified format, making it much easier to observe and understand system behavior.
Splunk Observability Cloud needs a reliable and standardized method to collect data about systems, applications, and infrastructure.
Instead of building different data collectors for every type of system, Splunk uses OpenTelemetry because:
- OpenTelemetry can collect data from many different sources.
- It follows modern, efficient standards.
- It is widely supported by the cloud and monitoring communities.
Using OpenTelemetry, Splunk Observability Cloud can easily and accurately ingest, analyze, and visualize telemetry data.
Without OpenTelemetry, integrating diverse systems with Splunk would be much slower, harder, and less efficient.
When using OpenTelemetry, especially for metrics collection, there are several important parts you need to understand:
The OpenTelemetry Collector is a service or program that you install on machines such as servers, cloud virtual machines, or containers.
Its job is to:
- Collect telemetry data.
- Optionally modify or process that data.
- Send the processed data to one or more backends like Splunk Observability Cloud.
You can think of the OpenTelemetry Collector like a mail center.
It collects "letters" (metrics, logs, traces), sorts them, and delivers them to their final destination.
The Collector is extremely flexible and can be deployed in different ways, depending on the needs of your system. For example:
- As an agent running directly on a host.
- As a sidecar inside a Kubernetes pod.
- As a standalone gateway server.
Receivers are the parts of the Collector that pull data in from sources.
They are configured to collect data from specific places such as:
- Host system metrics (CPU, memory, disk).
- Kubernetes cluster data.
- Custom application metrics exposed via APIs.
Examples of Receivers include:
- hostmetrics receiver: Collects system-level metrics like CPU usage or memory utilization.
- kubeletstats receiver: Collects metrics from Kubernetes nodes.
Receivers are like input ports that specify where to find the data.
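To make the second example concrete, here is a hedged sketch of a kubeletstats receiver declaration. The option names (auth_type, endpoint) and the K8S_NODE_NAME environment variable are assumptions to verify against your Collector version; a full hostmetrics example appears later in this section.

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount               # authenticate to the kubelet with the pod's service account
    endpoint: "${env:K8S_NODE_NAME}:10250"  # assumes K8S_NODE_NAME is injected into the pod spec
```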
After telemetry data is received, Processors can optionally modify, enrich, or optimize the data before it is sent out.
Examples of what Processors can do:
- Batching: Group many small telemetry data points into larger packets, making network transmission more efficient.
- Resource Detection: Add important labels or metadata (such as "region: us-east-1" or "environment: production") to every metric.
- Filtering: Remove unnecessary data to reduce noise and save resources.
Processors help make telemetry data more useful, organized, and efficient to handle.
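As an illustrative sketch only, the three processor types described above could be declared like this. The resourcedetection and filter processors ship with the contrib distribution; the exact option names, and the metric chosen for filtering, are assumptions to check against your Collector version.

```yaml
processors:
  batch:                       # group data points into larger batches before export
  resourcedetection:
    detectors: [env, system]   # add host and environment metadata automatically
  filter/drop_noise:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - system.disk.operations   # hypothetical metric to drop
```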
Exporters are the parts of the Collector that send the processed data to a destination.
Destinations could be:
- Splunk Observability Cloud
- Another backend like Prometheus, Jaeger, or a custom system
In our case, the most important exporter is the Splunk HEC exporter (splunk_hec), which sends data to Splunk over HTTP and authenticates with a secure token.
Exporters are like output ports.
They determine where the telemetry data ultimately goes after being collected and processed.
The typical data flow inside a Collector is:
Receivers → Processors → Exporters
This order ensures that data is properly handled at each stage.
Let’s break the collection process down into clear, step-by-step stages.
Before any data can be collected, you must deploy the OpenTelemetry Collector.
There are several deployment methods depending on the environment you are monitoring:
- On Virtual Machines (VMs):
  - Install the Collector as a service directly on the operating system.
  - Example: install on a Linux or Windows server.
- In Containers:
  - Deploy the Collector inside Docker containers.
  - Suitable for environments using containerized applications.
- In Kubernetes Clusters:
  - Deploy the Collector as a DaemonSet so that every Kubernetes node runs a Collector instance.
  - This method helps collect system metrics and Kubernetes-specific data automatically.
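For the Kubernetes case, a minimal DaemonSet manifest might look like the sketch below. This is illustrative only: the image tag, namespace, and ConfigMap name (otel-collector-config) are placeholders, and in practice the Splunk Distribution discussed later in this section ships its own installers.

```yaml
# Minimal, illustrative DaemonSet sketch; image, namespace, and ConfigMap name are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a specific version in production
          args: ["--config=/etc/otel/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config   # placeholder ConfigMap holding the YAML configuration
```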
Once installed and properly configured, the Collector starts gathering system metrics automatically.
Important: Always install the Collector close to where the data is generated to ensure efficient collection and reduce latency.
Installation alone is not enough.
You must configure the Collector so it knows:
- What data to collect
- How to process it
- Where to send it
Configuration is usually done using a YAML file.
YAML is a simple, structured text format used to define settings.
In the YAML file, you define three main sections:
The receivers section tells the Collector which sources to listen to.
Example configuration for collecting system metrics:
```yaml
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
```
Explanation:
- The hostmetrics receiver is used.
- Metrics are collected every 60 seconds.
- Specific metrics like CPU, memory, disk, and filesystem are scraped.
The exporters section specifies where the collected and processed data will be sent.
Example configuration for sending metrics to Splunk:
```yaml
exporters:
  splunk_hec:
    token: "<SPLUNK_TOKEN>"
    endpoint: "<SPLUNK_HEC_ENDPOINT>"
```
Explanation:
- splunk_hec is the exporter used to send data to Splunk.
- A token is needed for authentication (more on this later).
- The endpoint is the URL where Splunk expects to receive the data.
The service section connects the receivers, processors (if any), and exporters together into a pipeline.
Example pipeline configuration:
```yaml
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [splunk_hec]
```
Explanation:
- The pipeline is named metrics.
- It receives data from the hostmetrics receiver.
- It exports data using the splunk_hec exporter.
In simple terms:
Metrics flow from receivers → through any processors → to exporters → finally to Splunk.
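Putting the pieces together, a complete (if minimal) configuration file follows the same flow. This is a sketch assembled from the snippets above; the batch processor is optional and is explained later in this section, and the token and endpoint placeholders must be replaced with real values.

```yaml
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      memory:

processors:
  batch:            # optional; groups data points before export (covered later)

exporters:
  splunk_hec:
    token: "<SPLUNK_TOKEN>"
    endpoint: "<SPLUNK_HEC_ENDPOINT>"

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [splunk_hec]
```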
When sending data to Splunk, you must prove your identity so Splunk knows the data is trusted.
Splunk Observability uses a HEC Token (HTTP Event Collector Token) for authentication.
The token is a special long string that acts like a password.
It must be securely stored and properly configured in the Collector’s YAML file.
Without the correct token, Splunk will reject the data for security reasons.
You can usually generate or find your HEC token in the Splunk Observability Cloud user interface under Organization Settings.
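One common way to keep the token out of the YAML file itself is to reference an environment variable. The variable names below are hypothetical; the ${env:...} substitution syntax is supported by recent Collector versions, but verify it against the version you run.

```yaml
exporters:
  splunk_hec:
    token: "${env:SPLUNK_HEC_TOKEN}"        # hypothetical variable set on the host or in the pod spec
    endpoint: "${env:SPLUNK_HEC_ENDPOINT}"  # hypothetical variable holding the ingest URL
```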
Sometimes, default system metrics like CPU or memory are not enough.
You might want to collect custom metrics directly from your applications, such as:
- Number of active users
- Payment success rates
- Custom business KPIs
To do this:
- Instrument your application code using OpenTelemetry SDKs. Supported languages include Java, Python, Go, JavaScript, and others.
- Your application code will push metrics to a local OpenTelemetry Collector, which will then send them to Splunk.
For example, in Python, you could write code like this:
```python
from opentelemetry import metrics

# Assumes a MeterProvider with an OTLP exporter has already been configured
# (via the OpenTelemetry SDK), so that recorded values actually reach the Collector.
meter = metrics.get_meter_provider().get_meter("example")
counter = meter.create_counter("payment_successes")
counter.add(1, {"region": "us-west"})
```
This records a metric named payment_successes, which the SDK then exports to your Collector.
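For the Collector to accept metrics pushed by an instrumented application, it needs a receiver listening for them. Assuming the SDK exports over OTLP (the default protocol for OpenTelemetry SDKs), a sketch of the receiving side looks like this; remember to add otlp to the metrics pipeline as well.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # default OTLP gRPC port
      http:
        endpoint: 0.0.0.0:4318   # default OTLP HTTP port
```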
Before metrics are sent out, it is often useful to enrich them or route them based on certain rules.
- Enrichment: Adding additional labels (dimensions) to metrics automatically, for example adding environment: production to every metric.
- Routing: Sending different metrics to different places.
This can be configured using Processors inside the Collector’s YAML file.
Example enrichment:
```yaml
processors:
  resource:
    attributes:
      - key: environment
        value: production
        action: insert
```
With this configuration, every metric will have an environment=production label automatically added.
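Routing can be approximated without any special processor by defining multiple named pipelines, each with its own exporter instance. The sketch below sends host metrics to one destination and application metrics to another; the environment variable names are hypothetical, and dedicated routing components also exist if finer-grained rules are needed.

```yaml
exporters:
  splunk_hec/prod:
    token: "${env:PROD_HEC_TOKEN}"          # hypothetical variables
    endpoint: "${env:PROD_HEC_ENDPOINT}"
  splunk_hec/staging:
    token: "${env:STAGING_HEC_TOKEN}"
    endpoint: "${env:STAGING_HEC_ENDPOINT}"

service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors: [resource, batch]
      exporters: [splunk_hec/prod]
    metrics/apps:
      receivers: [otlp]
      processors: [batch]
      exporters: [splunk_hec/staging]
```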
When using OpenTelemetry Collectors in a real-world environment, it is critical to follow best practices to ensure your monitoring system remains:
- Stable
- Scalable
- Secure
- Efficient
Let’s carefully go through each best practice one by one.
Always deploy the OpenTelemetry Collector as close to the source of data as possible.
- If you are collecting server metrics, install the Collector directly on the server.
- If you are collecting Kubernetes metrics, deploy the Collector inside the Kubernetes cluster as a DaemonSet.
Benefits of deploying close to the source:
- Lower network latency
- Reduced risk of data loss
- Better performance
- Easier capture of node-specific system events
When Collectors are deployed remotely or too far away, metrics can arrive late, incomplete, or even be lost during transmission.
When sending large volumes of metrics over the network, two techniques keep transmission efficient:
- Batching combines many small metric events into fewer, larger network transmissions.
- Compression reduces the size of the transmitted data.
Benefits:
- Saves network bandwidth
- Reduces load on the Splunk backend
- Improves overall ingestion speed
In OpenTelemetry configuration, you can enable a batch processor to automatically batch metrics before exporting.
Example:
```yaml
processors:
  batch:
    send_batch_size: 1024
    timeout: 5s
```
This means:
- Send metrics in batches of 1024.
- If the batch size is not reached within 5 seconds, send whatever has been collected so far.
Batching is almost always recommended unless dealing with extremely low traffic environments.
Instead of creating a single, massive configuration file, it is better to:
- Split configuration files into smaller pieces based on function.
- Use modular templates for different environments such as production, staging, and development.
For example:
- One YAML file for receivers
- One YAML file for processors
- One YAML file for exporters
- A main YAML file that includes all the others
This modularity brings:
- Easier management
- Faster troubleshooting
- Safer deployments (less risk of errors)
Also, use environment-specific variables for tokens, endpoints, and settings, so you do not accidentally send staging data into a production environment or vice versa.
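As an example of environment-specific values, the enrichment processor shown earlier can read its label from an environment variable instead of hard-coding it. DEPLOY_ENV is a hypothetical variable name.

```yaml
processors:
  resource:
    attributes:
      - key: environment
        value: "${env:DEPLOY_ENV}"   # hypothetical variable, e.g. "production" or "staging"
        action: upsert               # insert the label, or overwrite it if already present
```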
The Collectors themselves are critical pieces of infrastructure.
You must treat them as production-grade services.
Best practices include:
- Use health check endpoints: the Collector's health_check extension exposes a health check URL (by default on port 13133); a configuration sketch appears below.
- Monitor CPU, memory, and network usage of the Collectors.
- Set up detectors in Splunk Observability Cloud to alert if a Collector goes down or becomes unhealthy.
- Use logs and metrics from the Collector itself to detect configuration errors, performance bottlenecks, or resource exhaustion.
A Collector that silently fails can cause major monitoring blind spots.
Active monitoring of your Collectors is essential for system reliability.
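A sketch of how the health check and the Collector's own telemetry are typically enabled is shown below. The health_check extension and the service.telemetry block are standard Collector features, but option names and defaults should be confirmed for your version.

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # default health-check port mentioned above

service:
  extensions: [health_check]
  telemetry:
    metrics:
      level: normal           # the Collector also emits metrics about its own pipelines
```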
Here is a brief recap of everything you have learned:
OpenTelemetry is an open-source, vendor-neutral standard for collecting metrics, logs, and traces.
Splunk Observability Cloud uses OpenTelemetry as its preferred method for metric ingestion.
The OpenTelemetry Collector is responsible for:
- Receiving telemetry data (via Receivers)
- Processing telemetry data (via Processors)
- Exporting telemetry data (via Exporters)
Installing and configuring the Collector involves:
- Defining Receivers, Processors, and Exporters
- Writing a proper YAML configuration file
- Ensuring authentication using a Splunk HEC token
Custom metrics can be pushed from applications instrumented with OpenTelemetry SDKs.
Best practices include:
- Deploying Collectors close to the source
- Batching and compressing metrics
- Using modular and environment-specific configurations
- Monitoring the health of the Collectors themselves
In real-world production environments, setting up OpenTelemetry for Splunk Observability Cloud does not typically involve manually building a raw OpenTelemetry Collector from scratch.
Instead, Splunk provides an official distribution called the Splunk Distribution of the OpenTelemetry Collector (often abbreviated as Splunk OTel Collector).
This Splunk-customized version of the Collector includes several key advantages:
- Pre-integrated components: The distribution comes prepackaged with all necessary components that Splunk requires, such as the correct Receivers, Processors, and Exporters for full Observability Cloud integration.
- Deployment automation: It includes automated deployment options using tools like Terraform scripts, Ansible playbooks, and cloud-native service templates, making deployment much faster and more consistent.
- Performance optimizations: By default, the Splunk Distribution enables batch processing, compression, and other network and resource optimizations, ensuring high efficiency and minimal overhead when exporting data.
- Simplified configuration: Compared with manually building and configuring a generic OpenTelemetry Collector, the Splunk OTel Collector requires much less manual work, with Splunk-validated best practices already in place.
You may encounter questions such as:
"When installing a Collector for Splunk Observability Cloud, it is recommended to use Splunk’s Distribution of OpenTelemetry Collector rather than building from raw OpenTelemetry."
The correct answer is True.
In real-world deployments, Splunk recommends using their preconfigured Splunk Distribution of the OpenTelemetry Collector, which simplifies installation and automatically optimizes settings for Splunk Observability Cloud.
In addition to manual instrumentation, where developers explicitly add telemetry code to their applications using OpenTelemetry SDKs, Splunk also supports a technique called Auto-Instrumentation.
Auto-Instrumentation allows telemetry data to be collected without modifying the application’s source code.
Key features of Auto-Instrumentation:
- Minimal application changes: Auto-instrumentation requires little to no change to the application codebase. Developers simply attach an instrumentation agent or wrapper to the application runtime.
- Examples:
  - Java applications: By adding a -javaagent startup parameter, Java applications can automatically capture metrics, traces, and logs related to HTTP requests, database queries, and other common operations.
  - Python Flask applications: By applying a lightweight wrapper around the Flask app initialization, telemetry can be captured automatically, including incoming HTTP requests and database call timings.
- Rapid telemetry onboarding: With auto-instrumentation, teams can begin monitoring application behavior almost immediately, speeding up time-to-value without waiting for major development work.
You may see questions like:
"What is one advantage of using auto-instrumentation via OpenTelemetry agents?"
The correct answer is along the lines of:
"You can collect key telemetry data without modifying application source code."
Splunk’s Observability platform supports Auto-Instrumentation for many languages, enabling telemetry collection without manual code changes, by attaching OpenTelemetry agents or lightweight wrappers at runtime.
| Topic | Key Points |
|---|---|
| Splunk OTel Collector | Prepackaged, optimized, simplifies deployment, best for production |
| Auto-Instrumentation | No source code changes needed, fast telemetry capture via agents or wrappers |
Why might the OpenTelemetry Collector be running successfully but no metrics appear in Splunk Observability Cloud?
The most common cause is incorrect exporter configuration, particularly missing or incorrect access tokens and endpoint settings for the Splunk Observability Cloud ingest API.
Even if the Collector service starts successfully, telemetry data will not be transmitted unless the exporter block correctly specifies the Splunk ingest endpoint and authentication token. Another frequent issue is that receivers are defined but not connected to pipelines. In OpenTelemetry architecture, pipelines explicitly connect receivers → processors → exporters. If the metrics pipeline does not include the Splunk exporter, the data is never transmitted. Network restrictions, invalid realm configuration, or disabled metrics receivers can also block ingestion. Ensuring the pipeline structure and exporter credentials are correct resolves most ingestion issues.
Demand Score: 82
Exam Relevance Score: 90
In an OpenTelemetry Collector configuration, what role does the pipeline play in telemetry processing?
A pipeline defines how telemetry data flows from receivers through processors to exporters.
The OpenTelemetry Collector architecture separates telemetry components into receivers, processors, and exporters. Receivers ingest telemetry data from sources such as Prometheus or system metrics. Processors modify or batch data, while exporters send the data to backend platforms like Splunk Observability Cloud. The pipeline block explicitly connects these components and determines the flow of telemetry data. Without a correctly defined pipeline, components remain isolated and data processing never occurs. Pipelines are typically defined separately for metrics, traces, and logs. Misconfigured pipelines are a common reason telemetry fails to reach the destination.
Demand Score: 74
Exam Relevance Score: 88
What is a common cause of configuration validation errors when starting the OpenTelemetry Collector?
Configuration validation errors commonly occur when required fields for receivers, exporters, or pipelines are missing or incorrectly referenced.
The Collector validates its configuration during startup. If a component referenced in a pipeline does not exist or is misnamed, the Collector fails validation. Another frequent cause is using unsupported configuration keys for a component version. For example, referencing a receiver that is not defined in the receivers section will cause a startup failure. YAML formatting mistakes such as incorrect indentation can also produce validation errors. Properly aligning component names and ensuring each pipeline references valid receivers, processors, and exporters prevents most configuration failures.
Demand Score: 71
Exam Relevance Score: 86
What are the primary components of the OpenTelemetry Collector architecture?
The core components are receivers, processors, exporters, and pipelines.
Receivers collect telemetry data from external sources such as Prometheus endpoints or application instrumentation. Processors transform or optimize the data, for example batching metrics or adding metadata. Exporters transmit processed telemetry to backend observability platforms such as Splunk Observability Cloud. Pipelines orchestrate how these components work together by defining the sequence in which telemetry moves through the Collector. Understanding these architectural components is essential for configuring telemetry ingestion and troubleshooting data flow problems.
Demand Score: 70
Exam Relevance Score: 85