OpenTelemetry (often abbreviated as OTel) is a free, open-source, and vendor-neutral standard.
Its main purpose is to collect telemetry data, which includes:
- Metrics: Numeric measurements like CPU usage or memory consumption.
- Logs: Text-based records of system events, such as errors or status updates.
- Traces: Records showing the path and performance of a request through various parts of a distributed system.
OpenTelemetry acts as a common language.
No matter which technology stack or system you are using, OpenTelemetry can collect monitoring data in a unified format, making it much easier to observe and understand system behavior.
Splunk Observability Cloud needs a reliable and standardized method to collect data about systems, applications, and infrastructure.
Instead of building different data collectors for every type of system, Splunk uses OpenTelemetry because:
- OpenTelemetry can collect data from many different sources.
- It follows modern, efficient standards.
- It is widely supported by the cloud and monitoring communities.
Using OpenTelemetry, Splunk Observability Cloud can easily and accurately ingest, analyze, and visualize telemetry data.
Without OpenTelemetry, integrating diverse systems with Splunk would be much slower, harder, and less efficient.
When using OpenTelemetry, especially for metrics collection, there are several important parts you need to understand:
The OpenTelemetry Collector is a service or program that you install on machines such as servers, cloud virtual machines, or containers.
Its job is to:
- Collect telemetry data.
- Optionally modify or process that data.
- Send the processed data to one or more backends like Splunk Observability Cloud.
You can think of the OpenTelemetry Collector like a mail center.
It collects "letters" (metrics, logs, traces), sorts them, and delivers them to their final destination.
The Collector is extremely flexible and can be deployed in different ways, depending on the needs of your system. For example:
- As an agent running directly on a host.
- As a sidecar inside a Kubernetes pod.
- As a standalone gateway server.
Receivers are the parts of the Collector that pull data in from sources.
They are configured to collect data from specific places such as:
- Host system metrics (CPU, memory, disk).
- Kubernetes cluster data.
- Custom application metrics exposed via APIs.
Examples of Receivers include:
- hostmetrics receiver: Collects system-level metrics like CPU usage or memory utilization.
- kubeletstats receiver: Collects metrics from Kubernetes nodes.
Receivers are like input ports that specify where to find the data.
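To make the second example concrete, here is a hedged sketch of a kubeletstats receiver declaration. The option names (auth_type, endpoint) and the K8S_NODE_NAME environment variable are assumptions to verify against your Collector version; a full hostmetrics example appears later in this section.

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s
    auth_type: serviceAccount               # authenticate to the kubelet with the pod's service account
    endpoint: "${env:K8S_NODE_NAME}:10250"  # assumes K8S_NODE_NAME is injected into the pod spec
```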
After telemetry data is received, Processors can optionally modify, enrich, or optimize the data before it is sent out.
Examples of what Processors can do:
- Batching: Group many small telemetry data points into larger packets, making network transmission more efficient.
- Resource Detection: Add important labels or metadata (such as "region: us-east-1" or "environment: production") to every metric.
- Filtering: Remove unnecessary data to reduce noise and save resources.
Processors help make telemetry data more useful, organized, and efficient to handle.
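As an illustrative sketch only, the three processor types described above could be declared like this. The resourcedetection and filter processors ship with the contrib distribution; the exact option names, and the metric chosen for filtering, are assumptions to check against your Collector version.

```yaml
processors:
  batch:                       # group data points into larger batches before export
  resourcedetection:
    detectors: [env, system]   # add host and environment metadata automatically
  filter/drop_noise:
    metrics:
      exclude:
        match_type: strict
        metric_names:
          - system.disk.operations   # hypothetical metric to drop
```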
Exporters are the parts of the Collector that send the processed data to a destination.
Destinations could be:
- Splunk Observability Cloud
- Another backend like Prometheus, Jaeger, or a custom system
In our case, the most important exporter is the Splunk HEC exporter (splunk_hec), which sends data to Splunk over HTTP and authenticates with a secure token.
Exporters are like output ports.
They determine where the telemetry data ultimately goes after being collected and processed.
The typical data flow inside a Collector is:
Receivers → Processors → Exporters
This order ensures that data is properly handled at each stage.
Let’s break the collection process down into clear, step-by-step stages.
Before any data can be collected, you must deploy the OpenTelemetry Collector.
There are several deployment methods depending on the environment you are monitoring:
- On Virtual Machines (VMs):
  - Install the Collector as a service directly on the operating system.
  - Example: install on a Linux or Windows server.
- In Containers:
  - Deploy the Collector inside Docker containers.
  - Suitable for environments using containerized applications.
- In Kubernetes Clusters:
  - Deploy the Collector as a DaemonSet so that every Kubernetes node runs a Collector instance.
  - This method helps collect system metrics and Kubernetes-specific data automatically.
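For the Kubernetes case, a minimal DaemonSet manifest might look like the sketch below. This is illustrative only: the image tag, namespace, and ConfigMap name (otel-collector-config) are placeholders, and in practice the Splunk Distribution discussed later in this section ships its own installers.

```yaml
# Minimal, illustrative DaemonSet sketch; image, namespace, and ConfigMap name are placeholders.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: otel-collector
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          image: otel/opentelemetry-collector-contrib:latest  # pin a specific version in production
          args: ["--config=/etc/otel/config.yaml"]
          volumeMounts:
            - name: config
              mountPath: /etc/otel
      volumes:
        - name: config
          configMap:
            name: otel-collector-config   # placeholder ConfigMap holding the YAML configuration
```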
Once installed and properly configured, the Collector starts gathering system metrics automatically.
Important: Always install the Collector close to where the data is generated to ensure efficient collection and reduce latency.
Installation alone is not enough.
You must configure the Collector so it knows:
- What data to collect
- How to process it
- Where to send it
Configuration is usually done using a YAML file.
YAML is a simple, structured text format used to define settings.
In the YAML file, you define three main sections:
The receivers section tells the Collector which sources to listen to.
Example configuration for collecting system metrics:
```yaml
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      memory:
      disk:
      filesystem:
```
Explanation:
- The hostmetrics receiver is used.
- Metrics are collected every 60 seconds.
- Specific metrics like CPU, memory, disk, and filesystem are scraped.
The exporters section specifies where the collected and processed data will be sent.
Example configuration for sending metrics to Splunk:
```yaml
exporters:
  splunk_hec:
    token: "<SPLUNK_TOKEN>"
    endpoint: "<SPLUNK_HEC_ENDPOINT>"
```
Explanation:
- splunk_hec is the exporter used to send data to Splunk.
- A token is needed for authentication (more on this later).
- The endpoint is the URL where Splunk expects to receive the data.
The service section connects the receivers, processors (if any), and exporters together into a pipeline.
Example pipeline configuration:
```yaml
service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      exporters: [splunk_hec]
```
Explanation:
- The pipeline is named metrics.
- It receives data from the hostmetrics receiver.
- It exports data using the splunk_hec exporter.
In simple terms:
Metrics flow from receivers → through any processors → to exporters → finally to Splunk.
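Putting the pieces together, a complete (if minimal) configuration file follows the same flow. This is a sketch assembled from the snippets above; the batch processor is optional and is explained later in this section, and the token and endpoint placeholders must be replaced with real values.

```yaml
receivers:
  hostmetrics:
    collection_interval: 60s
    scrapers:
      cpu:
      memory:

processors:
  batch:            # optional; groups data points before export (covered later)

exporters:
  splunk_hec:
    token: "<SPLUNK_TOKEN>"
    endpoint: "<SPLUNK_HEC_ENDPOINT>"

service:
  pipelines:
    metrics:
      receivers: [hostmetrics]
      processors: [batch]
      exporters: [splunk_hec]
```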
When sending data to Splunk, you must prove your identity so Splunk knows the data is trusted.
Splunk Observability uses a HEC Token (HTTP Event Collector Token) for authentication.
The token is a special long string that acts like a password.
It must be securely stored and properly configured in the Collector’s YAML file.
Without the correct token, Splunk will reject the data for security reasons.
You can usually generate or find your HEC token in the Splunk Observability Cloud user interface under Organization Settings.
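One common way to keep the token out of the YAML file itself is to reference an environment variable. The variable names below are hypothetical; the ${env:...} substitution syntax is supported by recent Collector versions, but verify it against the version you run.

```yaml
exporters:
  splunk_hec:
    token: "${env:SPLUNK_HEC_TOKEN}"        # hypothetical variable set on the host or in the pod spec
    endpoint: "${env:SPLUNK_HEC_ENDPOINT}"  # hypothetical variable holding the ingest URL
```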
Sometimes, default system metrics like CPU or memory are not enough.
You might want to collect custom metrics directly from your applications, such as:
- Number of active users
- Payment success rates
- Custom business KPIs
To do this:
- Instrument your application code using OpenTelemetry SDKs. Supported languages include Java, Python, Go, JavaScript, and others.
- Your application code will push metrics to a local OpenTelemetry Collector, which will then send them to Splunk.
For example, in Python, you could write code like this:
```python
from opentelemetry import metrics

# Assumes a MeterProvider with an OTLP exporter has already been configured
# (via the OpenTelemetry SDK), so that recorded values actually reach the Collector.
meter = metrics.get_meter_provider().get_meter("example")
counter = meter.create_counter("payment_successes")
counter.add(1, {"region": "us-west"})
```
This records a metric named payment_successes, which the SDK then exports to your Collector.
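For the Collector to accept metrics pushed by an instrumented application, it needs a receiver listening for them. Assuming the SDK exports over OTLP (the default protocol for OpenTelemetry SDKs), a sketch of the receiving side looks like this; remember to add otlp to the metrics pipeline as well.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317   # default OTLP gRPC port
      http:
        endpoint: 0.0.0.0:4318   # default OTLP HTTP port
```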
Before metrics are sent out, it is often useful to enrich them or route them based on certain rules.
- Enrichment: Adding additional labels (dimensions) to metrics automatically, for example adding environment: production to every metric.
- Routing: Sending different metrics to different places.
This can be configured using Processors inside the Collector’s YAML file.
Example enrichment:
```yaml
processors:
  resource:
    attributes:
      - key: environment
        value: production
        action: insert
```
With this configuration, every metric will have an environment=production label automatically added.
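Routing can be approximated without any special processor by defining multiple named pipelines, each with its own exporter instance. The sketch below sends host metrics to one destination and application metrics to another; the environment variable names are hypothetical, and dedicated routing components also exist if finer-grained rules are needed.

```yaml
exporters:
  splunk_hec/prod:
    token: "${env:PROD_HEC_TOKEN}"          # hypothetical variables
    endpoint: "${env:PROD_HEC_ENDPOINT}"
  splunk_hec/staging:
    token: "${env:STAGING_HEC_TOKEN}"
    endpoint: "${env:STAGING_HEC_ENDPOINT}"

service:
  pipelines:
    metrics/host:
      receivers: [hostmetrics]
      processors: [resource, batch]
      exporters: [splunk_hec/prod]
    metrics/apps:
      receivers: [otlp]
      processors: [batch]
      exporters: [splunk_hec/staging]
```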
When using OpenTelemetry Collectors in a real-world environment, it is critical to follow best practices to ensure your monitoring system remains:
- Stable
- Scalable
- Secure
- Efficient
Let’s carefully go through each best practice one by one.
Always deploy the OpenTelemetry Collector as close to the source of data as possible.
- If you are collecting server metrics, install the Collector directly on the server.
- If you are collecting Kubernetes metrics, deploy the Collector inside the Kubernetes cluster as a DaemonSet.
Benefits of deploying close to the source:
- Lower network latency
- Reduced risk of data loss
- Better performance
- Easier capture of node-specific system events
When Collectors are deployed remotely or too far away, metrics can arrive late, incomplete, or even be lost during transmission.
When sending large volumes of metrics over the network, two techniques keep transmission efficient:
- Batching combines many small metric events into fewer, larger network transmissions.
- Compression reduces the size of the transmitted data.
Benefits:
- Saves network bandwidth
- Reduces load on the Splunk backend
- Improves overall ingestion speed
In OpenTelemetry configuration, you can enable a batch processor to automatically batch metrics before exporting.
Example:
```yaml
processors:
  batch:
    send_batch_size: 1024
    timeout: 5s
```
This means:
- Send metrics in batches of 1024.
- If the batch size is not reached within 5 seconds, send whatever has been collected so far.
Batching is almost always recommended unless dealing with extremely low traffic environments.
Instead of creating a single, massive configuration file, it is better to:
- Split configuration files into smaller pieces based on function.
- Use modular templates for different environments such as production, staging, and development.
For example:
- One YAML file for receivers
- One YAML file for processors
- One YAML file for exporters
- A main YAML file that includes all the others
This modularity brings:
- Easier management
- Faster troubleshooting
- Safer deployments (less risk of errors)
Also, use environment-specific variables for tokens, endpoints, and settings, so you do not accidentally send staging data into a production environment or vice versa.
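As an example of environment-specific values, the enrichment processor shown earlier can read its label from an environment variable instead of hard-coding it. DEPLOY_ENV is a hypothetical variable name.

```yaml
processors:
  resource:
    attributes:
      - key: environment
        value: "${env:DEPLOY_ENV}"   # hypothetical variable, e.g. "production" or "staging"
        action: upsert               # insert the label, or overwrite it if already present
```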
The Collectors themselves are critical pieces of infrastructure.
You must treat them as production-grade services.
Best practices include:
- Use health check endpoints: the Collector's health_check extension exposes a health check URL (by default on port 13133); a configuration sketch appears below.
- Monitor CPU, memory, and network usage of the Collectors.
- Set up detectors in Splunk Observability Cloud to alert if a Collector goes down or becomes unhealthy.
- Use logs and metrics from the Collector itself to detect configuration errors, performance bottlenecks, or resource exhaustion.
A Collector that silently fails can cause major monitoring blind spots.
Active monitoring of your Collectors is essential for system reliability.
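A sketch of how the health check and the Collector's own telemetry are typically enabled is shown below. The health_check extension and the service.telemetry block are standard Collector features, but option names and defaults should be confirmed for your version.

```yaml
extensions:
  health_check:
    endpoint: 0.0.0.0:13133   # default health-check port mentioned above

service:
  extensions: [health_check]
  telemetry:
    metrics:
      level: normal           # the Collector also emits metrics about its own pipelines
```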
Here is a brief recap of everything you have learned:
OpenTelemetry is an open-source, vendor-neutral standard for collecting metrics, logs, and traces.
Splunk Observability Cloud uses OpenTelemetry as its preferred method for metric ingestion.
The OpenTelemetry Collector is responsible for:
- Receiving telemetry data (via Receivers)
- Processing telemetry data (via Processors)
- Exporting telemetry data (via Exporters)
Installing and configuring the Collector involves:
- Defining Receivers, Processors, and Exporters
- Writing a proper YAML configuration file
- Ensuring authentication using a Splunk HEC token
Custom metrics can be pushed from applications instrumented with OpenTelemetry SDKs.
Best practices include:
- Deploying Collectors close to the source
- Batching and compressing metrics
- Using modular and environment-specific configurations
- Monitoring the health of the Collectors themselves
In real-world production environments, setting up OpenTelemetry for Splunk Observability Cloud does not typically involve manually building a raw OpenTelemetry Collector from scratch.
Instead, Splunk provides an official distribution called the Splunk Distribution of the OpenTelemetry Collector (often abbreviated as Splunk OTel Collector).
This Splunk-customized version of the Collector includes several key advantages:
- Pre-integrated components: The distribution comes prepackaged with all necessary components that Splunk requires, such as the correct Receivers, Processors, and Exporters for full Observability Cloud integration.
- Deployment automation: It includes automated deployment options using tools like Terraform scripts, Ansible playbooks, and cloud-native service templates, making deployment much faster and more consistent.
- Performance optimizations: By default, the Splunk Distribution enables batch processing, compression, and other network and resource optimizations, ensuring high efficiency and minimal overhead when exporting data.
- Simplified configuration: Compared with manually building and configuring a generic OpenTelemetry Collector, the Splunk OTel Collector requires much less manual work, with Splunk-validated best practices already in place.
You may encounter questions such as:
"When installing a Collector for Splunk Observability Cloud, it is recommended to use Splunk’s Distribution of OpenTelemetry Collector rather than building from raw OpenTelemetry."
The correct answer is True.
In real-world deployments, Splunk recommends using their preconfigured Splunk Distribution of the OpenTelemetry Collector, which simplifies installation and automatically optimizes settings for Splunk Observability Cloud.
In addition to manual instrumentation, where developers explicitly add telemetry code to their applications using OpenTelemetry SDKs, Splunk also supports a technique called Auto-Instrumentation.
Auto-Instrumentation allows telemetry data to be collected without modifying the application’s source code.
Key features of Auto-Instrumentation:
- Minimal application changes: Auto-instrumentation requires little to no change to the application codebase. Developers simply attach an instrumentation agent or wrapper to the application runtime.
- Examples:
  - Java applications: By adding a -javaagent startup parameter, Java applications can automatically capture metrics, traces, and logs related to HTTP requests, database queries, and other common operations.
  - Python Flask applications: By applying a lightweight wrapper around the Flask app initialization, telemetry can be captured automatically, including incoming HTTP requests and database call timings.
- Rapid telemetry onboarding: With auto-instrumentation, teams can begin monitoring application behavior almost immediately, speeding up time-to-value without waiting for major development work.
You may see questions like:
"What is one advantage of using auto-instrumentation via OpenTelemetry agents?"
The correct answer is along the lines of:
"You can collect key telemetry data without modifying application source code."
Splunk’s Observability platform supports Auto-Instrumentation for many languages, enabling telemetry collection without manual code changes, by attaching OpenTelemetry agents or lightweight wrappers at runtime.
| Topic | Key Points |
|---|---|
| Splunk OTel Collector | Prepackaged, optimized, simplifies deployment, best for production |
| Auto-Instrumentation | No source code changes needed, fast telemetry capture via agents or wrappers |
Why might the OpenTelemetry Collector be running successfully but no metrics appear in Splunk Observability Cloud?
The most common cause is incorrect exporter configuration, particularly missing or incorrect access tokens and endpoint settings for the Splunk Observability Cloud ingest API.
Even if the Collector service starts successfully, telemetry data will not be transmitted unless the exporter block correctly specifies the Splunk ingest endpoint and authentication token. Another frequent issue is that receivers are defined but not connected to pipelines. In OpenTelemetry architecture, pipelines explicitly connect receivers → processors → exporters. If the metrics pipeline does not include the Splunk exporter, the data is never transmitted. Network restrictions, invalid realm configuration, or disabled metrics receivers can also block ingestion. Ensuring the pipeline structure and exporter credentials are correct resolves most ingestion issues.
Demand Score: 82
Exam Relevance Score: 90
In an OpenTelemetry Collector configuration, what role does the pipeline play in telemetry processing?
A pipeline defines how telemetry data flows from receivers through processors to exporters.
The OpenTelemetry Collector architecture separates telemetry components into receivers, processors, and exporters. Receivers ingest telemetry data from sources such as Prometheus or system metrics. Processors modify or batch data, while exporters send the data to backend platforms like Splunk Observability Cloud. The pipeline block explicitly connects these components and determines the flow of telemetry data. Without a correctly defined pipeline, components remain isolated and data processing never occurs. Pipelines are typically defined separately for metrics, traces, and logs. Misconfigured pipelines are a common reason telemetry fails to reach the destination.
Demand Score: 74
Exam Relevance Score: 88
What is a common cause of configuration validation errors when starting the OpenTelemetry Collector?
Configuration validation errors commonly occur when required fields for receivers, exporters, or pipelines are missing or incorrectly referenced.
The Collector validates its configuration during startup. If a component referenced in a pipeline does not exist or is misnamed, the Collector fails validation. Another frequent cause is using unsupported configuration keys for a component version. For example, referencing a receiver that is not defined in the receivers section will cause a startup failure. YAML formatting mistakes such as incorrect indentation can also produce validation errors. Properly aligning component names and ensuring each pipeline references valid receivers, processors, and exporters prevents most configuration failures.
Demand Score: 71
Exam Relevance Score: 86
What are the primary components of the OpenTelemetry Collector architecture?
The core components are receivers, processors, exporters, and pipelines.
Receivers collect telemetry data from external sources such as Prometheus endpoints or application instrumentation. Processors transform or optimize the data, for example batching metrics or adding metadata. Exporters transmit processed telemetry to backend observability platforms such as Splunk Observability Cloud. Pipelines orchestrate how these components work together by defining the sequence in which telemetry moves through the Collector. Understanding these architectural components is essential for configuring telemetry ingestion and troubleshooting data flow problems.
Demand Score: 70
Exam Relevance Score: 85