Splunk’s architecture relies on several key components, each with a specific role in the data lifecycle.
The Search Head is the user-facing component of Splunk. It is where users interact with Splunk to run SPL searches, build dashboards and reports, and configure alerts.
Key Features of Search Heads: distributing search requests to Indexers, merging the returned results, and managing knowledge objects such as saved searches, macros, and event types.
The Indexer is the backbone of Splunk’s data storage and processing capabilities. Its main responsibilities include parsing incoming data into events, writing those events to disk in index buckets, and servicing search requests from Search Heads.
How it Works: data arrives from forwarders or other inputs, is parsed and written to hot buckets, and then rolls to warm and cold buckets as it ages, following the retention settings in indexes.conf.
Forwarders are Splunk’s data collection agents that send data to Indexers. There are two main types:
Universal Forwarder (UF): a lightweight agent that collects data and forwards it with minimal processing, keeping resource consumption on the source system low.
Heavy Forwarder (HF): a full Splunk instance that can parse, filter, mask, and route data before forwarding it.
When to Use Each: deploy the UF on most endpoints because of its small footprint; use an HF only when data must be parsed, filtered, or anonymized before it reaches the Indexers.
The Cluster Manager is critical for managing Splunk’s clustering features. It ensures high availability and data redundancy in distributed environments.
Responsibilities: coordinating data replication across Indexer peers, tracking where each bucket copy lives, and orchestrating recovery when a peer node fails so that the replication and search factors stay met.
The Deployment Server simplifies managing Splunk instances, especially forwarders, by acting as a centralized configuration management tool.
Key Functions: grouping forwarders into server classes, distributing apps and configuration bundles to matching clients, and tracking which forwarders have phoned home.
Splunk processes data through a pipeline consisting of several distinct stages: input, parsing, indexing, and search. Each stage transforms and prepares the data for analysis.
As a Splunk administrator, your daily responsibilities will include installation, management, and command-line operations.
Installing Splunk: on Linux, extract the .tgz package or install it via a package manager; on macOS, use the .dmg package for installation. After installation, access Splunk Web in a browser (http://<hostname>:8000).
Restarting Splunk Services:
splunk start: Starts the Splunk service.
splunk stop: Stops the Splunk service.
splunk restart: Restarts Splunk, applying any new configurations.
Monitoring System Health:
Track instance health with Splunk’s internal logs and dashboards.
Troubleshooting Errors:
Review Splunk’s internal logs (for example, splunkd.log).
The Splunk Command-Line Interface (CLI) is a powerful tool for managing and troubleshooting Splunk instances. Common commands:
splunk start / splunk stop / splunk restart: Control the service lifecycle.
splunk show config: Displays the current configuration.
splunk show license-status: Shows license usage and status.
splunk btool: Debugs configuration files and identifies issues.
The Search Head is where all user interactions with Splunk occur, making it a critical component. Let’s break down its key functions further:
Search Query Execution:
Users write SPL (Search Processing Language) queries on the Search Head.
The Search Head distributes these queries to Indexers for execution and collects the results.
Example SPL query:
index=web_logs sourcetype=apache | stats count by status
Dashboards and Reports: saved searches can be visualized as dashboard panels and scheduled as recurring reports.
Search Head Clustering: multiple Search Heads can be grouped into a cluster for high availability, with an elected captain coordinating scheduled searches and replicating knowledge objects across members; a minimal initialization sketch follows.
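As a sketch of how a cluster member is initialized (the hostnames, ports, secret, and label below are placeholder assumptions), each member runs splunk init shcluster-config and then restarts:
# Run on each search head cluster member; values are illustrative
./splunk init shcluster-config -auth admin:<password> -mgmt_uri https://sh1.example.com:8089 -replication_port 9200 -conf_deploy_fetch_url https://deployer.example.com:8089 -secret <shared_key> -shcluster_label shcluster1
./splunk restart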
Common Issues and Solutions: slow searches are most often caused by overly broad time ranges or leading wildcards; narrow the time range and avoid wildcards at the start of search terms.
The Indexer plays a pivotal role in data ingestion and search performance. Here’s a deeper look at its functionality:
Indexing Process: incoming data is parsed into events, written to hot buckets, and rolled to warm and cold buckets as it ages; the index and bucket paths are defined in indexes.conf.
Indexer Clustering: multiple Indexers can be clustered to replicate data for fault tolerance, governed by the replication factor and search factor (covered in detail later in this section).
Monitoring Indexer Health: track indexing throughput and queue health using the internal metrics.log, as in the search below.
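For example, a search like the following charts per-index throughput from metrics.log (the 5-minute span is an arbitrary choice):
index=_internal source=*metrics.log* group=per_index_thruput
| timechart span=5m sum(kb) by series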
Forwarders act as data collectors and are the primary method for sending data to Indexers.
Universal Forwarder (UF): a minimal agent installed on source systems; it performs almost no parsing, which keeps CPU and memory usage low.
Heavy Forwarder (HF): a full Splunk Enterprise instance that can parse events, filter or route data, and mask sensitive values (as in the anonymization example below) before forwarding.
Configuration Examples:
Universal Forwarder:
# inputs.conf on UF
[monitor:///var/log/syslog]
disabled = false
index = main
sourcetype = syslog
Heavy Forwarder:
# props.conf and transforms.conf on HF
[source::/var/log/syslog]
TRANSFORMS-anonymize = mask_ssn
# transforms.conf
[mask_ssn]
# With DEST_KEY = _raw, FORMAT replaces the entire event, so the text
# around the SSN must be captured and re-emitted
REGEX = (?m)^(.*)\d{3}-\d{2}-\d{4}(.*)$
FORMAT = $1XXX-XX-XXXX$2
DEST_KEY = _raw
The Cluster Manager is the control node in clustered Splunk deployments. Its main role is to manage Indexer and Search Head clusters.
Key Features: enforces the replication factor and search factor, distributes configuration bundles to peer nodes, and rebalances data when peers join or leave.
How It Works: peer Indexers register with the Cluster Manager, which assigns each bucket a set of peers to hold copies and tells Search Heads where searchable copies live.
Best Practices: run the Cluster Manager on a dedicated host that is neither an Indexer nor a Search Head, and put the cluster into maintenance mode before rolling restarts, as shown below.
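For instance, before patching or restarting peers, you can pause bucket-fixup activity with these standard CLI commands on the Cluster Manager:
./splunk enable maintenance-mode
# ... perform peer maintenance ...
./splunk disable maintenance-mode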
The Deployment Server simplifies managing configurations for Splunk instances, especially forwarders.
How It Works: each forwarder runs a deployment client that periodically phones home to the Deployment Server; when the client matches a server class, the associated apps and configurations are downloaded automatically. A sample client configuration follows the server class example below.
Setting Up Deployment:
Define server classes to group forwarders with similar configurations.
Example:
# serverclass.conf
[serverClass:LinuxServers]
whitelist.0 = linux_server_*
app.0 = linux_inputs
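On the forwarder side, a matching client configuration might look like this (the Deployment Server address is an assumption for illustration):
# deploymentclient.conf on the forwarder
[deployment-client]

[target-broker:deploymentServer]
targetUri = 192.168.1.5:8089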
Monitoring Forwarders: use Settings > Forwarder Management in Splunk Web to see which clients have phoned home, their server-class membership, and the status of deployed apps.
Let’s revisit the data pipeline stages with more details and examples.
Collects raw data from various sources: files and directories, network ports, scripted inputs, and the HTTP Event Collector (HEC).
Example: Monitoring a log file.
# inputs.conf
[monitor:///var/log/apache/access.log]
index = web_logs
sourcetype = apache_access
Tokenizes raw data into events and assigns metadata.
Key Parsing Rules: line breaking (splitting the stream into events), timestamp extraction, and metadata assignment (host, source, sourcetype). A hedged props.conf sketch follows.
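A minimal sketch of these rules in props.conf might look like the following (the sourcetype name and timestamp format are assumptions for a syslog-style feed):
# props.conf
[custom_syslog]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
TIME_PREFIX = ^
TIME_FORMAT = %b %d %H:%M:%S
MAX_TIMESTAMP_LOOKAHEAD = 20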
Example of Field Extraction:
192.168.1.1 - - [01/Jan/2025:12:00:00 +0000] "GET /index.html HTTP/1.1" 200
Extracted fields: clientip=192.168.1.1, method=GET, status=200
Writes events into index buckets for efficient storage and retrieval.
Example of Index Configuration:
# indexes.conf
[web_logs]
homePath = $SPLUNK_DB/web_logs/db
coldPath = $SPLUNK_DB/web_logs/colddb
frozenTimePeriodInSecs = 2592000 # 30 days
Allows users to query and visualize data.
Example SPL Query:
index=web_logs sourcetype=apache_access | stats count by status
Installing Splunk involves downloading and configuring it on your preferred operating system. Let’s dive into the process.
Download Splunk:
Install Splunk:
Windows:
Run the .msi installer.
Linux:
For .tgz:
tar xvzf splunk-<version>-Linux-x86_64.tgz -C /opt
cd /opt/splunk/bin
./splunk start --accept-license
For .deb or .rpm:
sudo dpkg -i splunk-<version>-Linux-x86_64.deb
sudo /opt/splunk/bin/splunk start --accept-license
Mac:
Open the .dmg file and drag Splunk to the Applications folder.
Initial Setup:
Open Splunk Web at http://<hostname>:8000. Log in with the default username admin and password changeme (you are prompted to change it on first login).
Managing Splunk services is crucial for ensuring uptime and applying updates or configuration changes. This can be done via Splunk Web or the CLI.
Starting Splunk:
./splunk start
Use this command to start the Splunk services after installation or a shutdown.
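To have Splunk start automatically at system boot, you can also enable boot-start (the splunk user account is an assumption; substitute the account that owns your installation):
sudo ./splunk enable boot-start -user splunk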
Stopping Splunk:
./splunk stop
Stops Splunk safely. Use before applying significant configuration changes.
Restarting Splunk:
./splunk restart
Applies new configurations by restarting the service.
Checking Status:
./splunk status
Shows whether Splunk is currently running.
Monitoring system health ensures that Splunk components are running optimally. Use built-in tools and dashboards to track performance and resolve issues.
Key internal log files:
$SPLUNK_HOME/var/log/splunk/splunkd.log
$SPLUNK_HOME/var/log/splunk/metrics.log
Show license status:
./splunk show license-status
List configured indexes:
./splunk list index
View forwarder status:
./splunk list forward-server
Troubleshooting is a vital skill for a Splunk administrator. Here are common issues and solutions:
Splunk Service Fails to Start
Cause: Low memory, corrupted configurations, or port conflicts.
Fix:
Check splunkd.log for error messages.
Verify the port (default: 8000) isn’t in use by another process:
netstat -tuln | grep 8000
High CPU or Memory Usage
Cause: Expensive searches or heavy parsing load.
Fix:
Accelerate expensive searches with tstats or summary indexing.
Review parsing rules in props.conf and transforms.conf.
Forwarder Not Sending Data
Cause: Incorrect outputs.conf or network issues.
Fix:
Verify forwarder connectivity:
./splunk list forward-server
Check splunkd.log on the forwarder for errors.
License Warnings
Cause: Exceeding daily indexing limits.
Fix:
Monitor license usage:
./splunk show license-status
Reduce data ingestion by filtering unnecessary logs.
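One common way to filter is to route unwanted events to Splunk’s nullQueue at parsing time (the sourcetype and the DEBUG pattern below are illustrative assumptions):
# props.conf
[syslog]
TRANSFORMS-null = setnull

# transforms.conf
[setnull]
REGEX = \bDEBUG\b
DEST_KEY = queue
FORMAT = nullQueue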
btool:
Validates and debugs configuration files.
Example:
./splunk btool inputs list --debug
diag:
Collects diagnostic information for troubleshooting:
./splunk diag
Efficient Splunk configurations can significantly improve performance.
Avoid wildcards (*) at the start of search terms.
Limit the scope of monitored files using whitelists and blacklists in inputs.conf.
Compress forwarded data to reduce network usage:
[tcpout]
compressed = true
A company wants to monitor server logs to identify errors, warnings, and system health metrics.
Install the Universal Forwarder on each server.
Configure the inputs.conf file to monitor server log files:
[monitor:///var/log/syslog]
disabled = false
index = server_logs
sourcetype = syslog
On the Indexer, create a new index for server logs in indexes.conf:
[server_logs]
homePath = $SPLUNK_DB/server_logs/db
coldPath = $SPLUNK_DB/server_logs/colddb
frozenTimePeriodInSecs = 2592000 # Retain for 30 days
Use the Search Head to create a search for warnings:
index=server_logs sourcetype=syslog "warning"
You can now monitor real-time server warnings and set up alerts for critical events.
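As a sketch, such an alert could be defined in savedsearches.conf (the name, schedule, threshold, and email address are all assumptions):
# savedsearches.conf
[Server Warning Spike]
search = index=server_logs sourcetype=syslog "warning"
dispatch.earliest_time = -15m
cron_schedule = */15 * * * *
enableSched = 1
alert_type = number of events
alert_comparator = greater than
alert_threshold = 10
action.email = 1
action.email.to = ops@example.com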
Your organization wants to track website traffic to identify popular pages, response times, and errors.
Configure the web server to forward access logs to Splunk using a Universal Forwarder.
Define an input in inputs.conf:
[monitor:///var/log/apache/access.log]
disabled = false
index = web_traffic
sourcetype = apache_access
Create a new index for web logs:
[web_traffic]
homePath = $SPLUNK_DB/web_traffic/db
coldPath = $SPLUNK_DB/web_traffic/colddb
frozenTimePeriodInSecs = 2592000 # Retain for 30 days
Build a search to calculate page visit counts:
index=web_traffic sourcetype=apache_access | stats count by uri_path
You can visualize popular pages and response patterns in dashboards.
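For error tracking, a hedged example query (the 1-hour span is an arbitrary choice) charts server errors over time:
index=web_traffic sourcetype=apache_access status>=500
| timechart span=1h count by status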
Install a Universal Forwarder on a test server.
Configure inputs.conf to monitor a sample log file.
Configure outputs.conf to forward data to an Indexer:
[tcpout]
defaultGroup = default-autolb-group
[tcpout:default-autolb-group]
server = 192.168.1.10:9997
Validate the setup using the CLI:
./splunk list forward-server
Run a search query to find errors:
index=server_logs sourcetype=syslog "error"
Save the search as a report.
Add the report to a dashboard.
Original query:
index=web_traffic sourcetype=apache_access | stats count by uri_path
Optimized query using tstats (note that tstats reads only indexed fields or accelerated data models, so uri_path must be indexed or backed by an accelerated data model for this to work):
| tstats count where index=web_traffic by uri_path
Compare performance metrics: the tstats version typically completes faster because it reads indexed summaries instead of raw events, so prefer tstats for high-performance searches.
Check the forwarder’s splunkd.log for errors.
Verify data is reaching the Indexer:
./splunk list forward-server
Ensure inputs.conf and outputs.conf are correctly configured.
Avoid overly broad searches (for example, index=*).
A distributed search environment separates the functions of searching and indexing to improve scalability and performance. Here's a detailed look at its components and configurations.
Search Head: runs user searches and dispatches them to the search peers.
Indexer: stores indexed data and executes the distributed portion of each search.
Forwarders: collect data at the sources and send it to the Indexers.
Deployment Server: centrally manages configurations for the forwarders in the environment.
Connecting Search Heads to Indexers:
Configure the Search Head to recognize Indexers in a distributed environment.
Add Indexers using distsearch.conf:
[distributedSearch]
servers = 192.168.1.10:8089, 192.168.1.11:8089
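Alternatively, peers can be added from the CLI (the credentials are placeholders):
./splunk add search-server https://192.168.1.10:8089 -auth admin:<password> -remoteUsername admin -remotePassword <remote_password>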
Enabling Distributed Search: in Splunk Web, go to Settings > Distributed Search and add each Indexer as a search peer over its management port (default: 8089); ensure that port is reachable from the Search Head.
Validating Search Peers: the same page lists each peer and its status; confirm every peer shows as healthy.
Testing Distributed Search:
Run a query from the Search Head that accesses data stored on the Indexers:
index=web_logs sourcetype=apache_access | stats count by status
Indexer Clustering ensures data availability and fault tolerance by replicating data across multiple Indexers.
Replication Factor (RF): the number of copies of raw data the cluster maintains across peer nodes.
Search Factor (SF): the number of those copies that are fully searchable (that is, that also include index files); SF must be less than or equal to RF.
Cluster Manager: coordinates replication among the peers and tells Search Heads which copies to search.
Enable Indexer Clustering:
On each Indexer (peer node), configure server.conf:
[clustering]
mode = slave
master_uri = https://192.168.1.1:8089 # Cluster Manager
pass4SymmKey = <shared_secret>
# Note: replication_factor and search_factor are set on the
# Cluster Manager, not on the peers.
# Newer Splunk versions also accept mode = peer and manager_uri.

[replication_port://9887]
Configure the Cluster Manager:
On the Cluster Manager, configure server.conf:
[clustering]
mode = master
replication_factor = 3
search_factor = 2
pass4SymmKey = <shared_secret>
Restart Splunk services.
Monitor Cluster Status: run the command below on the Cluster Manager, or use the Monitoring Console, to confirm that the replication and search factors are met.
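For example (standard CLI, run on the Cluster Manager):
./splunk show cluster-status --verbose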
Parsing and transformations allow you to manipulate raw data during ingestion. These steps occur in the Parsing Stage of the data pipeline.
Sourcetypes:
Define how Splunk processes and tokenizes incoming data.
Example:
[apache_access]
TIME_FORMAT = %d/%b/%Y:%H:%M:%S %z
MAX_TIMESTAMP_LOOKAHEAD = 32
Field Extraction:
192.168.1.1 - - [01/Jan/2025:12:00:00] "GET /index.html"
Extracted fields: clientip=192.168.1.1, method=GET, uri=/index.html
props.conf:
Controls how data is parsed and indexed.
Example:
[source::/var/log/apache/access.log]
TRANSFORMS-anonymize = mask_ssn
transforms.conf:
Defines rules for modifying data.
Example:
[mask_ssn]
# With DEST_KEY = _raw, FORMAT replaces the entire event, so the text
# around the SSN must be captured and re-emitted
REGEX = (?m)^(.*)\d{3}-\d{2}-\d{4}(.*)$
FORMAT = $1XXX-XX-XXXX$2
DEST_KEY = _raw
Configure props.conf:
[source::/var/log/app.log]
TRANSFORMS-anonymize = mask_credit_card
Configure transforms.conf:
[mask_credit_card]
# Capture the surrounding text so only the card number is replaced
REGEX = (?m)^(.*)\b\d{16}\b(.*)$
FORMAT = $1XXXX-XXXX-XXXX-XXXX$2
DEST_KEY = _raw
Verify the changes: restart Splunk (or the Heavy Forwarder), ingest a sample event containing a card number, and search for the masked pattern, as sketched below.
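A quick verification sketch (the index name is an assumption):
index=main "XXXX-XXXX-XXXX-XXXX"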
Splunk’s licensing model is based on daily indexed volume. You purchase a license that allows you to ingest a certain amount of data per day (e.g., 10 GB/day).
Enforcement kicks in when:
If you exceed your license limit on any day, Splunk will issue a license warning.
If you exceed it on 3 separate days in a 30-day window, you will enter a license violation state.
Search functionality is disabled for non-admin users.
Ingesting data still works, but search access is restricted.
A warning banner is shown in Splunk Web.
You can resolve a violation by:
Purchasing more license capacity.
Waiting for the 30-day window to roll forward (older violations expire).
Reducing data ingestion.
Splunk Web: go to Settings > Licensing to view daily indexed volume against your quota; the Monitoring Console also provides license usage dashboards.
SPL:
index=_internal source=*license_usage.log* type="Usage"
| stats sum(b) AS bytes by idx, sourcetype
| eval GB=round(bytes/1024/1024/1024, 2)
KOs are user-defined entities that enhance Splunk’s search and visualization capabilities.
Saved Searches:
Scheduled or manual searches saved with a name.
Can be used for dashboards, alerts, or reports.
Macros:
Reusable snippets of SPL, stored with parameters.
Used to simplify complex searches.
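A hedged macros.conf sketch (the macro name and definition are assumptions):
# macros.conf
[web_errors(1)]
args = idx
definition = index=$idx$ sourcetype=apache_access status>=500
In a search, this macro would be invoked with backticks as `web_errors(web_traffic)`.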
Event Types:
Tags for events that match certain search conditions.
Used for classifying data semantically (e.g., failed_logins, user_logins).
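A minimal eventtypes.conf sketch for the failed_logins example (the search string is an assumption):
# eventtypes.conf
[failed_logins]
search = index=server_logs sourcetype=syslog "authentication failure"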
Go to Settings > Knowledge: this section of the Settings menu contains Searches, reports, and alerts, Event types, Tags, and Advanced search (for macros).
While more relevant for Power Users, admins must manage sharing, permissions, and app context for these objects.
Splunk reads configuration files from multiple locations with a defined priority order.
$SPLUNK_HOME/etc/system/local/
$SPLUNK_HOME/etc/apps/<app_name>/local/
$SPLUNK_HOME/etc/apps/<app_name>/default/
$SPLUNK_HOME/etc/system/default/
local always overrides default within the same location.
system/local has the highest priority overall, while system/default has the lowest; app directories fall in between.
If the same setting appears in multiple locations, Splunk uses the one with the highest priority.
Example:
A props.conf setting in system/local will override the same setting in apps/<app_name>/default.
To see which file a setting actually comes from, use the following command:
splunk cmd btool props list --debug
It shows the active configuration along with the source file.
Splunk’s Web interface provides intuitive access to most admin functions under the Settings menu.
Indexes:
Location: Settings > Indexes
You can create, edit, and delete indexes here.
Data Inputs:
Location: Settings > Data Inputs
Used to add new data sources (files, ports, scripts, HEC, etc.)
Forwarder Management (if using Deployment Server):
Location: Settings > Forwarder Management
View and manage connected forwarders, server classes, and deployed apps.
Distributed Search:
Location: Settings > Distributed Search
Configure search peers and replication settings.
Users & Authentication:
Location: Settings > Access Controls
Manage users, roles, and authentication methods (LDAP, SAML, etc.)
Always confirm the scope (App or Global) when editing configurations via Splunk Web.
Use role-based access controls to limit what each user or role can manage.
Which Splunk component stores indexed data and makes it searchable?
The indexer.
The indexer is the Splunk component that receives parsed events, writes raw data and index files to disk, and makes the data searchable. In a typical distributed deployment, forwarders collect data, indexers process and store it, and search heads send search requests to indexers and combine the results. A common mistake is to assume that the search head stores the production event data. Its primary role is search coordination, not long-term indexed storage.
Demand Score: 80
Exam Relevance Score: 92
What is the primary role of a search head in Splunk?
To dispatch searches and present results.
A search head accepts user searches, sends those searches to the appropriate indexers or search peers, and then merges and presents the returned results. It is the user-facing search coordination layer in a distributed Splunk deployment. It does not normally perform the core indexing role for production machine data in that architecture. Confusing the search head with the indexer is one of the most common admin-level mistakes.
Demand Score: 78
Exam Relevance Score: 91
What is the primary function of a Universal Forwarder?
To collect data and forward it to another Splunk instance.
A Universal Forwarder is designed to collect data from files, directories, and other supported inputs, then forward that data onward, usually to indexers. In the pipeline mapping discussed in Splunk documentation and community guidance, the Universal Forwarder participates in the input phase, while downstream systems handle parsing and indexing. This makes the Universal Forwarder lightweight and suitable for source systems where low resource consumption matters.
Demand Score: 74
Exam Relevance Score: 93
Can a search head also be configured to forward its own internal logs?
Yes.
A search head can be configured to forward its own local or internal logs to indexers. In practice, this is a common pattern in distributed deployments so that internal operational data is centralized on indexers instead of remaining only on the search head. This does not change the search head’s main role as a search coordinator, but it does mean the instance can also act as a forwarding source for its own local data.
Demand Score: 68
Exam Relevance Score: 88