SPLK-3003 Deploying Splunk

Detailed list of SPLK-3003 knowledge points

1. Splunk Deployment Types

What does "deployment" mean in Splunk?

In Splunk, "deployment" refers to how the system is installed and organized across one or more machines. It determines how data flows through the system, how it's stored, and how users access and analyze it. There are two main types of deployment: standalone and distributed.

1.1 Standalone Deployment

A standalone deployment means all Splunk functions are installed on a single machine. This machine handles:

  • Collecting data (data input)

  • Parsing and indexing data (storing it for search)

  • Running searches and reports

  • Displaying results in dashboards and visualizations

This type of setup is simple and quick to install.

When is standalone deployment suitable?

  • Learning and training environments

  • Personal testing or development use

  • Small businesses with limited data volume

Advantages of standalone deployment:

  • Easy to set up and manage

  • No need for network configuration between components

  • Fewer system resources required

Disadvantages of standalone deployment:

  • Not designed for large data volumes

  • Limited performance and scalability

  • Not suitable for enterprise production use

1.2 Distributed Deployment

A distributed deployment separates the different functions of Splunk across multiple machines. Each machine has a specific role, and they work together as a system.

This model is used in most production environments because it supports scalability, performance, and fault tolerance.

Main roles in a distributed deployment:

  1. Forwarders – These are lightweight agents installed on data sources (e.g., servers, network devices). They collect and send data to indexers.

  2. Indexers – These systems receive, parse, and store the data. They also respond to search requests by providing results.

  3. Search Heads – These machines let users run searches, build dashboards, set alerts, and view reports. They act as the interface for users.
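As a minimal sketch of how these roles connect, a forwarder is pointed at the indexer tier in outputs.conf while the indexer opens a receiving port in inputs.conf. Hostnames and the port here are illustrative (9997 is simply the conventional receiving port):

```ini
# outputs.conf on the forwarder (hostname is an example)
[tcpout]
defaultGroup = primary_indexers

[tcpout:primary_indexers]
server = idx1.example.com:9997

# inputs.conf on the indexer: listen for forwarded data on 9997
[splunktcp://9997]
disabled = 0
```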

Why use a distributed setup?

  • To handle larger data volumes

  • To improve reliability and performance

  • To isolate and scale specific functions as needed

When is distributed deployment suitable?

  • Medium to large businesses

  • Environments with high data ingestion rates

  • Teams with multiple users querying data

Comparison of Standalone and Distributed Deployment

Category           | Standalone Deployment | Distributed Deployment
-------------------|-----------------------|------------------------------------
Number of machines | One                   | Multiple
Complexity         | Low                   | Medium to high
Performance        | Limited               | High
Scalability        | Poor                  | Excellent
Use case           | Testing, learning     | Production, large-scale environments
Data handling      | Basic                 | Handles high volume and concurrency

2. Deployment Components

In a distributed Splunk environment, different components (or roles) are installed on different machines. Each component has a specific responsibility. Understanding these roles is essential before you attempt any Splunk deployment or configuration.

2.1 Universal Forwarder (UF)

A Universal Forwarder is a lightweight Splunk agent installed on data source systems (such as application servers, web servers, or databases). Its only job is to collect and forward raw data to a central Splunk system (usually to Indexers).

Key characteristics:

  • Small footprint with minimal resource usage

  • Cannot parse or transform data

  • No web interface or dashboards

  • Typically installed on many servers across an organization

Use case: Best suited for sending logs or metrics from servers in a production environment to Indexers for further processing.
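Since a Universal Forwarder does nothing but collect and send, its configuration is typically just a few monitor stanzas. A sketch (the path, index, and sourcetype are examples, not fixed values):

```ini
# inputs.conf on a Universal Forwarder (path and names are illustrative)
[monitor:///var/log/nginx/access.log]
index = web
sourcetype = nginx:access
```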

2.2 Heavy Forwarder (HF)

A Heavy Forwarder is a full Splunk instance used to collect, parse, and forward data. Unlike the Universal Forwarder, the Heavy Forwarder can:

  • Perform data transformation using configuration files (props.conf and transforms.conf)

  • Filter or route data to different destinations

  • Support inputs that need scripting or parsing

Use case: Useful when the data needs to be modified or routed before it reaches the Indexer. For example, masking sensitive data or splitting events.
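For the masking example mentioned above, a Heavy Forwarder can apply a sed-style rewrite at parse time with a SEDCMD setting in props.conf. The sourcetype name and the regex below are illustrative:

```ini
# props.conf on a Heavy Forwarder: mask all but the last four digits
# of card-like numbers (sourcetype and pattern are examples)
[app:payments]
SEDCMD-mask_pan = s/\d{12}(\d{4})/XXXXXXXXXXXX\1/g
```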

2.3 Deployment Server (DS)

The Deployment Server is a special role used to manage and distribute configuration files and apps to multiple Universal Forwarders (and optionally Heavy Forwarders).

Functions include:

  • Central management of input configurations (inputs.conf, etc.)

  • Grouping forwarders into "server classes" based on IP, hostnames, etc.

  • Automatically deploying changes when needed

Use case: When you have hundreds or thousands of forwarders and you want to update or control them from a single place.
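On the forwarder side, each client is pointed at the Deployment Server with a small deploymentclient.conf. A sketch, assuming a Deployment Server reachable at ds.example.com on the default management port:

```ini
# deploymentclient.conf on each forwarder (URI is an example)
[deployment-client]

[target-broker:deploymentServer]
targetUri = ds.example.com:8089
```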

2.4 Indexer

An Indexer is the core of the Splunk backend. It performs several vital functions:

  • Receives data from Forwarders or inputs

  • Parses and processes the raw data

  • Creates and stores event indexes for search

  • Responds to search requests from Search Heads

Key role: It stores both the raw data and the index metadata that allows fast searching later.

Use case: All production environments use one or more Indexers depending on data volume and redundancy requirements.
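Indexers organize stored data into indexes defined in indexes.conf. A minimal sketch of a custom index with size and retention limits (the index name, paths, and limits are illustrative):

```ini
# indexes.conf on an indexer (values are examples)
[web]
homePath   = $SPLUNK_DB/web/db
coldPath   = $SPLUNK_DB/web/colddb
thawedPath = $SPLUNK_DB/web/thaweddb
maxTotalDataSizeMB     = 500000      # cap total index size
frozenTimePeriodInSecs = 7776000     # roll data to frozen after ~90 days
```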

2.5 Search Head (SH)

A Search Head provides the user interface to Splunk. It is where users:

  • Write and run searches using SPL (Search Processing Language)

  • Create dashboards, reports, and alerts

  • View search results and visualizations

The Search Head does not store data. It simply queries Indexers (or a cluster of Indexers) to retrieve results.

Use case: In large environments, Search Heads are separated from Indexers so user searches don't slow down indexing.

2.6 Cluster Master (Manager Node)

The Cluster Master, also called the Manager Node, is used when you deploy Indexer Clustering. It performs administrative tasks such as:

  • Managing cluster configurations

  • Monitoring the health of peer nodes (Indexers)

  • Coordinating data replication

It does not index or search data.

Use case: Required in environments that use clustered Indexers for high availability and data redundancy.

2.7 Deployer

A Deployer is used when you set up a Search Head Cluster. Its job is to:

  • Push configuration bundles (apps, settings) to all Search Head cluster members

  • Ensure consistency across the cluster

Use case: Ensures all Search Heads in a cluster share the same configurations and apps, avoiding manual updates to each member.
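In practice, apps are staged on the Deployer under $SPLUNK_HOME/etc/shcluster/apps/ and pushed with a single CLI command. A sketch, where the target URI and credentials are examples:

```shell
# Run on the Deployer after staging apps under etc/shcluster/apps/
splunk apply shcluster-bundle -target https://sh1.example.com:8089 -auth admin:changeme
```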

2.8 License Master

The License Master manages Splunk’s license usage across all components in a deployment.

Responsibilities:

  • Tracks how much data is ingested per day

  • Issues license warnings or violations if limits are exceeded

  • Allocates license volumes across environments or departments

Use case: In medium to large environments, a centralized License Master ensures compliance and better monitoring of license usage.
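Each Splunk instance that should report to the central License Master is pointed at it in server.conf. A sketch, with an example URI (newer releases also accept the equivalent manager-style setting name):

```ini
# server.conf on each license peer (URI is an example)
[license]
master_uri = https://lm.example.com:8089
```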

3. Installation Methods

Splunk Enterprise can be installed on various platforms, including Linux, Windows, Docker containers, and Cloud platforms. Each method has its own advantages and is used in different environments depending on scale, automation needs, or operating system preferences.

3.1 Installing Splunk on Linux

Linux is the most common platform for deploying Splunk in production. It provides better performance and flexibility for configuration.

Installation formats available:

  • .tgz (tarball file): Best for manual installation and custom directory paths.

  • .rpm (Red Hat-based systems): For package-managed installations using yum or dnf.

Installation steps using tarball (.tgz):

  1. Download Splunk from the official site.

  2. Move the file to your target directory.

  3. Extract using:
    tar -xvzf splunk-<version>-<build>-Linux-x86_64.tgz -C /opt

  4. Change directory:
    cd /opt/splunk/bin

  5. Start Splunk for the first time and set admin password:
    ./splunk start --accept-license

Advantages:

  • Offers full control over the installation path and files.

  • Widely used in production environments.

  • Easier to integrate with enterprise monitoring tools.

3.2 Installing Splunk on Windows

Splunk provides a simple graphical installer for Windows environments.

Installation steps:

  1. Download the .msi or .exe installer from the Splunk website.

  2. Run the installer and follow the GUI wizard.

  3. Choose the installation path and set admin credentials.

  4. After installation, access Splunk Web at:
    http://localhost:8000

Windows-specific features:

  • Native integration with Event Logs, WMI, Active Directory.

  • Can collect data using prebuilt Windows apps and add-ons.

Use cases:

  • Suitable for test environments or for collecting Windows-native logs.

3.3 Installing Splunk with Docker and Kubernetes

This method is used for containerized or cloud-native deployments. It is common in CI/CD pipelines or when you want to automate deployment with infrastructure as code.

Docker Installation:

  1. Pull the official image:
    docker pull splunk/splunk:latest

  2. Run a container with environment variables:
    docker run -d -p 8000:8000 -e SPLUNK_START_ARGS="--accept-license" -e SPLUNK_PASSWORD="changeme" splunk/splunk

Kubernetes Installation:

  • Use Helm charts provided by Splunk.

  • Requires knowledge of Kubernetes and configuration files.

  • Enables rapid scaling and orchestration.

Use cases:

  • Development, testing, and cloud-native production environments.

  • Ideal for organizations using microservices or DevOps.

3.4 Splunk Cloud Platform

Splunk Cloud is a fully managed Splunk service hosted by Splunk or on major cloud providers (AWS, GCP, Azure). It eliminates the need for infrastructure management.

Key features:

  • No installation required.

  • Managed scaling, backups, and updates by Splunk.

  • Connect to Splunk Cloud using Forwarders or HEC.

Use cases:

  • Organizations that want to use Splunk without managing hardware.

  • Fast onboarding for analytics teams without DevOps involvement.

Limitations:

  • Less control over backend configurations.

  • Certain custom apps or configurations may not be supported.

Comparison of Installation Methods

Method              | Platform        | Best for                           | Notes
--------------------|-----------------|------------------------------------|------------------------------------------
Linux (.tgz/.rpm)   | Linux server    | Production, performance            | Most flexible and powerful
Windows (.exe/.msi) | Windows Server  | Small-scale, Windows integration   | Easy GUI installation
Docker              | Any (container) | Testing, DevOps, automation        | Requires Docker knowledge
Kubernetes          | Cloud-native    | Enterprise DevOps teams            | Scalable, complex to manage
Splunk Cloud        | Fully managed   | Zero-maintenance, rapid deployment | Managed by Splunk, fast but less control

4. Configuration Best Practices

4.1 Use Dedicated Directories for Logs, Configuration Files, and Indexed Data

By default, Splunk stores all of its data and configurations under the main installation directory ($SPLUNK_HOME). However, in production environments, it is a best practice to:

  • Store indexed data (hot, warm, cold buckets) on high-performance storage with sufficient capacity.

  • Place log files (such as _internal logs) on a separate disk if possible, to avoid performance impact.

  • Separate configuration files (in the etc/ directory) from data directories to improve clarity and backups.

Benefits:

  • Easier disaster recovery

  • Better performance tuning

  • Safer during upgrades or migrations

4.2 Separate etc/system/local and etc/apps/ for Configuration Hierarchy

Splunk has a layered configuration system, meaning configuration files can exist in different locations, each with a specific priority. Two key directories are:

  • etc/system/local: This has the highest precedence; reserve it for critical, machine-specific overrides only.

  • etc/apps/<your_app>: Preferred location for most custom configurations and knowledge objects (dashboards, searches, inputs, etc.)

Why use apps instead of system/local?

  • Apps are modular and portable

  • Easy to version-control and deploy across environments

  • Compatible with Deployment Server, Search Head Clustering, and Deployer

Example: If you want to configure a custom log input, create an app like TA_custom_inputs, and place inputs.conf inside etc/apps/TA_custom_inputs/local/.
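The hypothetical TA_custom_inputs app from the example above would have a layout like this (only the parts relevant here are shown):

```
$SPLUNK_HOME/etc/apps/TA_custom_inputs/
├── local/
│   └── inputs.conf      # your custom monitor stanzas
└── metadata/
    └── local.meta       # permissions and sharing for the app's objects
```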

4.3 Ensure Time Synchronization (NTP) Across All Nodes

Time consistency is critical in Splunk deployments because:

  • Timestamps are used to organize, index, and search data.

  • Searches over time ranges can miss data if the time is not aligned across Indexers, Forwarders, and Search Heads.

  • Clustering (Indexer/Search Head) relies on synchronized logs for replication and troubleshooting.

Best practice: Configure all systems to use a reliable Network Time Protocol (NTP) server.

4.4 Scale Horizontally by Adding More Indexers or Search Heads

In production environments, you should scale horizontally, not vertically.

  • Horizontal scaling: Add more Indexers or Search Heads as data volume or user demand increases.

  • Vertical scaling: Adding more CPU/RAM to a single instance (helpful but has limits).

Examples:

  • If indexing becomes slow: Add more Indexers.

  • If search performance drops during peak hours: Add more Search Heads.

  • Use load balancing between Universal Forwarders and Indexers for better distribution.

Splunk supports both scaling methods, but horizontal scaling is better for long-term growth.
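The forwarder-to-indexer load balancing mentioned above is configured in outputs.conf by listing several indexers in one output group; the forwarder then rotates among them automatically. Hostnames and the rotation interval here are examples:

```ini
# outputs.conf on a Universal Forwarder: auto load balancing
# across two indexers (hostnames are illustrative)
[tcpout:primary_indexers]
server = idx1.example.com:9997, idx2.example.com:9997
autoLBFrequency = 30   # switch target roughly every 30 seconds
```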

Other Useful Tips for Configuration Management

  1. Use version control (like Git) to track and manage configuration changes across teams.

  2. Test changes in staging or dev environments before applying them to production.

  3. Keep a change log for all manual edits made to .conf files.

  4. Validate your configuration changes using Splunk's btool command to debug configuration merging:

    splunk btool inputs list --debug
    
  5. Organize apps clearly:

    • Use prefixes like TA_ for Technical Add-ons (data inputs only)

    • Use SA_ for shared libraries or utilities

    • Use app_ or business names for user-facing apps

Summary of Best Practices

Practice Area           | Best Practice Summary
------------------------|-----------------------------------------------------------
Directory organization  | Separate logs, configs, and indexed data
Configuration hierarchy | Use etc/apps/ instead of etc/system/local
Time management         | Sync all nodes with NTP servers
Scaling                 | Add Indexers/Search Heads as needed for performance
Versioning and testing  | Use Git, test in dev, and document all changes
Validation tools        | Use btool and the Monitoring Console to check config status

Deploying Splunk (Additional Content)

1. Boundary Between Search Head Clustering and Indexer Clustering

Search Head Clustering (SHC) and Indexer Clustering serve fundamentally different purposes, though both are deployed in distributed Splunk architectures.

Search Head Clustering (SHC):

  • SHC is a high-availability solution designed to support UI-level redundancy, load balancing, and search distribution.

  • It focuses on coordinating scheduled searches, replicating knowledge objects, and ensuring that multiple search heads appear as one logical interface to the user.

  • SHC uses a Captain to coordinate scheduled searches and a Deployer to distribute configuration and apps.

Indexer Clustering:

  • Indexer Clustering is primarily focused on data-level redundancy and high availability of indexed data.

  • It ensures that multiple copies of raw data and index files are maintained across a cluster of peer indexers, governed by a Cluster Manager.

  • Key parameters such as Replication Factor (RF) and Search Factor (SF) control the number of data copies and their searchability.

Key Comparison (for exam memory aid):

Feature            | SHC                              | Indexer Clustering
-------------------|----------------------------------|-------------------------------------
Focus              | Search interface and metadata    | Raw data redundancy
Key coordinator    | Captain                          | Cluster Manager (Master Node)
Configuration tool | Deployer                         | Configuration files (via CLI or UI)
Data stored        | None (search head only)          | Raw data and indexes
Primary benefit    | UI redundancy and search scaling | Data resiliency and durability
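On the indexer-clustering side, the Replication Factor and Search Factor mentioned above are set on the Cluster Manager in server.conf. A sketch with illustrative values (older releases use mode = master instead of mode = manager):

```ini
# server.conf on the Cluster Manager (values are examples)
[clustering]
mode = manager
replication_factor = 3    # keep three copies of each bucket
search_factor = 2         # keep two searchable copies
pass4SymmKey = changeme-cluster-key
```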

2. Forwarder Management Security Control

While the Deployment Server (DS) is an effective tool to manage configurations for Universal Forwarders (UFs), security and control over which clients are allowed to connect is critical, especially in large-scale environments.

Access Control Mechanism:

The following parameters can be configured in serverclass.conf on the Deployment Server to restrict or permit UF access:

  • whitelist.<n>: Allows matching clients based on clientName, host, or IP address (entries are numbered, e.g. whitelist.0).

  • blacklist.<n>: Explicitly denies access from the specified clients.

Examples:

[serverClass:windows_clients]
whitelist.0 = host::win*

[serverClass:restricted]
blacklist.0 = ip::10.0.0.50

This access control helps prevent unauthorized forwarders from enrolling and receiving apps or configurations.

3. License Enforcement Details

The Splunk License Master is responsible for monitoring the volume of daily indexed data and enforcing license limits.

License Violations:

  • If your Splunk environment exceeds the licensed daily indexing volume on five or more days (they need not be consecutive) within a rolling 30-day window, it enters a "License Violation" state.

  • In violation mode:

    • User searches (non-admin) are disabled.

    • Only admin-level accounts can run searches for remediation.

  • Violations do not delete data, but will halt scheduled reports and alerting for regular users.

Monitoring:

  • Use the Monitoring Console (MC) under License Usage to visualize indexed volume per sourcetype, index, or forwarder.

  • Alternatively, monitor the log file:

    $SPLUNK_HOME/var/log/splunk/license_usage.log
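A common way to chart daily indexed volume from that log is an SPL search like the following, run from a search head (the field b holds bytes indexed; the grouping shown is one illustrative choice):

```
index=_internal source=*license_usage.log type=Usage
| eval GB = b / 1024 / 1024 / 1024
| timechart span=1d sum(GB) AS daily_indexed_GB
```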
    

4. Splunk Cloud Limitations

While Splunk Cloud provides a scalable and managed solution, it introduces restrictions compared to on-prem Splunk Enterprise.

Common Limitations:

  • No direct access to shell or file system, which restricts:

    • Running custom Python scripts not validated by Splunk.

    • Using btool for deep configuration inspection.

  • Limited access to certain apps or add-ons, especially those requiring:

    • File system interaction

    • Shell scripting or custom binaries

  • Apps deployed must go through App Vetting to be accepted in Splunk Cloud.

Exam Tip:

Questions may ask “Which feature is not available in Splunk Cloud?”, so remember that low-level debugging and certain scripted inputs are not supported.

5. Compatibility of Deployment Server with SHC

A common exam trap is the misuse of Deployment Server in Search Head Clustering.

Critical Rule:

  • Deployment Server is NOT supported for SHC members.

  • You must use a Deployer to push configuration bundles (apps, dashboards, saved searches) to SHC nodes.

If you try to manage SHC members using a Deployment Server:

  • Configuration may become inconsistent across the cluster.

  • Captain election and knowledge object replication may fail.

  • Supportability and upgrade paths may be broken.

Summary

  1. Clarify the functional separation of SHC and Indexer Clustering under deployment models.

  2. Emphasize access control for Universal Forwarders via serverclass.conf using whitelist/blacklist.

  3. Reinforce licensing consequences, especially the 5-day violation rule and admin-only search fallback.

  4. Detail Splunk Cloud limitations, especially around btool, unsupported apps, and scripting.

  5. Highlight the incompatibility of Deployment Server with SHC, and the necessity of using a Deployer.

Frequently Asked Questions

When should a Splunk deployment transition from a standalone instance to a distributed architecture with indexers and search heads?

Answer:

A Splunk deployment should transition to a distributed architecture when data ingestion volume, user concurrency, or search workloads exceed the performance limits of a single instance.

Explanation:

Standalone deployments are suitable for small environments, typically for development or low-volume workloads. As data ingestion grows or multiple users run concurrent searches, CPU, memory, and disk I/O contention increases. Distributed architecture separates responsibilities: indexers handle data ingestion and storage, while search heads handle query processing. This separation improves scalability and performance. It also enables high availability features such as indexer clustering and search head clustering. A common scaling pattern is first separating the search head from the indexer role, followed by introducing multiple indexers with clustering to support higher ingestion rates and redundancy. Failure to separate roles in larger environments often leads to search delays and indexing bottlenecks.


In a distributed Splunk deployment, why should the Cluster Manager not run on the same host as the Search Head?

Answer:

The Cluster Manager should run on a dedicated host because it manages cluster operations and must remain independent from search workloads.

Explanation:

The Cluster Manager is responsible for coordinating indexer clustering tasks such as bucket replication, fix-ups, and configuration bundle distribution. Search heads, on the other hand, process user searches and dashboards, which can generate unpredictable resource spikes. If both roles run on the same host, search activity can impact cluster management operations. This may delay replication factor enforcement or cluster recovery tasks. Operational separation ensures that cluster management remains stable even during heavy search usage. Best practice architecture therefore assigns the Cluster Manager to its own instance to maintain cluster health and reliability while allowing search heads to scale independently.


What architectural benefit does separating search heads and indexers provide in a growing Splunk environment?

Answer:

Separating search heads and indexers improves scalability by isolating search workloads from indexing operations.

Explanation:

Indexers focus on ingesting, parsing, and storing incoming data streams. These tasks require sustained disk throughput and CPU resources. Search heads execute search queries, perform knowledge object processing, and coordinate distributed search. If both roles run on the same system, heavy search activity can slow indexing pipelines, potentially causing ingestion delays. By separating these tiers, indexing performance remains stable while search capacity can scale independently by adding additional search heads. This architecture also supports features such as search head clustering and distributed search, enabling organizations to support larger user bases and higher data volumes without impacting ingestion reliability.

