Search Head Clustering

Search Head Clustering Detailed Explanation

Search Head Clustering enables multiple Search Heads to work together as a single logical unit. It ensures that search configuration, knowledge objects, and scheduled jobs are consistent and synchronized across all members, and provides high availability in case of failure.

1. Purpose

The main goals of Search Head Clustering (SHC) are:

High Availability: Ensures search interface availability even if one or more Search Heads go offline.
Configuration Consistency: Keeps apps, knowledge objects, and configurations the same across all members.
Horizontal Scaling: Allows you to add more Search Heads to support more users and workloads.

This is critical for enterprise environments where many users interact with Splunk dashboards, alerts, and searches simultaneously.

2. SHC Components

A functional SHC has three primary components:

Cluster Members

These are the Search Head nodes that participate in the cluster.
A cluster needs at least 3 members for quorum-based captain election to work properly.
Typically deployed behind a load balancer to distribute user traffic.

Captain

One Search Head is elected as the Captain.
The Captain:
- Coordinates scheduled searches
- Manages knowledge object replication
- Handles configuration synchronization
If the Captain goes offline, a new captain is automatically elected.

Deployer

The Deployer is a separate Splunk instance (not part of the SHC itself).
It is used to push configurations and apps to all SHC members.
Required for maintaining centralized app management.

3. SHC Capabilities

SHC introduces several features that improve performance, consistency, and resilience.

Shared Scheduling

Scheduled searches (like alerts and report updates) are coordinated by the Captain.
Ensures that each search runs only once, even if defined on multiple members.
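As a rough illustration of "runs only once", the toy model below has a single coordinator hand each scheduled search to exactly one member, picking whichever member currently has the fewest assigned jobs. The function name, the member names, and the least-loaded rule are assumptions made for this sketch; it is not Splunk's actual delegation algorithm.

```python
from collections import Counter


def assign_searches(searches, members):
    """Toy captain: give each scheduled search to exactly one member.

    Hypothetical model only. The least-loaded rule is an assumption,
    not Splunk's real delegation logic.
    """
    if not members:
        raise ValueError("at least one member is required")
    load = Counter({m: 0 for m in members})
    assignment = {}
    # dict.fromkeys drops duplicate definitions while keeping their order.
    for search in dict.fromkeys(searches):
        # Pick the member with the fewest assigned jobs (ties: first listed).
        target = min(members, key=lambda m: load[m])
        assignment[search] = target
        load[target] += 1
    return assignment


if __name__ == "__main__":
    plan = assign_searches(
        ["alert_cpu", "report_daily", "alert_cpu", "alert_disk"],
        ["sh1", "sh2", "sh3"],
    )
    for name, member in plan.items():
        print(f"{name} -> {member}")
```

Even though alert_cpu is listed twice, it appears once in the plan, which is the property the Captain is there to guarantee.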

KV Store Replication

The KV Store (Key-Value Store) is used for storing:
- Lookups
- App state
- Global variables
KV Store is replicated across all members for consistency.

User Load Distribution

A load balancer can route user traffic across all SHC members.
Reduces load on any single node.
Improves user interface responsiveness and search concurrency.

4. Deployment Procedure

To set up an SHC, you must configure each member and bootstrap the cluster. The process involves:

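a. Edit `server.conf` on each Search Head:

[replication_port://<port>]

[shclustering]
disabled = 0
pass4SymmKey = <shared-secret>

pass4SymmKey is the shared secret that members use to authenticate to each other; it must be identical on every member.
The replication port is declared as its own [replication_port://<port>] stanza, not as a setting under [shclustering]. Members listen on it for search artifacts replicated from other members. KV Store uses its own port, and configuration changes replicate over the management port.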
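When the same settings have to be written on several members, generating the text from one place helps keep the shared secret identical everywhere. The sketch below is a minimal example, assuming the replication port is declared as a [replication_port://<port>] stanza as in Splunk's server.conf layout; the function name and the sample port and secret are placeholders, not recommended values.

```python
def render_shc_server_conf(replication_port, secret, disabled=False):
    """Return server.conf text for one SHC member (illustrative values only)."""
    if not 1 <= replication_port <= 65535:
        raise ValueError("replication_port must be a valid TCP port")
    lines = [
        f"[replication_port://{replication_port}]",
        "",
        "[shclustering]",
        f"disabled = {1 if disabled else 0}",
        f"pass4SymmKey = {secret}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 9887 and "changeme" are placeholders chosen for the example.
    print(render_shc_server_conf(9887, "changeme"), end="")
```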

b. Bootstrap the Captain:

Run this on one Search Head only:

splunk bootstrap shcluster-captain -servers_list "<list_of_all_members>"

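This initializes the SHC and makes the instance you run it on the first Captain. The list names every initial member, including this one, as comma-separated https://<member>:<mgmt_port> entries.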
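The -servers_list value is easy to mistype by hand. A small helper like the one below can assemble it from a list of hostnames; the helper name and the example.com hostnames are hypothetical, and 8089 is assumed as the management port because it is Splunk's default.

```python
def bootstrap_command(members, mgmt_port=8089):
    """Build the bootstrap command line for the given member hostnames."""
    if len(members) < 3:
        raise ValueError("the guide calls for at least 3 members")
    if len(set(members)) != len(members):
        raise ValueError("duplicate member in list")
    # Each entry is the member's management URI; entries are comma-separated.
    servers_list = ",".join(f"https://{host}:{mgmt_port}" for host in members)
    return f'splunk bootstrap shcluster-captain -servers_list "{servers_list}"'


if __name__ == "__main__":
    hosts = ["sh1.example.com", "sh2.example.com", "sh3.example.com"]
    print(bootstrap_command(hosts))
```

The helper only builds the string; running it still happens on the one Search Head chosen as the first Captain.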

c. Add Members to the Cluster:

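Members named in -servers_list join the cluster at bootstrap. To add a Search Head to an existing cluster later, run this command on the new member, pointing it at any current member: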

splunk add shcluster-member -current_member_uri https://<existing_member>:<mgmt_port>

5. Configuration with Deployer

The Deployer pushes app bundles to all SHC members using the following process.

Push App Bundles:

Use this command from the Deployer:

splunk apply shcluster-bundle -target https://<SH>:<mgmt_port> -auth admin:pass

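This pushes the configuration bundle staged under the Deployer’s $SPLUNK_HOME/etc/shcluster/apps/ to all SHC members. The -target URI names any one member; the Deployer uses it to reach the rest of the cluster.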

Important Notes:

The Deployer does not manage user-specific settings like saved searches or dashboard changes stored in users’ directories.
It only manages shared app-level configurations.

6. Troubleshooting SHC

When issues arise, the following commands and practices help diagnose and resolve problems.

View Cluster Status:

splunk show shcluster-status

Displays the captain, replication status, and scheduling sync status.

List Cluster Members:

splunk list shcluster-members

Shows all active members and their roles.

Synchronize System Clocks:

All SHC members must have NTP time synchronization.
Time mismatches can cause captain election failures and replication errors.
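A quick way to reason about clock drift is to compare each member's reported time against the median of the group. The sketch below assumes the timestamps have already been collected somehow; the function name, the member names, and the 5-second tolerance are illustrative assumptions, not Splunk limits.

```python
from datetime import datetime, timedelta, timezone


def clock_outliers(member_times, tolerance=timedelta(seconds=5)):
    """Return members whose clock is more than `tolerance` from the median clock."""
    if not member_times:
        raise ValueError("no member times supplied")
    ordered = sorted(member_times.values())
    median = ordered[len(ordered) // 2]
    return sorted(
        name for name, t in member_times.items() if abs(t - median) > tolerance
    )


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
    # Hypothetical readings: sh3 is 90 seconds ahead of the others.
    readings = {
        "sh1": base,
        "sh2": base + timedelta(seconds=1),
        "sh3": base + timedelta(seconds=90),
    }
    print(clock_outliers(readings))
```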

Resolve Split-Brain Scenarios:

If two nodes believe they are captain, you may need to:
- Restart affected members
- Transfer captaincy to a healthy member, for example with splunk transfer shcluster-captain -mgmt_uri https://<new_captain>:<mgmt_port>

7. Best Practices

To ensure your SHC is reliable, secure, and maintainable, follow these best practices:

Use an Odd Number of Members

Always deploy 3, 5, or 7 Search Heads to ensure a proper voting quorum for captain election.
Even numbers increase the risk of a tie (split-brain).
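The arithmetic behind this advice is short enough to check directly. The sketch below assumes a majority quorum of floor(N / 2) + 1 and prints how many member failures each cluster size can absorb; the function names are made up for the example.

```python
def quorum(members):
    """Strict majority of the cluster: floor(N / 2) + 1."""
    return members // 2 + 1


def tolerated_failures(members):
    """How many members can be lost while a majority remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(3, 8):
        print(
            f"{n} members: quorum {quorum(n)}, "
            f"tolerates {tolerated_failures(n)} failure(s)"
        )
```

A 4-member cluster tolerates one failure, the same as a 3-member cluster, and a 6-member cluster tolerates two, the same as 5, so the extra even-numbered member adds load without adding resilience.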

Never Edit Configurations Directly on Members

Always use the Deployer to manage shared app configurations.
Direct edits can cause inconsistencies and replication conflicts.

Secure Inter-Node Communication

Use TLS encryption and secure passwords (pass4SymmKey) for all communication.
Restrict access to management ports and ensure firewall policies are in place.

Summary: Search Head Clustering (SHC)

Topic	Key Point
Purpose	High availability, scalability, and configuration consistency
Components	Cluster Members, Captain, Deployer
Scheduling	Shared across cluster, coordinated by Captain
KV Store	Replicated across nodes for lookups and app state
Deployment Process	Configure `server.conf`, bootstrap captain, add members
Configuration Management	Use Deployer to push shared apps; no direct edits on members
Troubleshooting Tools	`splunk show shcluster-status`, `splunk list shcluster-members`
Best Practices	Use odd number of members, secure with TLS, avoid direct edits

Search Head Clustering (Additional Content)

1. Search Affinity in SHC

Search affinity lets a Search Head prefer Indexers in its own site when executing search queries. It is a feature of multisite indexer clustering, which SHC members can use when their search peers form a multisite indexer cluster. This helps:

Reduce latency between Search Head and Indexer
Optimize search performance in large or geographically distributed environments

Configuration Location:

The Search Head's site is set with the site attribute in the [general] stanza of server.conf
distsearch.conf defines the search peers and distributed search groups that a Search Head queries; it does not set affinity by itself

Exam Relevance:

You may be asked how to optimize search performance in SHC using indexer proximity or affinity, especially in hybrid or cloud environments.

2. Captain Election and Quorum (Raft Protocol)

Search Head Clustering relies on a distributed consensus protocol, specifically the Raft protocol, to elect a captain and maintain consistent cluster behavior.

Key Mechanics:

A quorum is required for any captain to be elected
A quorum is a strict majority: floor(N / 2) + 1, where N is the number of SHC members (for example, 2 of 3, 3 of 4, or 3 of 5)
If quorum is lost (for example, due to a network split or power outage), SHC loses its ability to schedule searches or replicate knowledge objects
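To make the quorum rule concrete, the sketch below checks which side of a network split could still elect a captain, assuming a strict-majority quorum. The function names and the partition sizes are illustrative.

```python
def has_quorum(total_members, reachable_members):
    """True if the reachable members form a strict majority of the cluster."""
    return reachable_members >= total_members // 2 + 1


def sides_with_quorum(total_members, partition_sizes):
    """Given the sizes of network partitions, return those able to elect a captain."""
    if sum(partition_sizes) > total_members:
        raise ValueError("partitions contain more members than the cluster")
    return [size for size in partition_sizes if has_quorum(total_members, size)]


if __name__ == "__main__":
    # 4 members split 2/2: neither side has the 3 members a quorum needs.
    print(sides_with_quorum(4, [2, 2]))
    # 5 members split 3/2: only the 3-member side keeps quorum.
    print(sides_with_quorum(5, [3, 2]))
```

The even split of a 4-member cluster is the case where no captain can be elected on either side.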

Best Practice:

Always deploy an odd number of members (3, 5, 7) to avoid split-vote scenarios.

Exam Tip:

Expect scenario-based questions like: “What happens when quorum is lost in a 4-node SHC?” Answer: Captain cannot be elected; scheduled searches may fail.

3. Purpose of `shcluster_label`

The SHC label is set with the shcluster_label attribute in the [shclustering] stanza of server.conf; Splunk does not ship a separate shclustering.conf file. The label is used to:

Define the logical identity of the SHC
Distinguish different SHC deployments in shared environments, for example in the Monitoring Console

Example Configuration:

[shclustering]
shcluster_label = shcluster_prod

Use the same label on every member of the cluster and on its Deployer. It is particularly useful in environments where multiple SHCs exist (such as staging and production) and must be clearly distinguished.

Exam Relevance:

You may be asked where to configure SHC identification or labeling.

4. Limitations of the Deployer

The Deployer is responsible for pushing shared app configurations to all SHC members from:

$SPLUNK_HOME/etc/shcluster/apps/

However, it cannot deploy or manage:

User-level content, such as dashboards or saved searches located in .../users/ directories
Runtime changes, such as those made directly in the UI
Search Head-specific local configurations, such as local.meta on individual SHC members

Best Practices:

Use Git or another version control system to manage deployment content
Test deployments in a staging SHC before applying to production

Exam Relevance:

You may face questions like: “Why didn’t a deployed app update user dashboards?” Correct answer: The Deployer does not manage user-level content.

5. Unsupported Features in SHC

Search Head Clustering imposes several restrictions:

Cannot function as a Deployment Server (DS): SHC members should never be used to manage forwarders
Manual commands like splunk rebuild should not be run on individual SHC members
Early versions (prior to 6.6) do not support KV Store clustering

All operations affecting cluster-wide behavior must be coordinated through the Deployer or the SHC management framework.

Common Mistakes to Avoid:

Assigning Deployment Server roles to SHC members
Editing apps directly on a Search Head instead of through the Deployer

Exam Tip:

You may encounter multiple-choice exclusion questions such as: “Which of the following is not supported in SHC?”

6. Key Logs and Troubleshooting in SHC

When problems arise in SHC, such as delayed knowledge object replication or missing dashboards, start by checking key log files:

shclustering.log: Tracks SHC-related coordination, bundle replication, and leadership status
splunkd.log: General log file that may include election failures, peer sync issues, or KV Store errors

Common Problems and How to Detect:

Captain election failure: Look for Raft election logs or quorum warnings
Bundle replication delay: Messages like bundle push timed out may appear in shclustering.log
KV Store inconsistency: Detected by differences in app state or lookup results between members
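A first pass over a log can be automated by grouping lines under the symptoms listed above. The sketch below is an assumption-heavy example: the search phrases come from this guide's symptom list, the sample lines are invented for the demonstration, and real Splunk log wording may differ.

```python
import re

# Phrases taken from the guide's symptom list; real log wording may differ.
PATTERNS = {
    "captain_election": re.compile(r"election|quorum", re.IGNORECASE),
    "bundle_replication": re.compile(r"bundle push timed out", re.IGNORECASE),
    "kv_store": re.compile(r"kv ?store", re.IGNORECASE),
}


def classify_log_lines(lines):
    """Group log lines by the first symptom pattern they match."""
    hits = {name: [] for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(line.rstrip("\n"))
                break  # count each line under one symptom only
    return hits


if __name__ == "__main__":
    # Invented sample lines, not real Splunk output.
    sample = [
        "WARN lost quorum, election postponed",
        "ERROR bundle push timed out for peer",
        "INFO KVStore status ready",
        "INFO search completed",
    ]
    for symptom, matched in classify_log_lines(sample).items():
        print(symptom, len(matched))
```

Because the function accepts any iterable of lines, an open file handle for whichever log is under inspection can be passed in directly.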

Exam Relevance:

You may be asked: “Where can you find evidence that bundle replication has failed?” Correct answer: shclustering.log

Summary

Area	Enhancement
Search Affinity	Prefers same-site Indexers; the site is set in `server.conf`, search peers in `distsearch.conf`
Captain Election	Based on Raft; requires quorum; odd-number SHC member count recommended
shcluster_label	Set in the [shclustering] stanza of `server.conf`; labels the cluster for multi-cluster identification
Deployer Limitations	Does not push user content or UI-based changes
Unsupported Features	DS role, manual low-level commands, unsupported KV Store behavior
Logs for Troubleshooting	`shclustering.log` and `splunkd.log` provide key insight into cluster health
