SPLK-1003 Distributed Search

Distributed Search Detailed Explanation

Distributed search in Splunk allows large-scale deployments by separating the roles of searching and indexing across multiple components. This guide explains the core components, configuration steps, and optimization tips for setting up and managing a distributed Splunk environment.

1. Distributed Environment Components

A distributed environment comprises three primary components, each with specific roles in the search and indexing process.

1.1 Search Head

  • Overview:

    • The Search Head is the user-facing component where users create and execute search queries, build dashboards, and manage alerts.
    • It does not store data but acts as a query manager, forwarding search requests to Indexers and combining results.
  • Responsibilities:

    • Distribute SPL (Search Processing Language) queries to Indexers.
    • Aggregate results and present them to the user.
    • Manage dashboards, reports, and knowledge objects.
  • Use Case:

    • A company wants multiple users to run queries concurrently without overloading a single server.
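
The dispatch-and-aggregate behavior described above can be sketched in a few lines of Python (a conceptual toy model, not Splunk code; the host names and counts are invented):

```python
from collections import Counter

def merge_partial_results(partials):
    """Combine per-indexer partial counts the way a search head
    merges results for a 'stats count by host' search."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)

# Partial results returned by two hypothetical indexers
indexer1 = {"web01": 120, "db01": 40}
indexer2 = {"web01": 80, "app01": 15}

print(merge_partial_results([indexer1, indexer2]))
# {'web01': 200, 'db01': 40, 'app01': 15}
```

Each Indexer searches only its own buckets; the Search Head does no indexing, only this kind of merge/sort/present work.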

1.2 Indexer Cluster

  • Overview:

    • The Indexer Cluster stores and processes raw data, making it searchable. It is the backbone of distributed search.
  • Key Features:

    • Data Storage: Indexers organize data into buckets (hot, warm, cold, frozen).
    • High Availability: Replicates data across multiple Indexers for fault tolerance.
    • Data Processing: Tokenizes, parses, and stores data during ingestion.
  • Replication Factor (RF):

    • The number of copies of data maintained in the cluster.
    • Example: RF=2 ensures two copies of each piece of data are stored.
  • Search Factor (SF):

    • The number of searchable copies maintained in the cluster.
    • Example: SF=2 ensures two Indexers have fully searchable copies.
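
As a rule of thumb, RF total copies let the cluster survive RF-1 peer failures without data loss, and SF searchable copies let it survive SF-1 failures without an immediate searchability gap (assuming surviving peers can host the copies). A tiny illustrative calculation:

```python
def failure_tolerance(replication_factor, search_factor):
    """How many simultaneous peer failures a cluster can absorb,
    per RF/SF semantics: RF total copies survive RF-1 failures;
    SF searchable copies survive SF-1 failures."""
    return {
        "data": replication_factor - 1,
        "searchability": search_factor - 1,
    }

# RF=2 / SF=2, as in the examples above
print(failure_tolerance(2, 2))  # {'data': 1, 'searchability': 1}
```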

1.3 Search Head Cluster

  • Overview:

    • A Search Head Cluster provides redundancy and load balancing by pooling multiple Search Heads.
    • Essential for environments with high query loads or critical uptime requirements.
  • Key Features:

    • Failover: If one Search Head fails, others take over automatically.
    • Load Balancing: Distributes user queries across multiple Search Heads.
    • Synchronization: Shares knowledge objects (e.g., dashboards, alerts) between Search Heads.
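
The failover and load-balancing behavior can be pictured as a round-robin dispatcher that skips failed members. A toy Python model (member names are invented; in a real SHC, routing is handled by Splunk and any front-end load balancer):

```python
from itertools import cycle

def dispatch(queries, members, healthy):
    """Round-robin queries across search head members, skipping
    failed ones -- a toy model of SHC load balancing and failover."""
    assignments = {}
    ring = cycle([m for m in members if healthy[m]])
    for q in queries:
        assignments[q] = next(ring)
    return assignments

members = ["sh1", "sh2", "sh3"]
healthy = {"sh1": True, "sh2": False, "sh3": True}  # sh2 has failed
print(dispatch(["q1", "q2", "q3"], members, healthy))
# {'q1': 'sh1', 'q2': 'sh3', 'q3': 'sh1'}
```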

2. Configuration Steps

2.1 Connect Search Head to Indexers

To establish communication between the Search Head and Indexers, use the distsearch.conf file.

Steps:
  1. Locate distsearch.conf:

    • File path: $SPLUNK_HOME/etc/system/local/distsearch.conf.
  2. Define Indexers:

    • Example configuration:

      [distributedSearch]
      servers = <indexer1_ip>:8089, <indexer2_ip>:8089
      
    • Replace <indexer1_ip> and <indexer2_ip> with the actual IPs of the Indexers.

  3. Restart Splunk:

    • Apply the changes:

      ./splunk restart
      
  4. Verify Search Peers:

    • Go to Settings > Distributed Search > Search Peers in Splunk Web.
    • Ensure all Indexers show as "Connected."
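
Outside of Splunk, a hand-edited distsearch.conf can be sanity-checked with any INI parser before restarting; btool remains the authoritative check. A quick Python sketch (the IPs are placeholders):

```python
import configparser

# A hand-edited distsearch.conf fragment (placeholder IPs)
conf_text = """
[distributedSearch]
servers = 10.0.0.10:8089, 10.0.0.11:8089
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)

# Split the comma-separated peer list the way Splunk would read it
servers = [s.strip() for s in parser["distributedSearch"]["servers"].split(",")]
print(servers)  # ['10.0.0.10:8089', '10.0.0.11:8089']
```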

2.2 Set Up Replication in Indexer Clusters

Replication ensures high availability and data redundancy in Indexer Clusters.

Steps:
  1. Edit server.conf on Indexers:

    • Example configuration:

      [clustering]
      mode = slave
      master_uri = https://<cluster_master_ip>:8089
      pass4SymmKey = <cluster_key>

      [replication_port://9887]

    • Note: replication_factor and search_factor are set on the Cluster Master, not on the peers. Each peer also declares a replication port in its own [replication_port://<port>] stanza.
      
  2. Configure the Cluster Master:

    • Add this to the Cluster Master’s server.conf:

      [clustering]
      mode = master
      replication_factor = 2
      search_factor = 2
      pass4SymmKey = <cluster_key>
      
    • Restart the Cluster Master:

      ./splunk restart
      
  3. Add Indexers to the Cluster:

    • Connect each Indexer (peer) to the Cluster Master via the CLI (this writes the same server.conf settings shown above):

      ./splunk edit cluster-config -mode slave -master_uri https://<cluster_master_ip>:8089 -replication_port 9887
      
  4. Verify Cluster Status:

    • On the Cluster Master, check the status:

      ./splunk show cluster-status
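
The effect of replication can be modeled as assigning RF copies of each bucket to distinct peers. A simplified Python sketch (peer and bucket names are invented; real placement decisions are made by the Cluster Master):

```python
import itertools

def place_copies(buckets, peers, rf):
    """Assign rf copies of each bucket to distinct peers, round-robin --
    a simplified model of how a cluster spreads replicas."""
    placement = {}
    ring = itertools.cycle(range(len(peers)))
    for b in buckets:
        start = next(ring)
        placement[b] = [peers[(start + i) % len(peers)] for i in range(rf)]
    return placement

peers = ["idx1", "idx2", "idx3"]
placement = place_copies(["b1", "b2", "b3"], peers, rf=2)
for b, copies in placement.items():
    assert len(set(copies)) == 2  # each bucket lives on two distinct peers
print(placement)
# {'b1': ['idx1', 'idx2'], 'b2': ['idx2', 'idx3'], 'b3': ['idx3', 'idx1']}
```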
      

3. Optimization Tips

3.1 Use Search Head Pooling

Search Head pooling allows multiple Search Heads to share user-generated objects such as dashboards and alerts, reducing resource duplication. Note that Search Head pooling is deprecated in current Splunk versions; Search Head Clustering (Section 1.3) is the recommended approach for new deployments.

  • Steps:
    1. Configure shared storage accessible by all Search Heads.
    2. Edit server.conf on each Search Head to point to the shared storage.

3.2 Enable Parallel Processing for Complex Searches

  • Purpose:

    • Raises the ceiling on concurrent searches so the Search Head can keep multiple Indexers busy at once. Note that max_searches_per_cpu caps how many historical searches may run concurrently per CPU; it does not by itself split a single search into subtasks.
  • Steps:

    1. Configure limits.conf:

      • Example:

        [search]
        max_searches_per_cpu = 2
        
    2. Restart Splunk for changes to take effect:

      ./splunk restart
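
Per Splunk's limits.conf documentation, the ceiling on concurrent historical searches is roughly max_searches_per_cpu × number_of_CPUs + base_max_searches (the base default is assumed here to be 6). A quick calculation helper:

```python
def max_concurrent_searches(cpus, max_searches_per_cpu=2, base_max_searches=6):
    """Splunk's documented ceiling for concurrent historical searches:
    max_searches_per_cpu * #CPUs + base_max_searches (defaults assumed)."""
    return max_searches_per_cpu * cpus + base_max_searches

print(max_concurrent_searches(cpus=8))   # 22
print(max_concurrent_searches(cpus=16))  # 38
```

Raising max_searches_per_cpu trades memory and CPU headroom for concurrency, so monitor resource usage after changing it.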
      

Real-World Scenarios

Scenario 1: Scaling for High Query Load

A company experiences increased search demands, leading to slow query responses. They decide to scale the Search Head capacity using a Search Head Cluster.

Steps:
  1. Set Up a Search Head Cluster:

    • Install Splunk on multiple servers to act as Search Heads.
  2. Configure server.conf for the Cluster:

    • On each Search Head, add:

      [shclustering]
      mode = member
      mgmt_uri = https://<search_head_ip>:8089
      pass4SymmKey = <shared_key>

      [replication_port://8080]
      
  3. Designate a Deployer:

    • On the Deployer, configure server.conf:

      [shclustering]
      mode = deployer
      pass4SymmKey = <shared_key>
      
  4. Push Configuration from the Deployer:

    • Deploy updated configurations to Search Heads:

      ./splunk apply shcluster-bundle -target https://<search_head_ip>:8089
      
  5. Verify Cluster Status:

    • On any Search Head:

      ./splunk show shcluster-status
      

Scenario 2: Ensuring High Availability for Indexers

An organization wants to ensure data is always available, even if one Indexer fails.

Steps:
  1. Deploy an Indexer Cluster:

    • Configure multiple Indexers and a Cluster Master.
  2. Set Replication and Search Factors:

    • Ensure replication and search factors are sufficient:
      • Replication Factor (RF): 2
      • Search Factor (SF): 2
  3. Monitor Cluster Health:

    • Use the Cluster Master to check cluster status:

      ./splunk show cluster-status
      
  4. Simulate Failover:

    • Stop one Indexer and verify that data is still accessible from the cluster.
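
The failover check in step 4 can be reasoned about with a toy model: with SF=2, every bucket is searchable on two peers, so losing any single peer leaves at least one searchable copy (bucket and peer names here are invented):

```python
def searchable_after_failure(placement, failed_peer):
    """Which peers can still serve each bucket when one peer fails,
    given a bucket -> [peers with a searchable copy] map (toy model)."""
    return {b: [p for p in peer_list if p != failed_peer]
            for b, peer_list in placement.items()}

# SF=2: every bucket searchable on two peers
placement = {"b1": ["idx1", "idx2"], "b2": ["idx2", "idx3"]}
after = searchable_after_failure(placement, failed_peer="idx2")
assert all(after[b] for b in after)  # no bucket lost searchability
print(after)  # {'b1': ['idx1'], 'b2': ['idx3']}
```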

Scenario 3: Segmenting Data Access for Security

A company needs to restrict access to specific indexes for different departments while maintaining centralized search capability.

Steps:
  1. Define Roles and Index Access:

    • Create department-specific roles in Splunk Web:
      • IT → Access to it_logs.
      • Finance → Access to finance_logs.
  2. Map Search Heads to Indexers:

    • Edit distsearch.conf to connect Search Heads to relevant Indexers.
  3. Apply Search Filters:

    • Define search filters in roles.conf:

      [role_it]
      srchFilter = index=it_logs
      
      [role_finance]
      srchFilter = index=finance_logs
      
  4. Test Permissions:

    • Log in as a user with each role and ensure they only see permitted data.
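
The effect of a srchFilter can be modeled as a simple filter over events: a user in a role only sees events matching the role's filter. A toy Python sketch (event contents are invented):

```python
def visible_events(events, srch_filter_index):
    """Toy model of a role's srchFilter 'index=<name>': the user
    only sees events from the permitted index."""
    return [e for e in events if e["index"] == srch_filter_index]

events = [
    {"index": "it_logs", "msg": "disk full"},
    {"index": "finance_logs", "msg": "invoice posted"},
]
print(visible_events(events, "it_logs"))
# [{'index': 'it_logs', 'msg': 'disk full'}]
```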

Hands-On Exercises

Exercise 1: Connect a Search Head to Multiple Indexers

Goal: Configure a Search Head to query data from two Indexers.

Steps:
  1. Edit distsearch.conf:

    [distributedSearch]
    servers = 192.168.1.10:8089, 192.168.1.11:8089
    
  2. Restart the Search Head:

    ./splunk restart
    
  3. Verify Search Peer Connectivity:

    • Go to Settings > Distributed Search > Search Peers and ensure both Indexers show as "Connected."
  4. Run a Query:

    • Execute a search that spans both Indexers:

      index=* | stats count by index
      

Exercise 2: Monitor Cluster Health

Goal: Use the Cluster Master to monitor an Indexer Cluster’s status.

Steps:
  1. Connect to the Cluster Master:

    • Use the CLI on the Cluster Master to check status:

      ./splunk show cluster-status
      
  2. Check Replication:

    • Verify that all buckets have sufficient replication:

      ./splunk show cluster-status | grep -i replication
      
  3. Inspect Peer Status:

    • Ensure all Indexers are listed as active.

Exercise 3: Optimize Search Head Performance

Goal: Enable parallel processing for large queries.

Steps:
  1. Edit limits.conf:

    [search]
    max_searches_per_cpu = 2
    
  2. Restart the Search Head:

    ./splunk restart
    
  3. Run a Complex Query:

    • Test performance with a large dataset:

      index=main | stats count by host
      
  4. Monitor Resource Usage:

    • Use the Monitoring Console to track query performance.

Troubleshooting Distributed Search

Common Issues and Solutions

Issue 1: Search Head Fails to Connect to Indexers
  • Cause:

    • Incorrect settings in distsearch.conf.
  • Solution:

    1. Verify the distsearch.conf configuration:

      splunk cmd btool distsearch list --debug
      
    2. Ensure the Indexers are reachable on port 8089.

Issue 2: Data Replication Errors in Indexer Cluster
  • Cause:

    • Insufficient disk space or network issues.
  • Solution:

    1. Check Indexer logs for errors:

      index=_internal source=*splunkd.log cluster
      
    2. Ensure enough disk space is available on all Indexers.

Issue 3: Slow Searches in Distributed Environment
  • Cause:

    • Overloaded Indexers or suboptimal search queries.
  • Solution:

    1. Optimize SPL queries by avoiding wildcards:

      index=main sourcetype=syslog | stats count by host
      
    2. Enable parallel search processing in limits.conf.

Best Practices

  1. Use Load Balancing for Indexers:

    • Distribute data ingestion across multiple Indexers to avoid bottlenecks.
  2. Monitor Search Performance Regularly:

    • Use the Monitoring Console to track query response times.
  3. Plan Replication Factors:

    • Set RF and SF values based on data criticality and resource availability.
  4. Test Failover Scenarios:

    • Periodically simulate Indexer or Search Head failures to ensure high availability.

Distributed Search (Additional Content)

Distributed Search enables Splunk to scale horizontally by separating search functions from indexing, facilitating powerful parallel searches across multiple indexers. Below are additional advanced areas often overlooked in initial study but critical for both production deployments and exam success.

1. Search Head Authentication with Search Peers

When configuring distributed search, a Search Head (SH) must be able to authenticate and securely communicate with one or more Indexers (Search Peers).

Key Concepts:

  • Communication between SH and Indexers happens over management port 8089 (HTTPS).

  • Authentication is required during peer registration, particularly in SSL-enabled environments.

Certificate Trust for SSL:

  • If SSL is enabled on the Indexer, the Search Head must trust the Indexer’s certificate.

  • You can do this by:

    • Importing the Indexer’s certificate into the SH’s trust store.

    • Or (not recommended) disabling certificate validation via configuration.

CLI Command for Adding a Search Peer:

splunk add search-server https://<indexer_ip>:8089 -auth admin:<local_password> -remoteUsername admin -remotePassword <remote_password>

This command registers the Indexer as a search peer: -auth supplies the Search Head's local credentials, while -remoteUsername and -remotePassword authenticate against the peer.

Best Practice:

  • Always use signed certificates and configure SSL validation properly for secure environments.

2. Search Head Cluster Deployer Limitations

The Deployer in a Search Head Cluster (SHC) plays a crucial role in managing configurations across all SHC members but has strict limitations.

Role of Deployer:

  • Distributes configuration bundles (e.g., apps, lookups, dashboards) to all SHC members.

  • Uses splunk apply shcluster-bundle to push updates.

Important Limitations:

  • The Deployer is not part of the cluster.

  • It does not run searches, index data, or serve end users.

  • It is used only for configuration distribution.

Best Practice:

  • Keep the Deployer separate from production SHCs.

  • Always test bundles before deployment to avoid cluster instability.

3. Licensing and Distributed Search

Licensing must be managed cohesively across all components of a distributed Splunk deployment.

Key Rules:

  • All Search Heads and Indexers must be covered under the same license (license pool).

  • Typically, a centralized License Master is used to manage licenses across multiple sites or tiers.

Cross-Site Deployment:

  • For deployments that span multiple data centers or regions:

    • Use License Pooling to segment usage.

    • Use Forwarder-based data routing to ensure license usage is traceable per site.

Best Practice:

  • Ensure proper license monitoring through the Monitoring Console or splunk list licenser-pools.

4. Forwarder Acknowledgment in Clustering Environments

While not directly part of distributed search, Forwarder acknowledgment (ACK) is highly relevant in Indexer Clustering where data integrity is crucial.

What It Does:

  • When enabled, the Universal Forwarder (UF) waits for acknowledgment from the receiving Indexer before sending the next data block.

  • Ensures data durability by confirming that events have been successfully indexed.

Configuration:

  • In outputs.conf on the forwarder:

      [tcpout:<target_group>]
      useACK = true

Impact in Clustering:

  • The forwarder holds events in its wait queue until the receiving Indexer confirms the data has been written and replicated (per the replication factor).

  • Helps avoid data loss in high-availability environments.
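
The acknowledgment loop can be pictured as send-wait-retry: the forwarder does not move on (or discard its copy) until the block is acknowledged. A toy Python model (the flaky-indexer behavior is invented for illustration):

```python
def send_with_ack(blocks, indexer_acks, max_retries=3):
    """Toy model of useACK: the forwarder resends a block until the
    indexer acknowledges it, instead of fire-and-forget."""
    delivered = []
    for block in blocks:
        for attempt in range(max_retries):
            if indexer_acks(attempt):        # did the indexer ack this attempt?
                delivered.append(block)
                break
    return delivered

# Indexer misses the first attempt of every block, acks the retry
flaky = lambda attempt: attempt >= 1
print(send_with_ack(["e1", "e2"], flaky))  # ['e1', 'e2']
```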

5. Multisite Indexer Clustering (Advanced)

Multisite clustering is used to provide geographic redundancy and disaster recovery capabilities across data centers or cloud regions.

Purpose:

  • Ensures that copies of data exist in multiple locations.

  • Maintains availability in the event of a site failure.

Configuration in server.conf:

[clustering]
multisite = true
site = site1
available_sites = site1,site2
site_replication_factor = origin:2,total:3
site_search_factor = origin:1,total:2

Parameter Explanation:

  • origin: the site where data is initially indexed.

  • total: total number of replicated copies across all sites.

  • Example: Store 2 copies in origin site, 1 in remote site.
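
The origin:total notation can be unpacked mechanically: the origin site keeps its quota and the remainder is spread across the other sites. A simplified Python sketch of that split (real placement is more nuanced):

```python
def site_copy_plan(spec, origin_site, sites):
    """Split a site_replication_factor like 'origin:2,total:3' into
    copies per site: the origin keeps its quota, the remainder is
    spread over the other sites (simplified model)."""
    parts = dict(p.split(":") for p in spec.split(","))
    origin_copies = int(parts["origin"])
    remainder = int(parts["total"]) - origin_copies
    plan = {origin_site: origin_copies}
    others = [s for s in sites if s != origin_site]
    for i in range(remainder):
        site = others[i % len(others)]
        plan[site] = plan.get(site, 0) + 1
    return plan

print(site_copy_plan("origin:2,total:3", "site1", ["site1", "site2"]))
# {'site1': 2, 'site2': 1}
```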

Use Case:

  • Financial or healthcare institutions that must survive site outages without losing data or search capability.

Best Practices Recap

  • Search Head to Indexer Auth: Use secure certificates; avoid disabling SSL verification.
  • Deployer: Use only for config distribution, never as a production search node.
  • Licensing: Ensure all nodes are managed under a shared License Master.
  • Forwarder ACK: Enable useACK = true in outputs.conf for reliable delivery, especially in clustered environments.
  • Multisite Clustering: Use for geo-redundancy; configure replication/search factors properly.

Frequently Asked Questions

In a Splunk distributed search architecture, which component dispatches search queries to other instances that contain indexed data?

Answer:

The search head dispatches search queries to search peers (typically indexers).

Explanation:

In distributed search, the search head acts as the coordinator for user queries. When a user runs a search, the search head distributes the search request to configured search peers, which usually contain indexed data. These peers execute the search locally on their data and return partial results to the search head. The search head then merges, sorts, and presents the combined results to the user interface. A common misunderstanding is assuming that forwarders or indexers initiate searches. In reality, forwarders only send data to indexers, while indexers store data and respond to queries issued by the search head. This architecture allows searches to scale across multiple indexers while keeping query management centralized.

What is the primary role of a search peer in a Splunk distributed search environment?

Answer:

A search peer responds to search requests from the search head by executing searches on its indexed data.

Explanation:

A search peer is typically an indexer that stores indexed data and participates in distributed searches. When the search head sends a query, the search peer runs the search locally against its stored events. It then returns the results back to the search head for aggregation. Although indexers commonly serve as search peers, other Splunk instances with searchable data (such as monitoring consoles) can also function as search peers in some contexts. A common mistake is confusing search peers with forwarders. Forwarders only send raw data to indexers and do not store searchable indexes, so they cannot respond to distributed search queries.

Which configuration file can be edited to manually define search peers in a Splunk distributed search deployment?

Answer:

distsearch.conf

Explanation:

The distsearch.conf configuration file defines distributed search settings on the search head. Administrators can manually add search peers by specifying them in the servers setting within the [distributedSearch] stanza. Each peer is listed as a URI including protocol, hostname, and management port. After modifying the configuration file, the search head must be restarted for the changes to take effect. When peers are added manually through configuration files instead of Splunk Web or CLI commands, administrators must also distribute authentication key files (such as trusted.pem) to the peers to establish trust. Failing to distribute these keys often leads to authentication failures when the search head attempts to communicate with peers.

What configuration setting inside distsearch.conf defines the list of search peers?

Answer:

The servers setting under the [distributedSearch] stanza.

Explanation:

Within the distsearch.conf file, the [distributedSearch] stanza controls how the search head interacts with search peers. The servers parameter specifies a comma-separated list of search peers using their management endpoints. Each entry must include the URI scheme (http or https), hostname or IP address, and the management port (usually 8089). When this configuration is applied, the search head distributes search requests to the listed peers. Administrators must ensure the peer instances are reachable and properly authenticated. Incorrect URIs, missing schemes, or wrong ports can prevent the distributed search connection from functioning correctly.

What prerequisite must be satisfied before an indexer can function as a search peer?

Answer:

The indexer must have its default password changed so the search head can authenticate.

Explanation:

Before a search head can connect to an indexer as a search peer, the indexer must have valid credentials configured. Splunk does not allow authentication using default credentials for distributed search connections. This requirement ensures that administrative access between the search head and peer is secured. When configuring peers through Splunk Web or the CLI, the administrator must provide both local authentication credentials and remote credentials for the peer instance. If this prerequisite is not met, the search head will fail to establish the distributed search connection even if the peer address is correctly configured.

Why must the trusted.pem file sometimes be manually copied to search peers when configuring distributed search through configuration files?

Answer:

Because manual configuration does not automatically distribute authentication keys between the search head and peers.

Explanation:

When search peers are added using Splunk Web or the CLI, Splunk automatically handles authentication by distributing required keys. However, when administrators configure distributed search manually by editing distsearch.conf, this automation does not occur. In such cases, the search head’s public key (trusted.pem) must be manually copied to the appropriate directory on each search peer. This key enables the peer to trust and authenticate search requests from the search head. Without the correct key distribution, authentication failures occur even though the peer appears correctly configured in the distributed search settings.
