SPLK-1003 Distributed Search

Distributed Search Detailed Explanation

Distributed search in Splunk allows large-scale deployments by separating the roles of searching and indexing across multiple components. This guide explains the core components, configuration steps, and optimization tips for setting up and managing a distributed Splunk environment.

1. Distributed Environment Components

A distributed environment comprises three primary components, each with specific roles in the search and indexing process.

1.1 Search Head

  • Overview:

    • The Search Head is the user-facing component where users create and execute search queries, build dashboards, and manage alerts.
    • It does not store data but acts as a query manager, forwarding search requests to Indexers and combining results.
  • Responsibilities:

    • Distribute SPL (Search Processing Language) queries to Indexers.
    • Aggregate results and present them to the user.
    • Manage dashboards, reports, and knowledge objects.
  • Use Case:

    • A company wants multiple users to run queries concurrently without overloading a single server.
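
The dispatch-and-aggregate behavior described above can be sketched in a few lines of Python (a conceptual toy model, not Splunk code; the host names and counts are invented):

```python
from collections import Counter

def merge_partial_results(partials):
    """Combine per-indexer partial counts the way a search head
    merges results for a 'stats count by host' search."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)

# Partial results returned by two hypothetical indexers
indexer1 = {"web01": 120, "db01": 40}
indexer2 = {"web01": 80, "app01": 15}

print(merge_partial_results([indexer1, indexer2]))
# {'web01': 200, 'db01': 40, 'app01': 15}
```

Each Indexer searches only its own buckets; the Search Head does no indexing, only this kind of merge/sort/present work.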

1.2 Indexer Cluster

  • Overview:

    • The Indexer Cluster stores and processes raw data, making it searchable. It is the backbone of distributed search.
  • Key Features:

    • Data Storage: Indexers organize data into buckets (hot, warm, cold, frozen).
    • High Availability: Replicates data across multiple Indexers for fault tolerance.
    • Data Processing: Tokenizes, parses, and stores data during ingestion.
  • Replication Factor (RF):

    • The number of copies of data maintained in the cluster.
    • Example: RF=2 ensures two copies of each piece of data are stored.
  • Search Factor (SF):

    • The number of searchable copies maintained in the cluster.
    • Example: SF=2 ensures two Indexers have fully searchable copies.
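
As a rule of thumb, RF total copies let the cluster survive RF-1 peer failures without data loss, and SF searchable copies let it survive SF-1 failures without an immediate searchability gap (assuming surviving peers can host the copies). A tiny illustrative calculation:

```python
def failure_tolerance(replication_factor, search_factor):
    """How many simultaneous peer failures a cluster can absorb,
    per RF/SF semantics: RF total copies survive RF-1 failures;
    SF searchable copies survive SF-1 failures."""
    return {
        "data": replication_factor - 1,
        "searchability": search_factor - 1,
    }

# RF=2 / SF=2, as in the examples above
print(failure_tolerance(2, 2))  # {'data': 1, 'searchability': 1}
```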

1.3 Search Head Cluster

  • Overview:

    • A Search Head Cluster provides redundancy and load balancing by pooling multiple Search Heads.
    • Essential for environments with high query loads or critical uptime requirements.
  • Key Features:

    • Failover: If one Search Head fails, others take over automatically.
    • Load Balancing: Distributes user queries across multiple Search Heads.
    • Synchronization: Shares knowledge objects (e.g., dashboards, alerts) between Search Heads.
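
The failover and load-balancing behavior can be pictured as a round-robin dispatcher that skips failed members. A toy Python model (member names are invented; in a real SHC, routing is handled by Splunk and any front-end load balancer):

```python
from itertools import cycle

def dispatch(queries, members, healthy):
    """Round-robin queries across search head members, skipping
    failed ones -- a toy model of SHC load balancing and failover."""
    assignments = {}
    ring = cycle([m for m in members if healthy[m]])
    for q in queries:
        assignments[q] = next(ring)
    return assignments

members = ["sh1", "sh2", "sh3"]
healthy = {"sh1": True, "sh2": False, "sh3": True}  # sh2 has failed
print(dispatch(["q1", "q2", "q3"], members, healthy))
# {'q1': 'sh1', 'q2': 'sh3', 'q3': 'sh1'}
```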

2. Configuration Steps

2.1 Connect Search Head to Indexers

To establish communication between the Search Head and Indexers, use the distsearch.conf file.

Steps:
  1. Locate distsearch.conf:

    • File path: $SPLUNK_HOME/etc/system/local/distsearch.conf.
  2. Define Indexers:

    • Example configuration:

      [distributedSearch]
      servers = <indexer1_ip>:8089, <indexer2_ip>:8089
      
    • Replace <indexer1_ip> and <indexer2_ip> with the actual IPs of the Indexers.

  3. Restart Splunk:

    • Apply the changes:

      ./splunk restart
      
  4. Verify Search Peers:

    • Go to Settings > Distributed Search > Search Peers in Splunk Web.
    • Ensure all Indexers show as "Connected."
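
Outside of Splunk, a hand-edited distsearch.conf can be sanity-checked with any INI parser before restarting; btool remains the authoritative check. A quick Python sketch (the IPs are placeholders):

```python
import configparser

# A hand-edited distsearch.conf fragment (placeholder IPs)
conf_text = """
[distributedSearch]
servers = 10.0.0.10:8089, 10.0.0.11:8089
"""

parser = configparser.ConfigParser()
parser.read_string(conf_text)

# Split the comma-separated peer list the way Splunk would read it
servers = [s.strip() for s in parser["distributedSearch"]["servers"].split(",")]
print(servers)  # ['10.0.0.10:8089', '10.0.0.11:8089']
```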

2.2 Set Up Replication in Indexer Clusters

Replication ensures high availability and data redundancy in Indexer Clusters.

Steps:
  1. Edit server.conf on Indexers:

    • Example configuration:

      [clustering]
      mode = slave
      master_uri = https://<cluster_master_ip>:8089
      pass4SymmKey = <cluster_key>

      [replication_port://9887]

    • Note: replication_factor and search_factor are set on the Cluster Master, not on the peers. Each peer also declares a replication port in its own [replication_port://<port>] stanza.
      
  2. Configure the Cluster Master:

    • Add this to the Cluster Master’s server.conf:

      [clustering]
      mode = master
      replication_factor = 2
      search_factor = 2
      pass4SymmKey = <cluster_key>
      
    • Restart the Cluster Master:

      ./splunk restart
      
  3. Add Indexers to the Cluster:

    • Connect each Indexer (peer) to the Cluster Master via the CLI (this writes the same server.conf settings shown above):

      ./splunk edit cluster-config -mode slave -master_uri https://<cluster_master_ip>:8089 -replication_port 9887
      
  4. Verify Cluster Status:

    • On the Cluster Master, check the status:

      ./splunk show cluster-status
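
The effect of replication can be modeled as assigning RF copies of each bucket to distinct peers. A simplified Python sketch (peer and bucket names are invented; real placement decisions are made by the Cluster Master):

```python
import itertools

def place_copies(buckets, peers, rf):
    """Assign rf copies of each bucket to distinct peers, round-robin --
    a simplified model of how a cluster spreads replicas."""
    placement = {}
    ring = itertools.cycle(range(len(peers)))
    for b in buckets:
        start = next(ring)
        placement[b] = [peers[(start + i) % len(peers)] for i in range(rf)]
    return placement

peers = ["idx1", "idx2", "idx3"]
placement = place_copies(["b1", "b2", "b3"], peers, rf=2)
for b, copies in placement.items():
    assert len(set(copies)) == 2  # each bucket lives on two distinct peers
print(placement)
# {'b1': ['idx1', 'idx2'], 'b2': ['idx2', 'idx3'], 'b3': ['idx3', 'idx1']}
```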
      

3. Optimization Tips

3.1 Use Search Head Pooling

Search Head pooling allows multiple Search Heads to share user-generated objects such as dashboards and alerts, reducing resource duplication. Note that Search Head pooling is deprecated in current Splunk versions; Search Head Clustering (Section 1.3) is the recommended approach for new deployments.

  • Steps:
    1. Configure shared storage accessible by all Search Heads.
    2. Edit server.conf on each Search Head to point to the shared storage.

3.2 Enable Parallel Processing for Complex Searches

  • Purpose:

    • Raises the ceiling on concurrent searches so the Search Head can keep multiple Indexers busy at once. Note that max_searches_per_cpu caps how many historical searches may run concurrently per CPU; it does not by itself split a single search into subtasks.
  • Steps:

    1. Configure limits.conf:

      • Example:

        [search]
        max_searches_per_cpu = 2
        
    2. Restart Splunk for changes to take effect:

      ./splunk restart
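
Per Splunk's limits.conf documentation, the ceiling on concurrent historical searches is roughly max_searches_per_cpu × number_of_CPUs + base_max_searches (the base default is assumed here to be 6). A quick calculation helper:

```python
def max_concurrent_searches(cpus, max_searches_per_cpu=2, base_max_searches=6):
    """Splunk's documented ceiling for concurrent historical searches:
    max_searches_per_cpu * #CPUs + base_max_searches (defaults assumed)."""
    return max_searches_per_cpu * cpus + base_max_searches

print(max_concurrent_searches(cpus=8))   # 22
print(max_concurrent_searches(cpus=16))  # 38
```

Raising max_searches_per_cpu trades memory and CPU headroom for concurrency, so monitor resource usage after changing it.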
      

Real-World Scenarios

Scenario 1: Scaling for High Query Load

A company experiences increased search demands, leading to slow query responses. They decide to scale the Search Head capacity using a Search Head Cluster.

Steps:
  1. Set Up a Search Head Cluster:

    • Install Splunk on multiple servers to act as Search Heads.
  2. Configure server.conf for the Cluster:

    • On each Search Head, add:

      [shclustering]
      mode = member
      mgmt_uri = https://<search_head_ip>:8089
      pass4SymmKey = <shared_key>

      [replication_port://8080]
      
  3. Designate a Deployer:

    • On the Deployer, configure server.conf:

      [shclustering]
      mode = deployer
      pass4SymmKey = <shared_key>
      
  4. Push Configuration from the Deployer:

    • Deploy updated configurations to Search Heads:

      ./splunk apply shcluster-bundle -target https://<search_head_ip>:8089
      
  5. Verify Cluster Status:

    • On any Search Head:

      ./splunk show shcluster-status
      

Scenario 2: Ensuring High Availability for Indexers

An organization wants to ensure data is always available, even if one Indexer fails.

Steps:
  1. Deploy an Indexer Cluster:

    • Configure multiple Indexers and a Cluster Master.
  2. Set Replication and Search Factors:

    • Ensure replication and search factors are sufficient:
      • Replication Factor (RF): 2
      • Search Factor (SF): 2
  3. Monitor Cluster Health:

    • Use the Cluster Master to check cluster status:

      ./splunk show cluster-status
      
  4. Simulate Failover:

    • Stop one Indexer and verify that data is still accessible from the cluster.
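
The failover check in step 4 can be reasoned about with a toy model: with SF=2, every bucket is searchable on two peers, so losing any single peer leaves at least one searchable copy (bucket and peer names here are invented):

```python
def searchable_after_failure(placement, failed_peer):
    """Which peers can still serve each bucket when one peer fails,
    given a bucket -> [peers with a searchable copy] map (toy model)."""
    return {b: [p for p in peer_list if p != failed_peer]
            for b, peer_list in placement.items()}

# SF=2: every bucket searchable on two peers
placement = {"b1": ["idx1", "idx2"], "b2": ["idx2", "idx3"]}
after = searchable_after_failure(placement, failed_peer="idx2")
assert all(after[b] for b in after)  # no bucket lost searchability
print(after)  # {'b1': ['idx1'], 'b2': ['idx3']}
```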

Scenario 3: Segmenting Data Access for Security

A company needs to restrict access to specific indexes for different departments while maintaining centralized search capability.

Steps:
  1. Define Roles and Index Access:

    • Create department-specific roles in Splunk Web:
      • IT → Access to it_logs.
      • Finance → Access to finance_logs.
  2. Map Search Heads to Indexers:

    • Edit distsearch.conf to connect Search Heads to relevant Indexers.
  3. Apply Search Filters:

    • Define search filters in roles.conf:

      [role_it]
      srchFilter = index=it_logs
      
      [role_finance]
      srchFilter = index=finance_logs
      
  4. Test Permissions:

    • Log in as a user with each role and ensure they only see permitted data.
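
The effect of a srchFilter can be modeled as a simple filter over events: a user in a role only sees events matching the role's filter. A toy Python sketch (event contents are invented):

```python
def visible_events(events, srch_filter_index):
    """Toy model of a role's srchFilter 'index=<name>': the user
    only sees events from the permitted index."""
    return [e for e in events if e["index"] == srch_filter_index]

events = [
    {"index": "it_logs", "msg": "disk full"},
    {"index": "finance_logs", "msg": "invoice posted"},
]
print(visible_events(events, "it_logs"))
# [{'index': 'it_logs', 'msg': 'disk full'}]
```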

Hands-On Exercises

Exercise 1: Connect a Search Head to Multiple Indexers

Goal: Configure a Search Head to query data from two Indexers.

Steps:
  1. Edit distsearch.conf:

    [distributedSearch]
    servers = 192.168.1.10:8089, 192.168.1.11:8089
    
  2. Restart the Search Head:

    ./splunk restart
    
  3. Verify Search Peer Connectivity:

    • Go to Settings > Distributed Search > Search Peers and ensure both Indexers show as "Connected."
  4. Run a Query:

    • Execute a search that spans both Indexers:

      index=* | stats count by index
      

Exercise 2: Monitor Cluster Health

Goal: Use the Cluster Master to monitor an Indexer Cluster’s status.

Steps:
  1. Connect to the Cluster Master:

    • Use the CLI on the Cluster Master to check status:

      ./splunk show cluster-status
      
  2. Check Replication:

    • Verify that all buckets have sufficient replication:

      ./splunk show cluster-status | grep -i replication
      
  3. Inspect Peer Status:

    • Ensure all Indexers are listed as active.

Exercise 3: Optimize Search Head Performance

Goal: Enable parallel processing for large queries.

Steps:
  1. Edit limits.conf:

    [search]
    max_searches_per_cpu = 2
    
  2. Restart the Search Head:

    ./splunk restart
    
  3. Run a Complex Query:

    • Test performance with a large dataset:

      index=main | stats count by host
      
  4. Monitor Resource Usage:

    • Use the Monitoring Console to track query performance.

Troubleshooting Distributed Search

Common Issues and Solutions

Issue 1: Search Head Fails to Connect to Indexers
  • Cause:

    • Incorrect settings in distsearch.conf.
  • Solution:

    1. Verify the distsearch.conf configuration:

      splunk cmd btool distsearch list --debug
      
    2. Ensure the Indexers are reachable on port 8089.

Issue 2: Data Replication Errors in Indexer Cluster
  • Cause:

    • Insufficient disk space or network issues.
  • Solution:

    1. Check Indexer logs for errors:

      index=_internal source=*splunkd.log cluster
      
    2. Ensure enough disk space is available on all Indexers.

Issue 3: Slow Searches in Distributed Environment
  • Cause:

    • Overloaded Indexers or suboptimal search queries.
  • Solution:

    1. Optimize SPL queries by avoiding wildcards:

      index=main sourcetype=syslog | stats count by host
      
    2. Enable parallel search processing in limits.conf.

Best Practices

  1. Use Load Balancing for Indexers:

    • Distribute data ingestion across multiple Indexers to avoid bottlenecks.
  2. Monitor Search Performance Regularly:

    • Use the Monitoring Console to track query response times.
  3. Plan Replication Factors:

    • Set RF and SF values based on data criticality and resource availability.
  4. Test Failover Scenarios:

    • Periodically simulate Indexer or Search Head failures to ensure high availability.

Distributed Search (Additional Content)

Distributed Search enables Splunk to scale horizontally by separating search functions from indexing, facilitating powerful parallel searches across multiple indexers. Below are additional advanced areas often overlooked in initial study but critical for both production deployments and exam success.

1. Search Head Authentication with Search Peers

When configuring distributed search, a Search Head (SH) must be able to authenticate and securely communicate with one or more Indexers (Search Peers).

Key Concepts:

  • Communication between SH and Indexers happens over management port 8089 (HTTPS).

  • Authentication is required during peer registration, particularly in SSL-enabled environments.

Certificate Trust for SSL:

  • If SSL is enabled on the Indexer, the Search Head must trust the Indexer’s certificate.

  • You can do this by:

    • Importing the Indexer’s certificate into the SH’s trust store.

    • Or (not recommended) disabling certificate validation via configuration.

CLI Command for Adding a Search Peer:

splunk add search-server https://<indexer_ip>:8089 -auth admin:<local_password> -remoteUsername admin -remotePassword <remote_password>

This command registers the Indexer as a search peer: -auth supplies the Search Head's local credentials, while -remoteUsername and -remotePassword authenticate against the peer.

Best Practice:

  • Always use signed certificates and configure SSL validation properly for secure environments.

2. Search Head Cluster Deployer Limitations

The Deployer in a Search Head Cluster (SHC) plays a crucial role in managing configurations across all SHC members but has strict limitations.

Role of Deployer:

  • Distributes configuration bundles (e.g., apps, lookups, dashboards) to all SHC members.

  • Uses splunk apply shcluster-bundle to push updates.

Important Limitations:

  • The Deployer is not part of the cluster.

  • It does not run searches, index data, or serve end users.

  • It is used only for configuration distribution.

Best Practice:

  • Keep the Deployer separate from production SHCs.

  • Always test bundles before deployment to avoid cluster instability.

3. Licensing and Distributed Search

Licensing must be managed cohesively across all components of a distributed Splunk deployment.

Key Rules:

  • All Search Heads and Indexers must be covered under the same license (license pool).

  • Typically, a centralized License Master is used to manage licenses across multiple sites or tiers.

Cross-Site Deployment:

  • For deployments that span multiple data centers or regions:

    • Use License Pooling to segment usage.

    • Use Forwarder-based data routing to ensure license usage is traceable per site.

Best Practice:

  • Ensure proper license monitoring through the Monitoring Console or splunk list licenser-pools.

4. Forwarder Acknowledgment in Clustering Environments

While not directly part of distributed search, Forwarder acknowledgment (ACK) is highly relevant in Indexer Clustering where data integrity is crucial.

What It Does:

  • When enabled, the Universal Forwarder (UF) waits for acknowledgment from the receiving Indexer before sending the next data block.

  • Ensures data durability by confirming that events have been successfully indexed.

Configuration:

  • In outputs.conf on the forwarder:

      [tcpout:<target_group>]
      useACK = true

Impact in Clustering:

  • The forwarder holds events in its wait queue until the receiving Indexer confirms the data has been written and replicated (per the replication factor).

  • Helps avoid data loss in high-availability environments.
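
The acknowledgment loop can be pictured as send-wait-retry: the forwarder does not move on (or discard its copy) until the block is acknowledged. A toy Python model (the flaky-indexer behavior is invented for illustration):

```python
def send_with_ack(blocks, indexer_acks, max_retries=3):
    """Toy model of useACK: the forwarder resends a block until the
    indexer acknowledges it, instead of fire-and-forget."""
    delivered = []
    for block in blocks:
        for attempt in range(max_retries):
            if indexer_acks(attempt):        # did the indexer ack this attempt?
                delivered.append(block)
                break
    return delivered

# Indexer misses the first attempt of every block, acks the retry
flaky = lambda attempt: attempt >= 1
print(send_with_ack(["e1", "e2"], flaky))  # ['e1', 'e2']
```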

5. Multisite Indexer Clustering (Advanced)

Multisite clustering is used to provide geographic redundancy and disaster recovery capabilities across data centers or cloud regions.

Purpose:

  • Ensures that copies of data exist in multiple locations.

  • Maintains availability in the event of a site failure.

Configuration in server.conf:

[clustering]
multisite = true
site = site1
available_sites = site1,site2
site_replication_factor = origin:2,total:3
site_search_factor = origin:1,total:2

Parameter Explanation:

  • origin: the site where data is initially indexed.

  • total: total number of replicated copies across all sites.

  • Example: Store 2 copies in origin site, 1 in remote site.
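
The origin:total notation can be unpacked mechanically: the origin site keeps its quota and the remainder is spread across the other sites. A simplified Python sketch of that split (real placement is more nuanced):

```python
def site_copy_plan(spec, origin_site, sites):
    """Split a site_replication_factor like 'origin:2,total:3' into
    copies per site: the origin keeps its quota, the remainder is
    spread over the other sites (simplified model)."""
    parts = dict(p.split(":") for p in spec.split(","))
    origin_copies = int(parts["origin"])
    remainder = int(parts["total"]) - origin_copies
    plan = {origin_site: origin_copies}
    others = [s for s in sites if s != origin_site]
    for i in range(remainder):
        site = others[i % len(others)]
        plan[site] = plan.get(site, 0) + 1
    return plan

print(site_copy_plan("origin:2,total:3", "site1", ["site1", "site2"]))
# {'site1': 2, 'site2': 1}
```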

Use Case:

  • Financial or healthcare institutions that must survive site outages without losing data or search capability.

Best Practices Recap

  • Search Head to Indexer Auth: Use secure certificates; avoid disabling SSL verification.
  • Deployer: Use only for config distribution, never as a production search node.
  • Licensing: Ensure all nodes are managed under a shared License Master.
  • Forwarder ACK: Enable useACK = true in outputs.conf for reliable delivery, especially in clustered environments.
  • Multisite Clustering: Use for geo-redundancy; configure replication/search factors properly.

Frequently Asked Questions

In a Splunk distributed search architecture, which component dispatches search queries to other instances that contain indexed data?

Answer:

The search head dispatches search queries to search peers (typically indexers).

Explanation:

In distributed search, the search head acts as the coordinator for user queries. When a user runs a search, the search head distributes the search request to configured search peers, which usually contain indexed data. These peers execute the search locally on their data and return partial results to the search head. The search head then merges, sorts, and presents the combined results to the user interface. A common misunderstanding is assuming that forwarders or indexers initiate searches. In reality, forwarders only send data to indexers, while indexers store data and respond to queries issued by the search head. This architecture allows searches to scale across multiple indexers while keeping query management centralized.

What is the primary role of a search peer in a Splunk distributed search environment?

Answer:

A search peer responds to search requests from the search head by executing searches on its indexed data.

Explanation:

A search peer is typically an indexer that stores indexed data and participates in distributed searches. When the search head sends a query, the search peer runs the search locally against its stored events. It then returns the results back to the search head for aggregation. Although indexers commonly serve as search peers, other Splunk instances with searchable data (such as monitoring consoles) can also function as search peers in some contexts. A common mistake is confusing search peers with forwarders. Forwarders only send raw data to indexers and do not store searchable indexes, so they cannot respond to distributed search queries.

Which configuration file can be edited to manually define search peers in a Splunk distributed search deployment?

Answer:

distsearch.conf

Explanation:

The distsearch.conf configuration file defines distributed search settings on the search head. Administrators can manually add search peers by specifying them in the servers setting within the [distributedSearch] stanza. Each peer is listed as a URI including protocol, hostname, and management port. After modifying the configuration file, the search head must be restarted for the changes to take effect. When peers are added manually through configuration files instead of Splunk Web or CLI commands, administrators must also distribute authentication key files (such as trusted.pem) to the peers to establish trust. Failing to distribute these keys often leads to authentication failures when the search head attempts to communicate with peers.

What configuration setting inside distsearch.conf defines the list of search peers?

Answer:

The servers setting under the [distributedSearch] stanza.

Explanation:

Within the distsearch.conf file, the [distributedSearch] stanza controls how the search head interacts with search peers. The servers parameter specifies a comma-separated list of search peers using their management endpoints. Each entry must include the URI scheme (http or https), hostname or IP address, and the management port (usually 8089). When this configuration is applied, the search head distributes search requests to the listed peers. Administrators must ensure the peer instances are reachable and properly authenticated. Incorrect URIs, missing schemes, or wrong ports can prevent the distributed search connection from functioning correctly.

What prerequisite must be satisfied before an indexer can function as a search peer?

Answer:

The indexer must have its default password changed so the search head can authenticate.

Explanation:

Before a search head can connect to an indexer as a search peer, the indexer must have valid credentials configured. Splunk does not allow authentication using default credentials for distributed search connections. This requirement ensures that administrative access between the search head and peer is secured. When configuring peers through Splunk Web or the CLI, the administrator must provide both local authentication credentials and remote credentials for the peer instance. If this prerequisite is not met, the search head will fail to establish the distributed search connection even if the peer address is correctly configured.

Why must the trusted.pem file sometimes be manually copied to search peers when configuring distributed search through configuration files?

Answer:

Because manual configuration does not automatically distribute authentication keys between the search head and peers.

Explanation:

When search peers are added using Splunk Web or the CLI, Splunk automatically handles authentication by distributing required keys. However, when administrators configure distributed search manually by editing distsearch.conf, this automation does not occur. In such cases, the search head’s public key (trusted.pem) must be manually copied to the appropriate directory on each search peer. This key enables the peer to trust and authenticate search requests from the search head. Without the correct key distribution, authentication failures occur even though the peer appears correctly configured in the distributed search settings.
