SPLK-3002 Troubleshooting ITSI

Troubleshooting ITSI Detailed Explanation

1. Purpose of Troubleshooting ITSI

The goal of troubleshooting in ITSI is to identify and fix issues that impact the platform’s performance or accuracy, such as:

  • Misconfigured thresholds

  • Slow or failing searches

  • Missing or incomplete data

  • Errors in dashboards or visualizations

By regularly checking for these issues, you ensure your service health scores and alerts are accurate and trustworthy.

2. Common Troubleshooting Areas

a. Search Performance

ITSI runs many scheduled searches to calculate KPIs. If searches are slow or failing, KPIs won’t update correctly.

Tips:
  • Use Search Inspector to analyze individual searches for:

    • Execution time

    • Search phase delays (e.g., dispatch, reduce)

  • Check scheduler logs for:

    • Skipped searches

    • Concurrency issues (too many searches running at once)
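
For example, a search like the following (using the standard scheduler log fields status and reason) surfaces skipped searches and why they were skipped:

index=_internal sourcetype=scheduler status=skipped | stats count by savedsearch_name, reason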

b. Missing Data

If a KPI shows “No Result” or isn’t updating, the underlying data might be missing or misrouted.

Tips:
  • Check index settings to make sure data is going to the correct location.

  • Review your base searches for typos or incorrect filters.

  • Use ITSI Data Audit dashboards to:

    • Spot stale data

    • Identify gaps in search results

    • Monitor data latency or inconsistencies
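
To check latency directly, compare index time against event time (replace your_index with the index your KPI searches read from; _indextime and _time are standard Splunk fields):

index=your_index | eval latency=_indextime-_time | stats avg(latency) max(latency) by sourcetype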

c. Threshold Misconfigurations

Improper thresholds can result in false alerts or missed incidents.

Tips:
  • Review KPI thresholds to ensure they are:

    • Not too sensitive (causing too many alerts)

    • Not too loose (missing real issues)

  • Verify that Time Policies are applied correctly (e.g., different rules during business hours vs. off-hours)

d. Notable Event Issues

Sometimes, Notable Events are not being created, grouped, or escalated as expected.

Tips:
  • Review Aggregation Policy logs for errors or misapplied conditions.

  • Make sure Correlation Searches are:

    • Enabled

    • Returning the expected results

    • Properly tagged and categorized
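
A quick way to confirm a correlation search is enabled and scheduled is the saved-searches REST endpoint (substitute your correlation search's actual title):

| rest /servicesNS/-/-/saved/searches | search title="Your Correlation Search" | table title, disabled, cron_schedule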

e. Glass Table or Deep Dive Errors

If visualizations aren’t working:

  • KPIs may not be correctly linked to their data source

  • There could be permission issues or invalid tokens used in the Glass Table

Tips:
  • Check if each KPI used in a visualization has:

    • Valid search results

    • Correct entity bindings

  • Review access controls for users who are unable to view tables or interact with Deep Dives

3. Tools for Troubleshooting

ITSI provides several tools and utilities to help you diagnose and resolve issues:

a. itsi_troubleshooting_toolkit (Add-On)

  • Optional app that provides:

    • Troubleshooting dashboards

    • Health check reports

    • Environment diagnostics

Great for larger deployments or complex issues.

b. _internal Logs

  • Use Splunk’s internal logs to:

    • Track error messages

    • Find search timeouts or skipped searches

    • Identify system-level warnings

Example search:

index=_internal sourcetype=scheduler OR sourcetype=itsi*

c. KPI Search Logs

  • Each KPI search can log:

    • Errors in SPL

    • Long runtimes

    • No-result conditions

Check these logs when a KPI is not displaying any data.
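
Filtering the internal logs on log_level narrows the output to errors only (assuming your ITSI sourcetypes match the itsi* pattern, as in the earlier example search):

index=_internal sourcetype=itsi* log_level=ERROR | stats count by component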

4. Best Practices for Troubleshooting ITSI

Monitor Search Concurrency and Load

  • Avoid scheduling too many KPI searches at the same time.

  • Spread out search schedules using cron expressions to balance system load.
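
For instance, two KPI searches with the same five-minute frequency can be offset so they never dispatch at the same moment (illustrative cron values):

*/5 * * * *      runs at :00, :05, :10, …
2-59/5 * * * *   same frequency, offset by two minutes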

Document and Version Control Services

  • Keep track of changes to KPIs, thresholds, and services.

  • Use version control tools or export/import methods to back up configurations.

Work with Splunk Support for Complex Issues

  • For problems that persist or are hard to diagnose, engage Splunk Support.

  • Provide them with:

    • Logs

    • Configuration snapshots

    • Environment details

This speeds up resolution and avoids guesswork.

Summary: What to Remember About Troubleshooting ITSI

  • Troubleshooting ITSI is about ensuring data accuracy, search performance, and alert reliability.

  • Focus on search health, data flow, thresholds, event policies, and dashboards.

  • Use built-in tools like Search Inspector, Audit Dashboards, and internal logs.

  • Follow best practices to keep your ITSI environment stable, efficient, and trustworthy.

Troubleshooting ITSI (Additional Content)

1. Identifying KPI Anomalies: Stale, Invalid, or No-Result States

In ITSI, a KPI may appear abnormal even if no visible error occurs. Three common problematic states are:

  • Stale: The KPI is not receiving updated data within its expected schedule.

    • Common causes: Skipped searches, index delays, search concurrency overload.
  • Invalid: The base search returns non-numeric or incorrectly formatted data.

    • Common causes: SPL syntax issues, wrong eval logic, or missing fields.
  • No Result: The search runs but returns zero matching events.

    • Common causes: Field typos (e.g., hostnme instead of hostname), time range misalignment, or insufficient permissions.

Tip: Use Search Inspector, and temporarily modify the base SPL with | head 10 or | stats count to validate live data retrieval.

Being able to recognize and differentiate these conditions is essential to determine whether the issue lies in the search, the data, or ITSI configuration.
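
As the tip above suggests, appending a lightweight command to the KPI's base search is the fastest sanity check (placeholder index and sourcetype shown):

index=your_index sourcetype=your_sourcetype | stats count

A zero count points to a data or filter problem; a nonzero count suggests the issue lies further along in the KPI configuration.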

2. Debugging Complex Correlation Searches

When a correlation search fails to trigger expected Notable Events even though known conditions are met, the following steps are recommended:

a. Narrow the Query Scope

Use lightweight SPL for testing, such as:

| tstats count where index=itsi_summary by host, _time

Or leverage:

| datamodel itsi_summary.kpi search 

This helps confirm whether the expected data is available before running the full logic.

b. Enable Debug Logging

Increase log verbosity for itsi or the correlation search scheduler. This can reveal:

  • Syntax parsing issues

  • Skipped searches due to role-based permissions

  • Aggregation timeouts or misaligned tokens

Always test correlation searches in isolation with known conditions and minimal filters before scaling to full production logic.

3. Using itsi_summary and itsi_notable_archive for Troubleshooting

These two indexes are vital for verifying ITSI output flows:

  • itsi_summary:

    • Stores raw KPI results (aggregated or per-entity)

    • Useful for checking if KPI values exist, are timely, and match thresholds

    • Example SPL:

      index=itsi_summary kpi="CPU Usage" | timechart avg(alert_value) by entity_title
      
  • itsi_notable_archive:

    • Stores archived Notable Events for review, audit, or forensic analysis

    • Helps determine if a correlation search ever fired an event

    • Example SPL:

      index=itsi_notable_archive rule_title="High CPU and Memory" | stats count by service_name, severity
      

Together, these indexes offer a full trace of detection logic → alert generation → event archival, making them essential for root cause analysis in ITSI environments.
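
For example, each side of that trace can be checked with one search per index (alert_severity and rule_title are assumed field names here; verify them against your environment):

index=itsi_summary alert_severity=critical | stats count by kpi

index=itsi_notable_archive | stats count by rule_title

If the first search shows critical KPI values but the second returns no matching events, the gap lies in the correlation search or the aggregation policy.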

Summary

  • Understand special KPI states: Learn how to identify and resolve stale, invalid, and no-result conditions.

  • Use smart debugging techniques for correlation searches: narrow queries, validate datasets, and increase logging.

  • Leverage internal indexes: Query itsi_summary and itsi_notable_archive to cross-check KPI values and Notable Event history.

  • Apply validation best practices: Use lightweight SPL commands like | stats count, and always run base searches interactively before embedding them in KPIs or alerts.

Frequently Asked Questions

What is the purpose of maintenance mode in ITSI?

Answer:

To temporarily suppress alerts and service health changes during planned maintenance.

Explanation:

Maintenance mode allows administrators to prevent monitoring alerts from triggering during scheduled system maintenance or upgrades. When maintenance mode is enabled for a service, KPI evaluations and alert generation are temporarily suppressed. This prevents false incidents from being created while systems are intentionally offline or undergoing changes. Maintenance mode therefore helps maintain accurate incident records and reduces unnecessary alert noise during operational maintenance activities.

Demand Score: 84

Exam Relevance Score: 90

What is a common reason a KPI search returns no results?

Answer:

The search query does not match any indexed data.

Explanation:

KPI searches rely on Splunk Search Processing Language (SPL) queries to retrieve operational metrics from indexed data. If the search query is incorrectly written, references the wrong index, or uses filters that exclude relevant events, the KPI evaluation may return no results. When this occurs, the KPI may display missing values or remain in an unknown state. Administrators typically troubleshoot this issue by running the search manually in the Splunk search interface to verify that the query returns expected results.
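
In practice, that manual verification usually means removing filters one at a time until events appear (placeholder index and field values shown):

index=web_logs status=500 host=app01
index=web_logs status=500
index=web_logs

The step at which results return identifies the offending filter, or confirms that the index itself has no data.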

Demand Score: 90

Exam Relevance Score: 89

Where are ITSI service and KPI configurations primarily stored?

Answer:

In KV Store collections.

Explanation:

ITSI stores most configuration objects—including services, KPIs, entities, and dependency definitions—within KV Store collections rather than standard Splunk indexes. Because these configurations are stored in KV Store, administrators must ensure that KV Store is functioning properly and included in backup procedures. When restoring an ITSI deployment, KV Store data must also be restored to recover service configurations. Understanding where configuration data resides is therefore essential for troubleshooting and disaster recovery planning.

Demand Score: 82

Exam Relevance Score: 91

Why might KPI severity not update even though the KPI search returns results?

Answer:

Because threshold evaluation or KPI scheduling is misconfigured.

Explanation:

Even when KPI searches successfully retrieve data, severity states may not update if thresholds are missing or evaluation intervals are misconfigured. KPI severity depends on comparing search results against defined thresholds. If thresholds are not configured correctly or the KPI evaluation schedule is disabled, the system cannot determine severity states. Administrators should verify threshold settings, evaluation schedules, and KPI status to ensure that KPI results translate into correct severity values.

Demand Score: 86

Exam Relevance Score: 88

What is an essential step when restoring an ITSI deployment after a failure?

Answer:

Restoring KV Store data and ITSI configuration indexes.

Explanation:

When recovering an ITSI environment after system failure, administrators must restore both the KV Store collections and relevant Splunk indexes containing ITSI operational data. KV Store holds configuration objects such as services, KPIs, and dependencies, while indexes store notable events and KPI evaluation results. If KV Store is not restored, service definitions and monitoring configurations may be lost even if indexed data remains intact. Therefore, disaster recovery procedures must include both configuration and operational data restoration.

Demand Score: 80

Exam Relevance Score: 87
