The goal of troubleshooting in ITSI is to identify and fix issues that impact the platform’s performance or accuracy, such as:
Misconfigured thresholds
Slow or failing searches
Missing or incomplete data
Errors in dashboards or visualizations
By regularly checking for these issues, you ensure your service health scores and alerts are accurate and trustworthy.
ITSI runs many scheduled searches to calculate KPIs. If searches are slow or failing, KPIs won’t update correctly.
Use Search Inspector to analyze individual searches for:
Execution time
Search phase delays (e.g., dispatch, reduce)
Check scheduler logs for:
Skipped searches
Concurrency issues (too many searches running at once)
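As a sketch, skipped searches and concurrency problems can be surfaced from the scheduler logs; the field names below (status, savedsearch_name, reason) follow Splunk's standard scheduler.log format:
Example search:
index=_internal sourcetype=scheduler status=skipped | stats count by savedsearch_name, reason | sort -count
The reason field typically explains why a search was skipped, such as hitting the maximum concurrency limit.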
If a KPI shows “No Result” or isn’t updating, the underlying data might be missing or misrouted.
Check index settings to make sure data is going to the correct location.
Review your base searches for typos or incorrect filters.
Use ITSI Data Audit dashboards to:
Spot stale data
Identify gaps in search results
Monitor data latency or inconsistencies
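One illustrative way to measure data latency is to compare event time with index time; your_index is a placeholder for whichever index feeds the KPI:
Example search:
index=your_index earliest=-1h | eval lag_seconds=_indextime-_time | stats avg(lag_seconds) max(lag_seconds)
A large or growing lag suggests forwarding or indexing delays that can make KPIs appear stale.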
Improper thresholds can result in false alerts or missed incidents.
Review KPI thresholds to ensure they are:
Not too sensitive (causing too many alerts)
Not too loose (missing real issues)
Verify that Time Policies are applied correctly (e.g., different rules during business hours vs. off-hours)
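To sanity-check how current thresholds behave in practice, one option (assuming the standard itsi_summary fields such as kpi and alert_severity) is to review the distribution of severity states each KPI has produced recently:
Example search:
index=itsi_summary earliest=-24h | stats count by kpi, alert_severity
A KPI that is almost always critical, or never leaves normal, is a candidate for threshold tuning.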
Sometimes, Notable Events are not being created, grouped, or escalated as expected.
Review Aggregation Policy logs for errors or misapplied conditions.
Make sure Correlation Searches are:
Enabled
Returning the expected results
Properly tagged and categorized
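A quick way to confirm a correlation search is actually being scheduled and executed is to look it up by name in the scheduler logs; replace the savedsearch_name value with your own correlation search (a sketch based on Splunk's standard scheduler.log fields):
Example search:
index=_internal sourcetype=scheduler savedsearch_name="Your Correlation Search" | stats count by status
A mix of success and skipped statuses points to scheduling or concurrency issues rather than SPL logic.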
If visualizations aren’t working:
KPIs may not be correctly linked to their data source
There could be permission issues or invalid tokens used in the Glass Table
Check if each KPI used in a visualization has:
Valid search results
Correct entity bindings
Review access controls for users who are unable to view tables or interact with Deep Dives
ITSI provides several tools and utilities to help you diagnose and resolve issues:
itsi_troubleshooting_toolkit (Add-On): an optional app that provides:
Troubleshooting dashboards
Health check reports
Environment diagnostics
Great for larger deployments or complex issues.
_internal Logs: use Splunk's internal logs to:
Track error messages
Find search timeouts or skipped searches
Identify system-level warnings
Example search:
index=_internal sourcetype=scheduler OR sourcetype=itsi*
Each KPI search can log:
Errors in SPL
Long runtimes
No-result conditions
Check these logs when a KPI is not displaying any data.
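As a starting point when a KPI shows no data, ITSI-related errors can be pulled from the internal index; the sourcetype filter below is a broad, illustrative pattern rather than an exact list of ITSI sourcetypes:
Example search:
index=_internal sourcetype=itsi* log_level=ERROR | stats count by sourcetype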
Avoid scheduling too many KPI searches at the same time.
Spread out search schedules using cron expressions to balance system load.
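For example, instead of starting every 5-minute KPI search on the same minute, the cron expressions can be offset so the searches launch at different times (illustrative schedules):
*/5 * * * *      runs at :00, :05, :10, ...
1-56/5 * * * *   runs at :01, :06, :11, ...
3-58/5 * * * *   runs at :03, :08, :13, ...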
Keep track of changes to KPIs, thresholds, and services.
Use version control tools or export/import methods to back up configurations.
For problems that persist or are hard to diagnose, engage Splunk Support.
Provide them with:
Logs
Configuration snapshots
Environment details
This speeds up resolution and avoids guesswork.
Troubleshooting ITSI is about ensuring data accuracy, search performance, and alert reliability.
Focus on search health, data flow, thresholds, event policies, and dashboards.
Use built-in tools like Search Inspector, Audit Dashboards, and internal logs.
Follow best practices to keep your ITSI environment stable, efficient, and trustworthy.
In ITSI, a KPI may appear abnormal even if no visible error occurs. Three common problematic states are:
Stale: The KPI is not receiving updated data within its expected schedule.
Invalid: The base search returns non-numeric or incorrectly formatted data. Common causes include broken eval logic or missing fields.
No Result: The search runs but returns zero matching events. Common causes include filter typos (e.g., hostnme instead of hostname), time range misalignment, or insufficient permissions.
Tip: Use Search Inspector, and temporarily modify the base SPL with | head 10 or | stats count to validate live data retrieval.
Being able to recognize and differentiate these conditions is essential to determine whether the issue lies in the search, the data, or ITSI configuration.
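For instance, a base search suspected of being Invalid can be checked interactively for non-numeric results before it is embedded in a KPI; the index, sourcetype, and value field names here are placeholders for your own:
Example search:
index=your_index sourcetype=your_sourcetype | head 10 | eval is_numeric=if(isnum(value), "yes", "no") | table _time, value, is_numeric
Any "no" rows indicate the KPI's threshold field will not evaluate correctly.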
When a correlation search fails to trigger expected Notable Events—despite known conditions being met—the following steps are recommended:
Use lightweight SPL for testing, such as:
| tstats count where index=itsi_summary by host, _time
Or leverage:
| datamodel itsi_summary.kpi search
This helps confirm whether the expected data is available before running the full logic.
Increase log verbosity for itsi or the correlation search scheduler. This can reveal:
Syntax parsing issues
Skipped searches due to role-based permissions
Aggregation timeouts or misaligned tokens
Always test correlation searches in isolation with known conditions and minimal filters before scaling to full production logic.
itsi_summary and itsi_notable_archive for Troubleshooting: these two indexes are vital for verifying ITSI output flows:
itsi_summary:
Stores raw KPI results (aggregated or per-entity)
Useful for checking if KPI values exist, are timely, and match thresholds
Example SPL:
index=itsi_summary kpi="CPU Usage" | timechart avg(kpi_value) by host
itsi_notable_archive:
Stores archived Notable Events for review, audit, or forensic analysis
Helps determine if a correlation search ever fired an event
Example SPL:
index=itsi_notable_archive rule_title="High CPU and Memory" | stats count by service_name, severity
Together, these indexes offer a full trace of detection logic → alert generation → event archival, making them essential for root cause analysis in ITSI environments.
Understand special KPI states: Learn how to identify and resolve stale, invalid, and no result conditions.
Use smart debugging techniques for correlation searches: narrow queries, validate datasets, and increase logging.
Leverage internal indexes: Query itsi_summary and itsi_notable_archive to cross-check KPI values and Notable Event history.
Apply validation best practices: Use lightweight SPL commands like | stats count, and always run base searches interactively before embedding them in KPIs or alerts.
What is the purpose of maintenance mode in ITSI?
To temporarily suppress alerts and service health changes during planned maintenance.
Maintenance mode allows administrators to prevent monitoring alerts from triggering during scheduled system maintenance or upgrades. When maintenance mode is enabled for a service, KPI evaluations and alert generation are temporarily suppressed. This prevents false incidents from being created while systems are intentionally offline or undergoing changes. Maintenance mode therefore helps maintain accurate incident records and reduces unnecessary alert noise during operational maintenance activities.
Demand Score: 84
Exam Relevance Score: 90
What is a common reason a KPI search returns no results?
The search query does not match any indexed data.
KPI searches rely on Splunk Search Processing Language (SPL) queries to retrieve operational metrics from indexed data. If the search query is incorrectly written, references the wrong index, or uses filters that exclude relevant events, the KPI evaluation may return no results. When this occurs, the KPI may display missing values or remain in an unknown state. Administrators typically troubleshoot this issue by running the search manually in the Splunk search interface to verify that the query returns expected results.
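When verifying manually, it often helps to widen the search step by step until events appear, which isolates whether the index, sourcetype, or a filter is at fault (placeholder names below):
Example searches:
index=your_index | stats count
index=your_index sourcetype=your_sourcetype | stats count
index=your_index sourcetype=your_sourcetype host=web* | stats count
The first search that drops to zero identifies the condition excluding the data.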
Demand Score: 90
Exam Relevance Score: 89
Where are ITSI service and KPI configurations primarily stored?
In KV Store collections.
ITSI stores most configuration objects—including services, KPIs, entities, and dependency definitions—within KV Store collections rather than standard Splunk indexes. Because these configurations are stored in KV Store, administrators must ensure that KV Store is functioning properly and included in backup procedures. When restoring an ITSI deployment, KV Store data must also be restored to recover service configurations. Understanding where configuration data resides is therefore essential for troubleshooting and disaster recovery planning.
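Because ITSI configuration lives in KV Store, checking its health is a common first troubleshooting step; the command below is Splunk's standard CLI check, run on the search head (the installation path may vary):
Example command:
$SPLUNK_HOME/bin/splunk show kvstore-status
A status of "ready" indicates KV Store is available to serve ITSI configuration reads and writes.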
Demand Score: 82
Exam Relevance Score: 91
Why might KPI severity not update even though the KPI search returns results?
Because threshold evaluation or KPI scheduling is misconfigured.
Even when KPI searches successfully retrieve data, severity states may not update if thresholds are missing or evaluation intervals are misconfigured. KPI severity depends on comparing search results against defined thresholds. If thresholds are not configured correctly or the KPI evaluation schedule is disabled, the system cannot determine severity states. Administrators should verify threshold settings, evaluation schedules, and KPI status to ensure that KPI results translate into correct severity values.
Demand Score: 86
Exam Relevance Score: 88
What is an essential step when restoring an ITSI deployment after a failure?
Restoring KV Store data and ITSI configuration indexes.
When recovering an ITSI environment after system failure, administrators must restore both the KV Store collections and relevant Splunk indexes containing ITSI operational data. KV Store holds configuration objects such as services, KPIs, and dependencies, while indexes store notable events and KPI evaluation results. If KV Store is not restored, service definitions and monitoring configurations may be lost even if indexed data remains intact. Therefore, disaster recovery procedures must include both configuration and operational data restoration.
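A periodic KV Store backup can be taken with Splunk's built-in CLI (available in recent Splunk versions; the archive name here is an arbitrary example):
Example command:
$SPLUNK_HOME/bin/splunk backup kvstore -archiveName itsi_kvstore_backup
Including such archives in regular backup rotation ensures service definitions can be recovered alongside indexed data.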
Demand Score: 80
Exam Relevance Score: 87