Imagine you have a huge log file—maybe millions of rows of server data, user activity, or sensor readings. If you want to find patterns, summarize behaviors, or understand trends, you can’t read each row one by one. That’s where statistical commands come in.
They help you:
Count how many events occurred
Find averages, totals, or maximum values
Understand data over time
Compare one group to another (e.g., errors by server)
Think of statistical commands as tools for summarizing your data so it’s easier to understand and use.
## stats

### What is stats?
The stats command performs aggregations—like counting, summing, or averaging data—based on specific fields.
You use it when you want a summary of your search results, such as:
How many events per user?
What is the total traffic per server?
What is the average response time per application?
Syntax:
... | stats <function>(<field>) by <group_field>
Let’s break that down:
<function>: What kind of math do you want? Count? Average?
<field>: Which field do you want to summarize? (like bytes, cpu, etc.)
by <group_field>: How do you want to group the results? (like host, user, etc.)
index=web_logs
| stats count by host
This shows how many events came from each host.
index=web_logs
| stats avg(response_time) by uri_path
Here, you're checking which pages load fastest or slowest.
index=sales
| stats sum(amount) by product_name
This adds up the amount values for each product.

### Common stats Functions

| Function | What it Does | Example Use |
|---|---|---|
| count | Counts number of events | stats count by user |
| sum(x) | Adds up values in field x | stats sum(bytes) by host |
| avg(x) | Calculates average | stats avg(score) |
| max(x) | Finds the maximum value | stats max(price) |
| min(x) | Finds the minimum value | stats min(duration) |
| dc(x) | Distinct count (how many unique values) | stats dc(ip) |
| values(x) | List of distinct values | stats values(method) |
| list(x) | List of all values (duplicates allowed) | stats list(user) |
### Why is stats important?
It's the foundation of data summarization in Splunk.
It works with dashboards, reports, alerts, and more.
You can chain multiple functions together, like this:
index=web_logs
| stats count, avg(response_time), max(response_time) by host
This gives:
total events per host
average response time
highest response time per host
Quick Tip for Beginners:
Always make sure the field you’re summarizing (e.g., bytes, duration) exists in your events. If it doesn’t, the stats result may be empty or incorrect.
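If it helps to see what a grouped aggregation is doing conceptually, here is a rough Python analogy of `| stats count by host`. This is not Splunk itself, and the sample events and field values are made up for illustration:

```python
# Rough Python analogy (NOT Splunk) of: | stats count by host
# Hypothetical sample events standing in for index=web_logs.
events = [
    {"host": "web01", "response_time": 120},
    {"host": "web02", "response_time": 340},
    {"host": "web01", "response_time": 95},
    {"host": "web01", "response_time": 210},
    {"host": "web02", "response_time": 180},
]

# Group events by host and count them: stats collapses the result
# set into one row per group, just like this dict does.
counts = {}
for event in events:
    counts[event["host"]] = counts.get(event["host"], 0) + 1

print(counts)  # {'web01': 3, 'web02': 2}
```

Note how the original events are gone afterwards: only one summary row per host remains. That collapsing behavior is exactly what distinguishes stats from eventstats, covered next.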
## eventstats

### What is eventstats?
eventstats works almost exactly like stats, but there's one big difference:
stats gives you a summary (and collapses your events).
eventstats gives you the same statistics, but adds them to each original event.
So instead of reducing the number of events, it keeps all events, and simply adds new fields with the summary values.
... | eventstats <function>(<field>) as <new_field> by <group_field>
index=web_logs
| eventstats avg(response_time) as avg_response by host
Every event now gets a new field: avg_response
That field contains the average response time for the same host as that event.
Let’s say you want to find requests slower than average for each host.
index=web_logs
| eventstats avg(response_time) as avg_response by host
| where response_time > avg_response
What happens here:
eventstats adds the average per host to every event.
where filters for requests that were slower than the host's average.
This is a very common real-world use case!
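The slower-than-average pattern above can be sketched in plain Python. This is only an analogy of the eventstats behavior, not Splunk, and the sample records are invented:

```python
# Rough Python analogy (NOT Splunk) of:
#   | eventstats avg(response_time) as avg_response by host
#   | where response_time > avg_response
events = [
    {"host": "web01", "response_time": 100},
    {"host": "web01", "response_time": 300},
    {"host": "web02", "response_time": 50},
    {"host": "web02", "response_time": 250},
]

# Step 1: compute the average response_time per host (the aggregation).
totals = {}
for e in events:
    s, n = totals.get(e["host"], (0, 0))
    totals[e["host"]] = (s + e["response_time"], n + 1)
averages = {h: s / n for h, (s, n) in totals.items()}

# Step 2: eventstats keeps EVERY event and adds the group average
# as a new field, instead of collapsing to one row per host.
for e in events:
    e["avg_response"] = averages[e["host"]]

# Step 3: `where` then keeps only events slower than their host's average.
slow = [e for e in events if e["response_time"] > e["avg_response"]]
print([(e["host"], e["response_time"]) for e in slow])
# [('web01', 300), ('web02', 250)]
```

The key point: after step 2 all four events still exist, each carrying the summary value for its own group, which is what makes the per-event comparison in step 3 possible.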
### When should I use eventstats vs. stats?

| Scenario | Use |
|---|---|
| You need a summary only (no original events) | stats |
| You want to compare each event to a group average/total | eventstats |
## streamstats

### What is streamstats?
streamstats calculates cumulative or sequential statistics — one event at a time, in order. This is useful when order matters (like time, or sequence).
Unlike stats, it doesn’t wait until the end of the search to calculate results.
... | streamstats <function>(<field>) as <new_field> by <group_field>
index=web_logs
| streamstats sum(bytes) as running_total
Adds a field running_total to each event.
For each new event, it adds bytes to the total from the previous events.
index=auth_logs
| streamstats count by user
| where count = 1
This returns the first event for each user.
streamstats count by user increments for every new event per user.
index=system_logs
| streamstats current=f window=1 last(_time) as prev_time
| eval time_diff = _time - prev_time
Calculates the time gap between the current event and the previous one.
current=f excludes the current event from the window, so last(_time) returns the previous event's timestamp rather than the current one.
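Both streamstats examples above can be mimicked in a few lines of Python. This is an analogy of the streaming behavior, not Splunk, with made-up timestamps and byte counts:

```python
# Rough Python analogy (NOT Splunk) of streamstats: values are computed
# one event at a time, in order, so earlier events never see later ones.
events = [
    {"_time": 100, "bytes": 500},
    {"_time": 160, "bytes": 300},
    {"_time": 250, "bytes": 700},
]

running_total = 0
prev_time = None
for e in events:
    # like: | streamstats sum(bytes) as running_total
    running_total += e["bytes"]
    e["running_total"] = running_total

    # like: | streamstats current=f window=1 last(_time) as prev_time
    # current=f: only the PREVIOUS event's _time is visible here.
    e["time_diff"] = None if prev_time is None else e["_time"] - prev_time
    prev_time = e["_time"]

print([(e["running_total"], e["time_diff"]) for e in events])
# [(500, None), (800, 60), (1500, 90)]
```

Notice that the first event has no previous event, so its time_diff is empty, which is exactly what you see in Splunk when prev_time is null for the first row.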
### streamstats vs. eventstats vs. stats

| Feature | stats | eventstats | streamstats |
|---|---|---|---|
| Keeps original events? | No (just summary) | Yes (adds stats) | Yes (adds stats in order) |
| Works sequentially? | No | No | Yes |
| Good for comparison? | No | Yes | Yes (for time/sequence) |
Beginner Tips:
Use streamstats when event order matters, such as:
Session tracking
Running totals
Event-to-event comparisons
Don’t forget to sort your events if you need order (| sort _time), especially before using streamstats.
## timechart

### What is timechart?
timechart is designed for time-based summaries — when you want to see how values change over time.
Unlike stats, which groups by field values, timechart groups by time.
... | timechart <function>(<field>) by <group_field>
<function>: e.g., count, avg(bytes), sum(sales)
by <group_field>: (Optional) compare across categories
index=web_logs
| timechart count
index=system_logs
| timechart avg(cpu) by host
timechart automatically splits time into buckets, e.g.:
last 24 hours → hourly buckets
last 30 minutes → 1-minute buckets
You can override this:
| timechart span=15m count
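Under the hood, bucketing by span is just snapping each timestamp down to a fixed-width boundary and aggregating within each bucket. Here is a rough Python sketch of that idea (not Splunk; the epoch timestamps are invented):

```python
# Rough Python analogy (NOT Splunk) of: | timechart span=15m count
# Core idea: snap each timestamp into a fixed-width bucket, then count
# the events that landed in each bucket.
SPAN = 15 * 60  # 15 minutes, in seconds

# Hypothetical epoch timestamps standing in for event _time values.
timestamps = [0, 120, 900, 950, 1000, 1900]

buckets = {}
for t in timestamps:
    bucket_start = (t // SPAN) * SPAN  # snap down to the bucket boundary
    buckets[bucket_start] = buckets.get(bucket_start, 0) + 1

print(sorted(buckets.items()))  # [(0, 2), (900, 3), (1800, 1)]
```

A smaller SPAN means more buckets and finer detail; a larger SPAN means fewer, coarser buckets, which is the same trade-off the span option controls in timechart.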
timechart is great for:
Trend charts (errors over time, sales by hour, login count by day)
Line/bar charts in dashboards
Comparing groups (hosts, users, apps) over time
## chart

### What is chart?
chart is like stats, but made for 2D summaries, especially useful when creating tables and visual comparisons.
You can think of it as a pivot table:
Rows → one field
Columns → another field
Cells → summary value
... | chart <function>(<field>) over <row_field> by <column_field>
index=sales
| chart sum(amount) over product by region
Output:
| product | US | EU | Asia |
|---|---|---|---|
| Phone | 5000 | 7000 | 6000 |
| Laptop | 3000 | 4000 | 2000 |
chart is great for:
Heatmaps
Pivot-style dashboards
2-dimensional reports (e.g., count of errors by app and server)
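The pivot-table idea behind chart can be sketched with nested dictionaries. This is a Python analogy, not Splunk, and the sales records are made up:

```python
# Rough Python analogy (NOT Splunk) of:
#   | chart sum(amount) over product by region
# Rows come from one field, columns from another, cells hold the sum.
sales = [
    {"product": "Phone", "region": "US", "amount": 5000},
    {"product": "Phone", "region": "EU", "amount": 7000},
    {"product": "Laptop", "region": "US", "amount": 3000},
    {"product": "Laptop", "region": "EU", "amount": 4000},
]

pivot = {}
for row in sales:
    cell = pivot.setdefault(row["product"], {})  # one row per product
    cell[row["region"]] = cell.get(row["region"], 0) + row["amount"]

print(pivot)
# {'Phone': {'US': 5000, 'EU': 7000}, 'Laptop': {'US': 3000, 'EU': 4000}}
```

The outer key plays the role of the `over` field (rows) and the inner key plays the role of the `by` field (columns), which is exactly the two-dimensional layout chart produces.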
## Special Statistical Functions
Now let’s look at some special statistical functions that work with stats, chart, timechart, etc.
### dc(field) – Distinct Count
Counts how many unique values exist in a field.
index=web_logs
| stats dc(user_id)
→ How many unique users?
### values(field) – List of Unique Values
Gives a list of distinct values seen in a field (in any order).
index=web_logs
| stats values(status)
→ Might return: 200, 404, 500
### list(field) – List (with Duplicates)
Shows all values, including duplicates.
index=web_logs
| stats list(user)
→ Could return: alice, bob, alice, charlie
### stdev(field) – Standard Deviation
Measures how spread out the values are.
index=performance
| stats stdev(cpu)
→ A high value = unstable CPU; low value = consistent usage
### median(field) – Middle Value
Returns the middle number from all values.
index=scores
| stats median(score)
Why not the average? Because the median is not skewed by outliers, while the average is.
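Python's standard library has direct counterparts to several of these functions, which makes the semantics easy to see. The field values below are invented for illustration; this is an analogy, not Splunk:

```python
# Rough Python equivalents (NOT Splunk) of dc, values, list, median, avg.
import statistics

users = ["alice", "bob", "alice", "charlie"]
scores = [70, 80, 90, 100, 250]  # 250 is a deliberate outlier

dc_users = len(set(users))           # like stats dc(user)     -> 3
unique_users = sorted(set(users))    # like stats values(user) -> deduplicated
all_users = list(users)              # like stats list(user)   -> duplicates kept

avg_score = statistics.mean(scores)    # pulled up by the outlier
med_score = statistics.median(scores)  # middle value, outlier-resistant
spread = statistics.stdev(scores)      # sample standard deviation

print(dc_users, med_score, avg_score)  # 3 90 118
```

Note how the outlier 250 drags the mean up to 118 while the median stays at 90: that gap is the practical reason to reach for median(field) on skewed data.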
## top and rare
In addition to the core aggregation command stats, two other shortcut statistical commands commonly appear in both real-world usage and the SPLK-1004 exam: top and rare.
### The top Command
The top command shows the most frequent values for a given field, by default displaying the top 10 values, along with their count and percentage.
Syntax:
index=web_logs | top uri_path
What it does:
It automatically calculates count and percent for each unique value of uri_path.
It sorts the results in descending order of count.
It essentially wraps a stats count by <field> followed by sort -count, adding a percent column.
Output structure:
| uri_path | count | percent |
|---|---|---|
| /home | 500 | 25.0 |
| /products | 400 | 20.0 |
| /contact | 300 | 15.0 |
| ... | ... | ... |
You can also add options like limit or showcount=false to customize the output.
### The rare Command
The rare command does the opposite of top. It shows the least frequent values of a given field.
Syntax:
index=auth_logs | rare user
What it does:
Returns the field values that occur least often in the dataset.
Useful for identifying outliers, anomalies, or rare user activity.
Typical output:
| user | count |
|---|---|
| guest | 1 |
| root_backup | 1 |
| temp_user23 | 2 |
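Both top and rare are just frequency counts read from opposite ends of the same table. A rough Python analogy using collections.Counter (not Splunk; the paths are invented):

```python
# Rough Python analogy (NOT Splunk) of top and rare.
from collections import Counter

# Hypothetical uri_path values standing in for index=web_logs.
paths = ["/home"] * 5 + ["/products"] * 4 + ["/contact"] * 3 + ["/admin"]
counts = Counter(paths)
total = sum(counts.values())

# like: | top limit=3 uri_path  -> most frequent, with count and percent
top = [(p, c, round(100 * c / total, 1)) for p, c in counts.most_common(3)]
print(top)  # [('/home', 5, 38.5), ('/products', 4, 30.8), ('/contact', 3, 23.1)]

# like: | rare uri_path -> the same frequency table, least frequent end
rare = counts.most_common()[-1]
print(rare)  # ('/admin', 1)
```

This is also why rare is so handy for anomaly hunting: the values at the bottom of the frequency table are exactly the ones that almost never occur.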
### timechart with span
The timechart command supports a span option that controls the granularity of the time buckets in the chart.
Smaller span values (like 1m, 5m, 15m) create more detailed time series charts.
Larger span values (like 1h, 1d) create coarser visualizations.
Example:
index=web_logs
| timechart span=15m avg(response_time)
This divides the timeline into 15-minute intervals and calculates the average response time for each bucket.
Smaller spans provide finer detail, but they can increase search time and memory usage, especially over long time ranges or large datasets.
Always balance performance and detail level when using span.
### sparkline() Function for Mini Trendlines
sparkline() is a specialized function available inside the stats and chart commands that embeds a miniature trendline (also called a sparkline) within a result cell.
Great for dashboards or reports where you want to visually compare trends across rows without using separate charts.
Often used alongside aggregations to show temporal trends per category.
Example:
index=web_logs
| timechart span=1h avg(response_time) by host
becomes more visual with:
index=web_logs
| stats sparkline(avg(response_time)) as trend by host
Result:
| host | trend |
|---|---|
| server1 | ▇▆▅▇▇▆▆▇ |
| server2 | ▂▃▃▃▅▆▇▇ |
Each sparkline represents how avg(response_time) changed over time per host.
Works best when the time granularity is consistent.
Often used in tables to add context to numerical data.
Does not affect the numerical aggregation; it's purely visual.
### When should I use eventstats instead of stats if I need group totals but still want the original events preserved?
Use eventstats when you need an aggregate added back onto each event without collapsing the event stream.
stats transforms the pipeline into one row per grouping combination, so all original event-level context disappears unless you rebuild it later. eventstats computes the aggregate by group and writes the result into each matching event, which is much better when later commands still need raw fields, row order, or additional filtering. A common mistake is reaching for stats too early, then discovering you need fields that were dropped. Another common issue is using appendpipe to rebuild totals that eventstats could have supplied more directly. In exam-style reasoning, choose eventstats when the requirement says “calculate a summary per group but continue working with individual events.”
### What problem does streamstats solve better than stats or eventstats?
streamstats is best for running calculations that depend on event order, such as rolling counts, prior values, and sequential comparisons.
Unlike stats, which summarizes a complete result set, and eventstats, which adds full-group aggregates, streamstats evaluates records as they flow through the pipeline. That makes it ideal for rank-like counters, session progress, previous-value comparisons, and cumulative totals. The key requirement is stable ordering, usually after sort or naturally time-ordered data. Users often get wrong answers because they expect streamstats to behave like a grouped aggregate when it is really a streaming calculation. If the requirement mentions “previous event,” “running total,” or “incremental ranking,” streamstats is usually the better fit. If the requirement is simply “sum by host,” then stats or eventstats is more appropriate.
### Why is appendpipe useful in subtotal-style reporting even when a stats command already exists in the search?
appendpipe lets you run an additional reporting branch on the current results to add subtotal or total rows without rerunning the base search.
This is especially useful when you already have grouped results and want to append summary rows such as totals by code, host, or category. It preserves the current pipeline state, so the appended branch works from the transformed results rather than from raw events. The common mistake is confusing appendpipe with append; append runs a subsearch, while appendpipe works on the current result set. In practical dashboard work, appendpipe is often chosen for subtotal rows, quick rollups, or alternative summary views. On the exam, prefer it when the wording suggests “take the current table and add summary rows.”