SPLK-1004 Exploring Statistical Commands

Detailed list of SPLK-1004 knowledge points

Exploring Statistical Commands Detailed Explanation

1. What are Statistical Commands in Splunk?

Imagine you have a huge log file—maybe millions of rows of server data, user activity, or sensor readings. If you want to find patterns, summarize behaviors, or understand trends, you can’t read each row one by one. That’s where statistical commands come in.

They help you:

  • Count how many events occurred

  • Find averages, totals, or maximum values

  • Understand data over time

  • Compare one group to another (e.g., errors by server)

Think of statistical commands as tools for summarizing your data so it’s easier to understand and use.

2. Core Statistical Command: stats

What is stats?

The stats command performs aggregations—like counting, summing, or averaging data—based on specific fields.

You use it when you want a summary of your search results, such as:

  • How many events per user?

  • What is the total traffic per server?

  • What is the average response time per application?

Syntax of stats

... | stats <function>(<field>) by <group_field>

Let’s break that down:

  • <function>: What kind of math do you want? Count? Average?

  • <field>: Which field do you want to summarize? (like bytes, cpu, etc.)

  • by <group_field>: How do you want to group the results? (like host, user, etc.)

Example 1: Count events per host

index=web_logs
| stats count by host

This shows how many events came from each host.

Example 2: Average response time per URL

index=web_logs
| stats avg(response_time) by uri_path

Here, you're checking which pages respond fastest or slowest on average.

Example 3: Total sales per product

index=sales
| stats sum(amount) by product_name

This adds up all amount values for each product.

Supported Functions in stats

Function     What it Does                               Example Use
count        Counts the number of events                stats count by user
sum(x)       Adds up values in field x                  stats sum(bytes) by host
avg(x)       Calculates the average                     stats avg(score)
max(x)       Finds the maximum value                    stats max(price)
min(x)       Finds the minimum value                    stats min(duration)
dc(x)        Distinct count (how many unique values)    stats dc(ip)
values(x)    List of distinct values                    stats values(method)
list(x)      List of all values (duplicates allowed)    stats list(user)

Why is stats important?

  • It's the foundation of data summarization in Splunk.

  • It works with dashboards, reports, alerts, and more.

  • You can chain multiple functions together, like this:

index=web_logs
| stats count, avg(response_time), max(response_time) by host

This gives:

  • total events per host

  • average response time

  • highest response time per host

Quick Tip for Beginners: Always make sure the field you’re summarizing (e.g., bytes, duration) exists in your events. Events that lack the field are silently excluded from the aggregation, so the result may be empty or misleading.
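One quick way to check field coverage (a sketch; the bytes field is assumed) is to compare count, which counts all events, with count(bytes), which only counts events where that field is present:

index=web_logs
| stats count as total_events, count(bytes) as events_with_bytes

If the two numbers differ, some events are missing the field and will be excluded from any bytes aggregation.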

3. Exploring eventstats

What is eventstats?

eventstats works almost exactly like stats, but there's one big difference:

  • stats gives you a summary (and collapses your events).

  • eventstats gives you the same statistics, but adds them to each original event.

So instead of reducing the number of events, it keeps all events, and simply adds new fields with the summary values.

Basic Syntax

... | eventstats <function>(<field>) as <new_field> by <group_field>

Example: Add average response time per host

index=web_logs
| eventstats avg(response_time) as avg_response by host
  • Every event now gets a new field: avg_response

  • That field contains the average response time for the same host as that event.

Use Case: Compare event to average

Let’s say you want to find requests slower than average for each host.

index=web_logs
| eventstats avg(response_time) as avg_response by host
| where response_time > avg_response

What happens here:

  • eventstats adds the average per host to every event.

  • where filters for requests that were slower than the host's average.

This is a very common real-world use case!

When to use eventstats vs. stats?

Scenario                                                   Use
You need a summary only (no original events)               stats
You want to compare each event to a group average/total    eventstats

4. Exploring streamstats

What is streamstats?

streamstats calculates cumulative or sequential statistics — one event at a time, in order. This is useful when order matters (like time, or sequence).

Unlike stats, it doesn’t wait until the end of the search to calculate results.

Basic Syntax

... | streamstats <function>(<field>) as <new_field> by <group_field>

Example 1: Running total of bytes

index=web_logs
| streamstats sum(bytes) as running_total
  • Adds a field running_total to each event.

  • For each new event, it adds bytes to the total from the previous events.

Example 2: Find first occurrence per user

index=auth_logs
| streamstats count by user
| where count == 1
  • This returns the first event for each user.

  • streamstats count by user increments for every new event per user.

Example 3: Time between events

index=system_logs
| streamstats current=f window=1 last(_time) as prev_time
| eval time_diff = _time - prev_time
  • Calculates the time gap between current and previous event

  • current=f excludes the current event from the window, so last(_time) returns the previous event's timestamp

streamstats vs. eventstats vs. stats

Feature                   stats               eventstats         streamstats
Keeps original events?    No (just summary)   Yes (adds stats)   Yes (adds stats in order)
Works sequentially?       No                  No                 Yes
Good for comparison?      No                  Yes                Yes (for time/sequence)

Beginner Tips:

  • Use streamstats when event order matters, such as:

    • Session tracking

    • Running totals

    • Event-to-event comparisons

  • Don’t forget to sort your events if you need order (| sort _time), especially before using streamstats.
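Putting these tips together, a running total per host might look like this (a sketch; index and field names are assumed):

index=web_logs
| sort 0 _time
| streamstats sum(bytes) as running_total by host

sort 0 _time orders all events oldest-first (the 0 removes the default 10,000-row sort limit), so the running total accumulates in time order.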

5. Exploring timechart

What is timechart?

timechart is designed for time-based summaries — when you want to see how values change over time.

Unlike stats, which groups by field values, timechart groups by time.

Basic Syntax

... | timechart <function>(<field>) by <group_field>
  • <function>: e.g., count, avg(bytes), sum(sales)

  • by <group_field>: (Optional) compare across categories

Example 1: Count of events over time

index=web_logs
| timechart count
  • Shows how many events occurred per time bucket (default = 1 minute/hour/day depending on time range)

Example 2: Average CPU usage per host

index=system_logs
| timechart avg(cpu) by host
  • Displays a line for each host showing how CPU changed over time.

Time Bucketing

timechart automatically splits time into buckets, e.g.:

  • last 24 hours → hourly buckets

  • last 30 minutes → 1-minute buckets

You can override this:

| timechart span=15m count

Best Use Cases

  • Trend charts (errors over time, sales by hour, login count by day)

  • Line/bar charts in dashboards

  • Comparing groups (hosts, users, apps) over time

6. Exploring chart

What is chart?

chart is like stats, but made for 2D summaries, especially useful when creating tables and visual comparisons.

You can think of it as a pivot table:

  • Rows → one field

  • Columns → another field

  • Cells → summary value

Basic Syntax

... | chart <function>(<field>) over <row_field> by <column_field>

Example: Total sales per product per region

index=sales
| chart sum(amount) over product by region

Output:

product   US     EU     Asia
Phone     5000   7000   6000
Laptop    3000   4000   2000
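chart also accepts options to control the column split. For example, this sketch (index and field names assumed) keeps only the five most common status values and groups the rest into an OTHER column:

index=web_logs
| chart limit=5 useother=true count over host by status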

Use Cases

  • Heatmaps

  • Pivot-style dashboards

  • 2-dimensional reports (e.g., count of errors by app and server)

7. Common Aggregation Functions

Now let’s look at some special statistical functions that work with stats, chart, timechart, etc.

1. dc(field) – Distinct Count

Counts how many unique values exist in a field.

index=web_logs
| stats dc(user_id)

→ How many unique users?

2. values(field) – List of unique values

Gives a list of distinct values seen in a field (in any order).

index=web_logs
| stats values(status)

→ Might return: 200, 404, 500

3. list(field) – List (with duplicates)

Shows all values, including duplicates.

index=web_logs
| stats list(user)

→ Could return: alice, bob, alice, charlie

4. stdev(field) – Standard Deviation

Measures how spread out values are.

index=performance
| stats stdev(cpu)

→ A high value = unstable CPU; low value = consistent usage
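For example, to flag hosts with unstable CPU usage (a sketch; the threshold of 20 is an arbitrary illustration):

index=performance
| stats stdev(cpu) as cpu_sd by host
| where cpu_sd > 20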

5. median(field) – Middle value

Returns the middle number from all values.

index=scores
| stats median(score)

Why not the average? Because a few extreme values can drag the average far from the typical value, while the median stays put.
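You can compute both in one search to see the difference (index and field names assumed); if a few extreme scores are present, the average will drift away from the median:

index=scores
| stats avg(score) as mean_score, median(score) as median_score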

Exploring Statistical Commands (Additional Content)

1. Additional Commands: top and rare

In addition to the core aggregation command stats, two other shortcut statistical commands commonly appear in both real-world usage and the SPLK-1004 exam: top and rare.

a) top Command

The top command shows the most frequent values for a given field, by default displaying the top 10 values, along with their count and percentage.

Example:

index=web_logs | top uri_path

What it does:

  • It automatically calculates count and percent for each unique value of uri_path.

  • It sorts the results in descending order of count.

  • It essentially wraps a stats count by <field> followed by sort -count.

Output structure:

uri_path    count   percent
/home       500     25.0
/products   400     20.0
/contact    300     15.0
...         ...     ...

You can also add options like limit or showcount=false to customize the output.
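For example, this sketch keeps only the five most frequent paths and hides the count column:

index=web_logs
| top limit=5 showcount=false uri_path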

b) rare Command

The rare command does the opposite of top. It shows the least frequent values of a given field.

Example:

index=auth_logs | rare user

What it does:

  • Returns the field values that occur least often in the dataset.

  • Useful for identifying outliers, anomalies, or rare user activity.

Typical output:

user           count
guest          1
root_backup    1
temp_user23    2

2. Time Granularity in timechart with span

The timechart command supports a span option that controls the granularity of time buckets in the chart.

Key Insight:

  • Smaller span values (like 1m, 5m, 15m) create more detailed time series charts.

  • Larger span values (like 1h, 1d) create coarser visualizations.

Example:

index=web_logs
| timechart span=15m avg(response_time)

This divides the timeline into 15-minute intervals and calculates the average response time for each bucket.

Important Note:

Smaller spans provide finer detail, but they can increase search time and memory usage, especially over long time ranges or large datasets.

Always balance performance and detail level when using span.

3. sparkline() Function for Mini Trendlines

sparkline() is a specialized function available inside stats and chart that embeds a miniature trendline (also called a sparkline) within a result cell.

Use Case:

  • Great for dashboards or reports where you want to visually compare trends across rows without using separate charts.

  • Often used alongside aggregations to show temporal trends per category.

Example:

index=web_logs
| timechart span=1h avg(response_time) by host

becomes more visual with:

index=web_logs
| stats sparkline(avg(response_time)) as trend by host

Result:

host      trend
server1   ▇▆▅▇▇▆▆▇
server2   ▂▃▃▃▅▆▇▇

Each sparkline represents how avg(response_time) changed over time per host.

Notes:

  • Works best when the time granularity is consistent.

  • Often used in tables to add context to numerical data.

  • Does not affect the numerical aggregation; it's purely visual.
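sparkline also accepts an optional span argument to fix the bucket size, which keeps the granularity consistent across rows (a sketch; index and field names assumed):

index=web_logs
| stats sparkline(avg(response_time), 1h) as trend, avg(response_time) as overall_avg by host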

Frequently Asked Questions

When should I use eventstats instead of stats if I need group totals but still want the original events preserved?

Answer:

Use eventstats when you need an aggregate added back onto each event without collapsing the event stream.

Explanation:

stats transforms the pipeline into one row per grouping combination, so all original event-level context disappears unless you rebuild it later. eventstats computes the aggregate by group and writes the result into each matching event, which is much better when later commands still need raw fields, row order, or additional filtering. A common mistake is reaching for stats too early, then discovering you need fields that were dropped. Another common issue is using appendpipe to rebuild totals that eventstats could have supplied more directly. In exam-style reasoning, choose eventstats when the requirement says “calculate a summary per group but continue working with individual events.”
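A minimal sketch of this pattern (index and field names assumed): keep every event, attach a per-host total, then derive each event's share of it:

index=web_logs
| eventstats sum(bytes) as host_total by host
| eval pct_of_host = round(bytes / host_total * 100, 2)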

Demand Score: 62

Exam Relevance Score: 88

What problem does streamstats solve better than stats or eventstats?

Answer:

streamstats is best for running calculations that depend on event order, such as rolling counts, prior values, and sequential comparisons.

Explanation:

Unlike stats, which summarizes a complete result set, and eventstats, which adds full-group aggregates, streamstats evaluates records as they flow through the pipeline. That makes it ideal for rank-like counters, session progress, previous-value comparisons, and cumulative totals. The key requirement is stable ordering, usually after sort or naturally time-ordered data. Users often get wrong answers because they expect streamstats to behave like a grouped aggregate when it is really a streaming calculation. If the requirement mentions “previous event,” “running total,” or “incremental ranking,” streamstats is usually the better fit. If the requirement is simply “sum by host,” then stats or eventstats is more appropriate.
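For instance, a five-event rolling average (a sketch; index and field names assumed) is a streaming calculation that neither stats nor eventstats can express:

index=web_logs
| sort 0 _time
| streamstats window=5 avg(response_time) as rolling_avg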

Demand Score: 64

Exam Relevance Score: 90

Why is appendpipe useful in subtotal-style reporting even when a stats command already exists in the search?

Answer:

appendpipe lets you run an additional reporting branch on the current results to add subtotal or total rows without rerunning the base search.

Explanation:

This is especially useful when you already have grouped results and want to append summary rows such as totals by code, host, or category. It preserves the current pipeline state, so the appended branch works from the transformed results rather than from raw events. The common mistake is confusing appendpipe with append; append runs a subsearch, while appendpipe works on the current result set. In practical dashboard work, appendpipe is often chosen for subtotal rows, quick rollups, or alternative summary views. On the exam, prefer it when the wording suggests “take the current table and add summary rows.”
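A subtotal sketch of this pattern (index and field names assumed); note that the subpipeline inside appendpipe runs against the current result table, not the raw events:

index=web_logs
| stats count by host
| appendpipe
    [ stats sum(count) as count
    | eval host = "TOTAL" ]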

Demand Score: 60

Exam Relevance Score: 84
