SPLK-1004 Exploring Statistical Commands

Detailed list of SPLK-1004 knowledge points

Exploring Statistical Commands Detailed Explanation

1. What are Statistical Commands in Splunk?

Imagine you have a huge log file—maybe millions of rows of server data, user activity, or sensor readings. If you want to find patterns, summarize behaviors, or understand trends, you can’t read each row one by one. That’s where statistical commands come in.

They help you:

  • Count how many events occurred

  • Find averages, totals, or maximum values

  • Understand data over time

  • Compare one group to another (e.g., errors by server)

Think of statistical commands as tools for summarizing your data so it’s easier to understand and use.

2. Core Statistical Command: stats

What is stats?

The stats command performs aggregations—like counting, summing, or averaging data—based on specific fields.

You use it when you want a summary of your search results, such as:

  • How many events per user?

  • What is the total traffic per server?

  • What is the average response time per application?

Syntax of stats

... | stats <function>(<field>) by <group_field>

Let’s break that down:

  • <function>: What kind of math do you want? Count? Average?

  • <field>: Which field do you want to summarize? (like bytes, cpu, etc.)

  • by <group_field>: How do you want to group the results? (like host, user, etc.)

Example 1: Count events per host

index=web_logs
| stats count by host

This shows how many events came from each host.

Example 2: Average response time per URL

index=web_logs
| stats avg(response_time) by uri_path

Here, you're checking which pages respond fastest or slowest on average.

Example 3: Total sales per product

index=sales
| stats sum(amount) by product_name

This adds up all amount values for each product.

Supported Functions in stats

Function     What it Does                               Example Use
count        Counts the number of events                stats count by user
sum(x)       Adds up values in field x                  stats sum(bytes) by host
avg(x)       Calculates the average                     stats avg(score)
max(x)       Finds the maximum value                    stats max(price)
min(x)       Finds the minimum value                    stats min(duration)
dc(x)        Distinct count (how many unique values)    stats dc(ip)
values(x)    List of distinct values                    stats values(method)
list(x)      List of all values (duplicates allowed)    stats list(user)

Why is stats important?

  • It's the foundation of data summarization in Splunk.

  • It works with dashboards, reports, alerts, and more.

  • You can chain multiple functions together, like this:

index=web_logs
| stats count, avg(response_time), max(response_time) by host

This gives:

  • total events per host

  • average response time

  • highest response time per host

Quick Tip for Beginners: Always make sure the field you’re summarizing (e.g., bytes, duration) exists in your events. Events that lack the field are silently excluded from the aggregation, so the result may be empty or misleading.
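One quick way to check field coverage (a sketch; the bytes field is assumed) is to compare count, which counts all events, with count(bytes), which only counts events where that field is present:

index=web_logs
| stats count as total_events, count(bytes) as events_with_bytes

If the two numbers differ, some events are missing the field and will be excluded from any bytes aggregation.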

3. Exploring eventstats

What is eventstats?

eventstats works almost exactly like stats, but there's one big difference:

  • stats gives you a summary (and collapses your events).

  • eventstats gives you the same statistics, but adds them to each original event.

So instead of reducing the number of events, it keeps all events, and simply adds new fields with the summary values.

Basic Syntax

... | eventstats <function>(<field>) as <new_field> by <group_field>

Example: Add average response time per host

index=web_logs
| eventstats avg(response_time) as avg_response by host
  • Every event now gets a new field: avg_response

  • That field contains the average response time for the same host as that event.

Use Case: Compare event to average

Let’s say you want to find requests slower than average for each host.

index=web_logs
| eventstats avg(response_time) as avg_response by host
| where response_time > avg_response

What happens here:

  • eventstats adds the average per host to every event.

  • where filters for requests that were slower than the host's average.

This is a very common real-world use case!

When to use eventstats vs. stats?

Scenario                                                   Use
You need a summary only (no original events)               stats
You want to compare each event to a group average/total    eventstats

4. Exploring streamstats

What is streamstats?

streamstats calculates cumulative or sequential statistics — one event at a time, in order. This is useful when order matters (like time, or sequence).

Unlike stats, it doesn’t wait until the end of the search to calculate results.

Basic Syntax

... | streamstats <function>(<field>) as <new_field> by <group_field>

Example 1: Running total of bytes

index=web_logs
| streamstats sum(bytes) as running_total
  • Adds a field running_total to each event.

  • For each new event, it adds bytes to the total from the previous events.

Example 2: Find first occurrence per user

index=auth_logs
| streamstats count by user
| where count == 1
  • This returns the first event for each user.

  • streamstats count by user increments for every new event per user.

Example 3: Time between events

index=system_logs
| streamstats current=f window=1 last(_time) as prev_time
| eval time_diff = _time - prev_time
  • Calculates the time gap between current and previous event

  • current=f excludes the current event from the window, so last(_time) returns the previous event's timestamp

streamstats vs. eventstats vs. stats

Feature                   stats               eventstats         streamstats
Keeps original events?    No (just summary)   Yes (adds stats)   Yes (adds stats in order)
Works sequentially?       No                  No                 Yes
Good for comparison?      No                  Yes                Yes (for time/sequence)

Beginner Tips:

  • Use streamstats when event order matters, such as:

    • Session tracking

    • Running totals

    • Event-to-event comparisons

  • Don’t forget to sort your events if you need order (| sort _time), especially before using streamstats.
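Putting these tips together, a running total per host might look like this (a sketch; index and field names are assumed):

index=web_logs
| sort 0 _time
| streamstats sum(bytes) as running_total by host

sort 0 _time orders all events oldest-first (the 0 removes the default 10,000-row sort limit), so the running total accumulates in time order.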

5. Exploring timechart

What is timechart?

timechart is designed for time-based summaries — when you want to see how values change over time.

Unlike stats, which groups by field values, timechart groups by time.

Basic Syntax

... | timechart <function>(<field>) by <group_field>
  • <function>: e.g., count, avg(bytes), sum(sales)

  • by <group_field>: (Optional) compare across categories

Example 1: Count of events over time

index=web_logs
| timechart count
  • Shows how many events occurred per time bucket (default = 1 minute/hour/day depending on time range)

Example 2: Average CPU usage per host

index=system_logs
| timechart avg(cpu) by host
  • Displays a line for each host showing how CPU changed over time.

Time Bucketing

timechart automatically splits time into buckets, e.g.:

  • last 24 hours → hourly buckets

  • last 30 minutes → 1-minute buckets

You can override this:

| timechart span=15m count

Best Use Cases

  • Trend charts (errors over time, sales by hour, login count by day)

  • Line/bar charts in dashboards

  • Comparing groups (hosts, users, apps) over time

6. Exploring chart

What is chart?

chart is like stats, but made for 2D summaries, especially useful when creating tables and visual comparisons.

You can think of it as a pivot table:

  • Rows → one field

  • Columns → another field

  • Cells → summary value

Basic Syntax

... | chart <function>(<field>) over <row_field> by <column_field>

Example: Total sales per product per region

index=sales
| chart sum(amount) over product by region

Output:

product   US     EU     Asia
Phone     5000   7000   6000
Laptop    3000   4000   2000
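chart also accepts options to control the column split. For example, this sketch (index and field names assumed) keeps only the five most common status values and groups the rest into an OTHER column:

index=web_logs
| chart limit=5 useother=true count over host by status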

Use Cases

  • Heatmaps

  • Pivot-style dashboards

  • 2-dimensional reports (e.g., count of errors by app and server)

7. Common Aggregation Functions

Now let’s look at some special statistical functions that work with stats, chart, timechart, etc.

1. dc(field) – Distinct Count

Counts how many unique values exist in a field.

index=web_logs
| stats dc(user_id)

→ How many unique users?

2. values(field) – List of unique values

Gives a list of distinct values seen in a field (in any order).

index=web_logs
| stats values(status)

→ Might return: 200, 404, 500

3. list(field) – List (with duplicates)

Shows all values, including duplicates.

index=web_logs
| stats list(user)

→ Could return: alice, bob, alice, charlie

4. stdev(field) – Standard Deviation

Measures how spread out values are.

index=performance
| stats stdev(cpu)

→ A high value = unstable CPU; low value = consistent usage
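For example, to flag hosts with unstable CPU usage (a sketch; the threshold of 20 is an arbitrary illustration):

index=performance
| stats stdev(cpu) as cpu_sd by host
| where cpu_sd > 20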

5. median(field) – Middle value

Returns the middle number from all values.

index=scores
| stats median(score)

Why not the average? Because a few extreme values can drag the average far from the typical value, while the median stays put.
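You can compute both in one search to see the difference (index and field names assumed); if a few extreme scores are present, the average will drift away from the median:

index=scores
| stats avg(score) as mean_score, median(score) as median_score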

Exploring Statistical Commands (Additional Content)

1. Additional Commands: top and rare

In addition to the core aggregation command stats, two other shortcut statistical commands commonly appear in both real-world usage and the SPLK-1004 exam: top and rare.

a) top Command

The top command shows the most frequent values for a given field, by default displaying the top 10 values, along with their count and percentage.

Example:

index=web_logs | top uri_path

What it does:

  • It automatically calculates count and percent for each unique value of uri_path.

  • It sorts the results in descending order of count.

  • It essentially wraps a stats count by <field> followed by sort -count.

Output structure:

uri_path    count   percent
/home       500     25.0
/products   400     20.0
/contact    300     15.0
...         ...     ...

You can also add options like limit or showcount=false to customize the output.
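For example, this sketch keeps only the five most frequent paths and hides the count column:

index=web_logs
| top limit=5 showcount=false uri_path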

b) rare Command

The rare command does the opposite of top. It shows the least frequent values of a given field.

Example:

index=auth_logs | rare user

What it does:

  • Returns the field values that occur least often in the dataset.

  • Useful for identifying outliers, anomalies, or rare user activity.

Typical output:

user           count
guest          1
root_backup    1
temp_user23    2

2. Time Granularity in timechart with span

The timechart command supports a span option that controls the granularity of time buckets in the chart.

Key Insight:

  • Smaller span values (like 1m, 5m, 15m) create more detailed time series charts.

  • Larger span values (like 1h, 1d) create coarser visualizations.

Example:

index=web_logs
| timechart span=15m avg(response_time)

This divides the timeline into 15-minute intervals and calculates the average response time for each bucket.

Important Note:

Smaller spans provide finer detail, but they can increase search time and memory usage, especially over long time ranges or large datasets.

Always balance performance and detail level when using span.

3. sparkline() Function for Mini Trendlines

sparkline() is a specialized function available inside stats and chart that embeds a miniature trendline (also called a sparkline) within a result cell.

Use Case:

  • Great for dashboards or reports where you want to visually compare trends across rows without using separate charts.

  • Often used alongside aggregations to show temporal trends per category.

Example:

index=web_logs
| timechart span=1h avg(response_time) by host

becomes more visual with:

index=web_logs
| stats sparkline(avg(response_time)) as trend by host

Result:

host      trend
server1   ▇▆▅▇▇▆▆▇
server2   ▂▃▃▃▅▆▇▇

Each sparkline represents how avg(response_time) changed over time per host.

Notes:

  • Works best when the time granularity is consistent.

  • Often used in tables to add context to numerical data.

  • Does not affect the numerical aggregation; it's purely visual.
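sparkline also accepts an optional span argument to fix the bucket size, which keeps the granularity consistent across rows (a sketch; index and field names assumed):

index=web_logs
| stats sparkline(avg(response_time), 1h) as trend, avg(response_time) as overall_avg by host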

Frequently Asked Questions

When should I use eventstats instead of stats if I need group totals but still want the original events preserved?

Answer:

Use eventstats when you need an aggregate added back onto each event without collapsing the event stream.

Explanation:

stats transforms the pipeline into one row per grouping combination, so all original event-level context disappears unless you rebuild it later. eventstats computes the aggregate by group and writes the result into each matching event, which is much better when later commands still need raw fields, row order, or additional filtering. A common mistake is reaching for stats too early, then discovering you need fields that were dropped. Another common issue is using appendpipe to rebuild totals that eventstats could have supplied more directly. In exam-style reasoning, choose eventstats when the requirement says “calculate a summary per group but continue working with individual events.”
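A minimal sketch of this pattern (index and field names assumed): keep every event, attach a per-host total, then derive each event's share of it:

index=web_logs
| eventstats sum(bytes) as host_total by host
| eval pct_of_host = round(bytes / host_total * 100, 2)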

Demand Score: 62

Exam Relevance Score: 88

What problem does streamstats solve better than stats or eventstats?

Answer:

streamstats is best for running calculations that depend on event order, such as rolling counts, prior values, and sequential comparisons.

Explanation:

Unlike stats, which summarizes a complete result set, and eventstats, which adds full-group aggregates, streamstats evaluates records as they flow through the pipeline. That makes it ideal for rank-like counters, session progress, previous-value comparisons, and cumulative totals. The key requirement is stable ordering, usually after sort or naturally time-ordered data. Users often get wrong answers because they expect streamstats to behave like a grouped aggregate when it is really a streaming calculation. If the requirement mentions “previous event,” “running total,” or “incremental ranking,” streamstats is usually the better fit. If the requirement is simply “sum by host,” then stats or eventstats is more appropriate.
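For instance, a five-event rolling average (a sketch; index and field names assumed) is a streaming calculation that neither stats nor eventstats can express:

index=web_logs
| sort 0 _time
| streamstats window=5 avg(response_time) as rolling_avg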

Demand Score: 64

Exam Relevance Score: 90

Why is appendpipe useful in subtotal-style reporting even when a stats command already exists in the search?

Answer:

appendpipe lets you run an additional reporting branch on the current results to add subtotal or total rows without rerunning the base search.

Explanation:

This is especially useful when you already have grouped results and want to append summary rows such as totals by code, host, or category. It preserves the current pipeline state, so the appended branch works from the transformed results rather than from raw events. The common mistake is confusing appendpipe with append; append runs a subsearch, while appendpipe works on the current result set. In practical dashboard work, appendpipe is often chosen for subtotal rows, quick rollups, or alternative summary views. On the exam, prefer it when the wording suggests “take the current table and add summary rows.”
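A subtotal sketch of this pattern (index and field names assumed); note that the subpipeline inside appendpipe runs against the current result table, not the raw events:

index=web_logs
| stats count by host
| appendpipe
    [ stats sum(count) as count
    | eval host = "TOTAL" ]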

Demand Score: 60

Exam Relevance Score: 84
