transaction Command Overview
The transaction command in Splunk is used to group multiple related events into a single logical unit, known as a "transaction."
This is particularly useful when:
Events are scattered across time
You need to track activity over a session or workflow
There are no clear field-based groupings that can be used with stats
A typical transaction command groups events using a common field (like session_id or user) and defines the start and end of the transaction.
Example Syntax:
... | transaction session_id startswith="login" endswith="logout"
session_id: The field that links related events
startswith="login": Marks the first event in the transaction
endswith="logout": Marks the last event in the transaction
This will group all events with the same session_id starting from the first "login" and ending at the next "logout".
When working with the transaction command, there are several key parameters you can use to control how events are grouped:
maxspan
Specifies the maximum total duration of a transaction. Events that extend beyond this time window will be excluded or split.
Example:
transaction user maxspan=30m
→ All events for the same user that occur within a 30-minute span will be grouped.
maxpause
Specifies the maximum allowed time gap between any two consecutive events in the same transaction.
Example:
transaction session_id maxpause=5m
→ If any two events for the same session are more than 5 minutes apart, a new transaction will begin.
keepevicted=true
By default, if a transaction cannot be completed (e.g., the end event is missing), it is discarded ("evicted").
Setting keepevicted=true allows these incomplete transactions to still be shown in the results.
Use case: You want to see dropped sessions or users who never logged out.
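That use case can be sketched as follows (the index and field names are illustrative assumptions):

```spl
index=web_logs
| transaction session_id startswith="login" endswith="logout" keepevicted=true
| where closed_txn=0
```

transaction adds a closed_txn field (1 for complete transactions, 0 for evicted ones), so filtering on closed_txn=0 surfaces sessions that saw a login but never reached a logout.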
When to Use transaction
The transaction command is especially helpful in scenarios where:
You don’t have a clear numeric or time-based identifier
Events must be grouped by sequence and timing, not just field values
| Scenario | Description |
|---|---|
| Login/Logout Tracking | Group user actions from login to logout |
| Shopping Cart Activity | Group events from cart creation to checkout |
| Intrusion Detection | Bundle a sequence of suspicious events |
| Workflow Monitoring | Trace multiple steps (e.g., form submission to approval) |
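As an illustration of the Shopping Cart Activity row, a sketch might look like this (the index, sourcetype, field, and action values are assumptions):

```spl
index=web_logs sourcetype=access_combined
| transaction clientip startswith="action=addtocart" endswith="action=purchase" maxspan=1h
| table clientip duration eventcount
```

Each resulting event spans one shopper's path from the first add-to-cart to the purchase, capped at one hour.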
transaction vs stats
While transaction is very powerful, it can be resource-intensive, especially with large datasets.
In many cases, you can achieve similar results using stats (which is more efficient).
Use stats instead when:
You have a unique field (like session_id) that can group events
You can use time-based or sequence fields (e.g., earliest, latest)
Example using stats:
... | stats earliest(_time) as session_start, latest(_time) as session_end by session_id
| eval duration = session_end - session_start
This gives you similar information to a transaction — session timing — without grouping raw events together.
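If you also need a per-session event count or the sequence of actions, stats can approximate more of what transaction provides (the action field here is an assumption):

```spl
... | stats earliest(_time) as session_start latest(_time) as session_end
        count as eventcount list(action) as actions by session_id
| eval duration = session_end - session_start
```

This yields duration, eventcount, and an ordered list of actions per session without paying the cost of concatenating raw events.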
| Feature | transaction | stats |
|---|---|---|
| Groups raw events? | Yes | No (returns summary per group) |
| Performance | Slower on large data | Faster and more scalable |
| Use case flexibility | Works with timing and sequencing | Works well with clear identifiers |
| Output type | Single event with all fields concatenated | Single row per group with calculated fields |
Use transaction only when necessary (e.g., event sequencing is essential).
When possible, prefer stats for performance.
Limit the scope (time range, filtered fields) before using transaction.
Monitor search performance using Search Job Inspector to detect slowdowns.
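Putting the scoping advice into practice, a narrowly scoped transaction search might look like this (the index and sourcetype names are assumptions):

```spl
index=web_logs sourcetype=access_combined earliest=-4h
| fields _time session_id action
| transaction session_id startswith="login" endswith="logout" maxspan=30m maxpause=5m
```

Filtering by index, sourcetype, and time range first, and trimming unneeded fields with fields, reduces the event volume that transaction has to hold in memory.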
Performance Impact of transaction
The transaction command is powerful but expensive in terms of system resources. Its performance degrades significantly when:
There are large volumes of events
The search includes wide time ranges
You use complex startswith and endswith conditions
There are long gaps between related events (forcing more memory retention)
Imagine a chart where:
The x-axis is the number of events
The y-axis is the search duration (in seconds)
For a basic stats command, the line remains relatively flat or linear.
For transaction, the line curves upward sharply as event count increases, highlighting how quickly performance can degrade.
Key Takeaway:
Use transaction only when necessary, and always limit your search with time and field filters.
transaction vs stats with Output Examples
transaction
index=web_logs | transaction session_id startswith="login" endswith="logout"
Result:
| session_id | duration | eventcount | _raw |
|---|---|---|---|
| abc123 | 45s | 3 | (combined raw of login, activity, logout) |
→ Events are merged into a single row, preserving full raw details.
stats
index=web_logs | stats earliest(_time) as start latest(_time) as end by session_id
Result:
| session_id | start | end |
|---|---|---|
| abc123 | 2024-04-20 10:01:00 | 2024-04-20 10:01:45 |
→ Only timestamps are summarized, no raw event data is retained.
Key Differences:
| Feature | transaction | stats |
|---|---|---|
| Combines raw events | Yes | No |
| Lightweight | No | Yes |
| Performance | Slow for large data | Fast |
| Best use case | Event sequence tracking | Time span measurement |
streamstats for Session-like Grouping
Sometimes, transaction is overkill, and you can simulate session grouping using streamstats.
index=web_logs
| sort 0 _time
| streamstats current=f last(_time) as prev_time by user
| eval gap = _time - prev_time
| eval new_session = if(isnull(gap) OR gap > 1800, 1, 0)
| streamstats sum(new_session) as session_id by user
This computes the gap between consecutive events for each user and increments session_id whenever the gap exceeds 30 minutes (1800 seconds), so a new session begins after any inactivity longer than that.
You can then apply stats to calculate per-session metrics.
Use case:
User activity that has no explicit session field, but where inactivity gaps define session breaks.
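Once each event carries a session_id (whether assigned by streamstats or present natively), per-session metrics are a plain stats pass:

```spl
... | stats count as events min(_time) as session_start max(_time) as session_end by user session_id
| eval session_duration = session_end - session_start
```

This reproduces the duration and event-count information of transaction at a fraction of the cost.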
| Practice | Reason |
|---|---|
| Prefer stats or streamstats if you don't need full raw events | Saves memory and speeds up search |
| Limit by index, sourcetype, host, and time range before transaction | Reduces event volume early |
| Avoid using transaction with subsearches or across large datasets | It magnifies performance cost |
| Use keepevicted=true to review incomplete sessions (optional) | Helps detect anomalies like dropped connections |
When should transaction be used instead of a stats-based correlation approach?
Use transaction when the requirement is explicitly to group related events into transaction objects with start/end semantics and event lists.
transaction is powerful but expensive. It is appropriate when order, duration, closure, and grouped event membership are core requirements. If you only need aggregate counts or simple correlation across fields, stats-based logic is often more efficient. The exam commonly tests this tradeoff. If the prompt emphasizes session-style grouping and complete/incomplete transaction identification, transaction is the more natural answer. If it emphasizes performance or simple aggregations, stats is usually preferred.
Demand Score: 44
Exam Relevance Score: 91
Why is identifying complete versus incomplete transactions valuable?
Because it distinguishes fully observed workflows from sessions missing expected start or end events.
This helps analysts detect failures, interruptions, or partial data capture. The exam objective focuses on understanding transaction state rather than simply bundling events together. If a scenario involves workflows that should have a beginning and an end, transaction completeness is likely a key requirement. A common mistake is using ordinary stats when the analysis needs session integrity.
Demand Score: 41
Exam Relevance Score: 87
Why is transaction often described as less efficient than alternatives?
Because it must hold and organize related events into grouped structures, which can become costly over large datasets.
That cost is why advanced users often ask whether a stats-based design could satisfy the same requirement. The exam wants you to recognize that transaction is not the default for all correlation tasks. It is best reserved for cases that truly need its grouping semantics. If performance is a concern and the requirement can be met with aggregations, a lighter approach is usually better.
Demand Score: 40
Exam Relevance Score: 89