SPLK-1004 Using Advanced Transactions

Using Advanced Transactions Detailed Explanation

1. transaction Command Overview

The transaction command in Splunk is used to group together multiple related events into a single logical unit, known as a “transaction.”

This is particularly useful when:

  • Events are scattered across time

  • You need to track activity over a session or workflow

  • There are no clear field-based groupings that can be used with stats

2. Syntax

A typical transaction command groups events using a common field (like session_id or user) and defines the start and end of the transaction.

Example Syntax:

... | transaction session_id startswith="login" endswith="logout"

  • session_id: The field that links related events

  • startswith="login": Marks the first event in the transaction

  • endswith="logout": Marks the last event in the transaction

This will group all events with the same session_id starting from the first "login" and ending at the next "logout".

3. Key Options

When working with the transaction command, there are several key parameters you can use to control how events are grouped:

a) maxspan

Specifies the maximum total duration of a transaction. Events that would push a transaction past this window are not added to it; the transaction is closed and later events begin a new one.

Example:

transaction user maxspan=30m

→ All events for the same user that occur within a 30-minute span will be grouped.

b) maxpause

Specifies the maximum allowed time gap between any two consecutive events in the same transaction.

Example:

transaction session_id maxpause=5m

→ If any two events for the same session are more than 5 minutes apart, a new transaction will begin.

c) keepevicted=true

By default, if a transaction cannot be completed (e.g., missing the end event), it's discarded.

Setting keepevicted=true allows these incomplete transactions to still be shown in the results.

Use case: You want to see dropped sessions or users who never logged out.
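To inspect those incomplete sessions, you can filter on the closed_txn field that transaction adds to each result (a sketch, reusing the session_id example from earlier):

Example:

... | transaction session_id startswith="login" endswith="logout" keepevicted=true
| where closed_txn=0

→ Returns only incomplete (evicted) transactions: closed_txn is 1 when the transaction found its closing condition, 0 otherwise.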

4. When to Use transaction

The transaction command is especially helpful in scenarios where:

  • You don’t have a clear numeric or time-based identifier

  • Events must be grouped by sequence and timing, not just field values

Typical Use Cases:

  • Login/Logout Tracking: Group user actions from login to logout

  • Shopping Cart Activity: Group events from cart creation to checkout

  • Intrusion Detection: Bundle a sequence of suspicious events

  • Workflow Monitoring: Trace multiple steps (e.g., form submission to approval)
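As a sketch of the shopping-cart case (the index, field, and value names here are assumptions, not taken from the source):

Example:

index=web_logs
| transaction clientip maxpause=10m startswith=eval(action="addtocart") endswith=eval(action="purchase")

→ Groups each visitor's events from adding an item to completing the purchase; visits idle for more than 10 minutes are split into separate transactions.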

5. Alternatives to transaction

While transaction is very powerful, it can be resource-intensive, especially with large datasets.

In many cases, you can achieve similar results using stats (which is more efficient).

When to use stats instead:

  • You have a unique field (like session_id) that can group events

  • You can use time-based or sequence fields (e.g., earliest, latest)

Example using stats:

... | stats earliest(_time) as session_start, latest(_time) as session_end by session_id
| eval duration = session_end - session_start

This gives you similar information to a transaction — session timing — without grouping raw events together.
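If you also need per-session event counts or the set of actions taken, stats can approximate more of transaction's output without merging raw events (a sketch; the action field is an assumption):

Example:

... | stats earliest(_time) as session_start latest(_time) as session_end count as eventcount values(action) as actions by session_id
| eval duration = session_end - session_start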

Summary Table: Transaction vs Stats

  • Groups raw events? transaction: yes; stats: no (returns a summary row per group)

  • Performance: transaction is slower on large data; stats is faster and more scalable

  • Use case flexibility: transaction handles timing and sequencing; stats works well with clear identifiers

  • Output type: transaction yields a single event with all fields combined; stats yields one row per group with calculated fields

Best Practices

  • Use transaction only when necessary (e.g., event sequencing is essential).

  • When possible, prefer stats for performance.

  • Limit the scope (time range, filtered fields) before using transaction.

  • Monitor search performance using Search Job Inspector to detect slowdowns.
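The "limit the scope" practice above can be sketched as follows (the index, sourcetype, and time range are assumptions for illustration):

Example:

index=web_logs sourcetype=access_combined earliest=-4h
| fields _time session_id action
| transaction session_id maxspan=30m maxpause=5m

→ Filtering by index, sourcetype, and time range, and keeping only needed fields, reduces the event volume before transaction has to group anything.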

Using Advanced Transactions (Additional Content)

1. Understanding Performance Cost of transaction

The transaction command is powerful but expensive in terms of system resources. Its performance degrades significantly when:

  • There are large volumes of events

  • The search includes wide time ranges

  • You use complex startswith and endswith conditions

  • There are long gaps between related events (forcing more memory retention)

Performance Profile (Described Textually)

Imagine a chart where:

  • The x-axis is the number of events

  • The y-axis is the search duration (in seconds)

For a basic stats command, the line remains relatively flat or linear.

For transaction, the line curves upward sharply as event count increases, highlighting how quickly performance can degrade.

Key Takeaway:
Use transaction only when necessary, and always limit your search with time and field filters.

2. Comparison: transaction vs stats with Output Examples

a) Using transaction

index=web_logs | transaction session_id startswith="login" endswith="logout"

Result:

session_id   duration   eventcount   _raw
abc123       45s        3            (combined raw of login, activity, logout)

→ Events are merged into a single row, preserving full raw details.

b) Using stats

index=web_logs | stats earliest(_time) as start latest(_time) as end by session_id

Result:

session_id   start                 end
abc123       2024-04-20 10:01:00   2024-04-20 10:01:45

→ Only timestamps are summarized, no raw event data is retained.

Key Differences:

  • Combines raw events: transaction yes; stats no

  • Lightweight: transaction no; stats yes

  • Performance: transaction is slow for large data; stats is fast

  • Best use case: transaction for event sequence tracking; stats for time span measurement

3. Supplementary Technique: Using streamstats for Session-like Grouping

Sometimes, transaction is overkill, and you can simulate session grouping using streamstats.

Example:

index=web_logs
| sort 0 _time
| streamstats current=f last(_time) as prev_time by user
| eval new_session = if(isnull(prev_time) OR _time - prev_time > 1800, 1, 0)
| streamstats sum(new_session) as session_id by user

  • This derives a session_id that increments every time the gap between a user's consecutive events exceeds 30 minutes (1800 seconds).

  • You can then apply stats to calculate per-session metrics.

Use case:
User activity that has no explicit session field, but where inactivity gaps define session breaks.
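Building on the derived session_id above, per-session metrics can then be computed with stats (a sketch, appended to the same search):

Example:

... | stats earliest(_time) as session_start latest(_time) as session_end count as events by user session_id
| eval duration = session_end - session_start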

Best Practices Recap for Advanced Transaction Handling

  • Prefer stats or streamstats if you don't need full raw events: saves memory and speeds up searches

  • Limit by index, sourcetype, host, and time range before transaction: reduces event volume early

  • Avoid using transaction with subsearches or across large datasets: it magnifies the performance cost

  • Use keepevicted=true to review incomplete sessions (optional): helps detect anomalies like dropped connections

Frequently Asked Questions

When should transaction be used instead of a stats-based correlation approach?

Answer:

Use transaction when the requirement is explicitly to group related events into transaction objects with start/end semantics and event lists.

Explanation:

transaction is powerful but expensive. It is appropriate when order, duration, closure, and grouped event membership are core requirements. If you only need aggregate counts or simple correlation across fields, stats-based logic is often more efficient. The exam commonly tests this tradeoff. If the prompt emphasizes session-style grouping and complete/incomplete transaction identification, transaction is the more natural answer. If it emphasizes performance or simple aggregations, stats is usually preferred.

Why is identifying complete versus incomplete transactions valuable?

Answer:

Because it distinguishes fully observed workflows from sessions missing expected start or end events.

Explanation:

This helps analysts detect failures, interruptions, or partial data capture. The exam objective focuses on understanding transaction state rather than simply bundling events together. If a scenario involves workflows that should have a beginning and an end, transaction completeness is likely a key requirement. A common mistake is using ordinary stats when the analysis needs session integrity.

Why is transaction often described as less efficient than alternatives?

Answer:

Because it must hold and organize related events into grouped structures, which can become costly over large datasets.

Explanation:

That cost is why advanced users often ask whether a stats-based design could satisfy the same requirement. The exam wants you to recognize that transaction is not the default for all correlation tasks. It is best reserved for cases that truly need its grouping semantics. If performance is a concern and the requirement can be met with aggregations, a lighter approach is usually better.
