Manipulating raw data in Splunk involves transforming, enriching, or redirecting data to improve its usability and relevance. This guide covers data transformation basics, advanced techniques, and examples of how to mask sensitive data, rename fields, enrich events, and configure event routing.
Data transformation modifies raw data as it is ingested or indexed, enabling better searchability and compliance.
Two configuration files control this process: props.conf, which applies transformations to specific sourcetypes, hosts, or sources, and transforms.conf, which defines the transformation rules themselves.

Example: mask credit card numbers in raw events at index time.

props.conf:
[sensitive_logs]
TRANSFORMS-mask_cc = mask_credit_card
transforms.conf:
[mask_credit_card]
REGEX = (.*)\d{4}-\d{4}-\d{4}-\d{4}(.*)
FORMAT = $1XXXX-XXXX-XXXX-XXXX$2
DEST_KEY = _raw
Verification:
Search for events to confirm the data is masked:
index=sensitive sourcetype=sensitive_logs | table _raw
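A minimal sketch of what this masking transform does to _raw (illustrative Python, not Splunk internals; the sample event is an assumption):

```python
import re

# Simulate the index-time masking: the REGEX locates a 16-digit card
# number in groups of four, and the FORMAT supplies the literal
# replacement written back into _raw.
CARD_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{4}")

def mask_credit_card(raw: str) -> str:
    return CARD_RE.sub("XXXX-XXXX-XXXX-XXXX", raw)

print(mask_credit_card("user=bob card=4111-1111-1111-1111 action=pay"))
# -> user=bob card=XXXX-XXXX-XXXX-XXXX action=pay
```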
Rename the field http_user_agent to user_agent for simplicity.

props.conf:
[web_logs]
TRANSFORMS-rename_field = rename_user_agent
transforms.conf:
[rename_user_agent]
REGEX = (.*)
FORMAT = user_agent::$1
SOURCE_KEY = http_user_agent
DEST_KEY = _meta
Verification:
Confirm the field is renamed in searches:
index=web sourcetype=web_logs | table user_agent
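The intent of the rename can be sketched as follows (illustrative Python, not Splunk internals): read the value under SOURCE_KEY and write it back under the new key, as FORMAT = user_agent::$1 does into _meta:

```python
# Sketch of the rename transform: SOURCE_KEY selects the input value,
# REGEX = (.*) captures all of it, and FORMAT re-emits it under the
# new field name.
def rename_field(event_fields: dict) -> dict:
    meta = {}
    value = event_fields.get("http_user_agent")  # SOURCE_KEY
    if value is not None:
        meta["user_agent"] = value               # FORMAT = user_agent::$1
    return meta

print(rename_field({"http_user_agent": "Mozilla/5.0"}))
# -> {'user_agent': 'Mozilla/5.0'}
```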
Enrich events with a new field, region, based on the IP address of the event's source.

props.conf:
[network_logs]
TRANSFORMS-enrich = add_region
transforms.conf:
[add_region]
REGEX = ^192\.168\.(\d+)\.
FORMAT = region::NorthAmerica
DEST_KEY = _meta
Verification:
Search for the new region field:
index=network sourcetype=network_logs | stats count by region
Route events containing ERROR to the error_logs index.

props.conf:
[application_logs]
TRANSFORMS-route_errors = route_to_error_index
transforms.conf:
[route_to_error_index]
REGEX = .*ERROR.*
DEST_KEY = _MetaData:Index
FORMAT = error_logs
Verification:
Ensure only error events are in the error_logs index:
index=error_logs sourcetype=application_logs
Drop DEBUG events before indexing by sending them to the nullQueue.

props.conf:
[app_logs]
TRANSFORMS-drop_debug = drop_debug_logs
transforms.conf:
[drop_debug_logs]
REGEX = .*DEBUG.*
DEST_KEY = queue
FORMAT = nullQueue
Verification:
Search to confirm debug logs are not indexed (this should return no results):
index=app sourcetype=app_logs *DEBUG*
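The combined effect of the routing and filtering transforms can be sketched as a single decision (illustrative Python; the event strings and default index name are assumptions, not Splunk internals):

```python
import re

# Simulate the parsing pipeline's per-event decision:
#   queue = nullQueue        -> event discarded
#   _MetaData:Index override -> event routed to error_logs
#   otherwise                -> default index
def route(raw: str):
    if re.search(r"DEBUG", raw):
        return None            # dropped via nullQueue
    if re.search(r"ERROR", raw):
        return "error_logs"    # _MetaData:Index = error_logs
    return "main"              # default index (assumed)

events = ["DEBUG heartbeat", "ERROR disk full", "INFO started"]
print([route(e) for e in events])
# -> [None, 'error_logs', 'main']
```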
Use btool to debug parsing and transformation configurations:
splunk cmd btool props list --debug
splunk cmd btool transforms list --debug
Monitor the _internal index for transformation errors:
index=_internal sourcetype=splunkd component=TRANSFORMS
Test in Staging: validate new props.conf and transforms.conf settings in a non-production environment before rolling them out.
Use Modular Configurations: keep each transform stanza small and single-purpose so rules can be reused and reordered safely.
Optimize Regex: avoid greedy leading and trailing wildcards and anchor patterns where possible.
Document Transformations: record what each stanza does and why, especially for _raw changes, which are irreversible.
GeoIP enrichment adds location fields such as country, city, or latitude.

Enable GeoIP in Splunk:
Use the iplocation command to map IP addresses to locations during searches:
index=web sourcetype=web_logs | iplocation client_ip | table client_ip Country City
For Real-Time Enrichment:
Use a lookup table or transforms:
Create a GeoIP lookup CSV file (geoip.csv) with mappings:
ip_range,country,city
192.168.0.0/24,USA,New York
10.0.0.0/8,Canada,Toronto
Configure transforms.conf (match_type = CIDR is required so the ip_range column is treated as a network, not a literal string):
[geoip_lookup]
filename = geoip.csv
match_type = CIDR(ip_range)
Apply the lookup in props.conf:
[web_logs]
LOOKUP-geoip = geoip_lookup ip_range AS client_ip OUTPUT country city
Verification:
Query for enriched fields:
index=web sourcetype=web_logs | stats count by country city
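What the CIDR lookup does can be sketched with Python's ipaddress module (illustrative; the rows mirror the geoip.csv above):

```python
import ipaddress

# Rows from geoip.csv: (ip_range, country, city)
LOOKUP = [
    ("192.168.0.0/24", "USA", "New York"),
    ("10.0.0.0/8", "Canada", "Toronto"),
]

def geoip(client_ip: str) -> dict:
    # CIDR matching: return the first network containing the address.
    addr = ipaddress.ip_address(client_ip)
    for cidr, country, city in LOOKUP:
        if addr in ipaddress.ip_network(cidr):
            return {"country": country, "city": city}
    return {}

print(geoip("192.168.0.45"))
# -> {'country': 'USA', 'city': 'New York'}
```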
props.conf:
[email_logs]
TRANSFORMS-mask_email = conditional_email_masking
transforms.conf:
[conditional_email_masking]
REGEX = (.*Email:\s)\w+@\w+\.\w+(.*)
FORMAT = $1masked@example.com$2
DEST_KEY = _raw
Note: transforms.conf has no per-event CONDITION setting. To make masking conditional, scope the transform through a narrower props.conf stanza (for example, [host::internal*]) or build the condition into the regular expression itself.
Verification:
Confirm email addresses are masked in the affected events:
index=email_logs sourcetype=email_logs | table _raw
Extract nested JSON fields using props.conf and transforms.conf.

Sample JSON Log:
{
"user": "alice",
"action": "purchase",
"details": {
"product": "laptop",
"price": 1200
}
}
props.conf:
[json_logs]
KV_MODE = json
REPORT-nested_fields = extract_nested_json
transforms.conf:
[extract_nested_json]
REGEX = "product":\s*"(?P<product>[^"]+)",\s*"price":\s*(?P<price>\d+)
FORMAT = product::$1 price::$2
Note: with KV_MODE = json, Splunk already extracts the nested keys at search time as details.product and details.price; the REPORT extraction above flattens them into product and price.
Verification:
Search for extracted fields:
index=json sourcetype=json_logs | table user action product price
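Where the event body is valid JSON, parsing it directly is more robust than the regex extraction above, which is sensitive to key order and whitespace. A minimal sketch using the sample event (illustrative Python, not Splunk internals):

```python
import json

# Sample event from above, as a single raw string.
raw = ('{"user": "alice", "action": "purchase", '
       '"details": {"product": "laptop", "price": 1200}}')

event = json.loads(raw)
# Flatten the nested keys the way the REPORT extraction intends.
fields = {
    "user": event["user"],
    "action": event["action"],
    "product": event["details"]["product"],
    "price": event["details"]["price"],
}
print(fields)
# -> {'user': 'alice', 'action': 'purchase', 'product': 'laptop', 'price': 1200}
```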
Add an environment field (e.g., production, staging) to logs based on hostnames.

props.conf:
[host::prod*]
TRANSFORMS-environment = add_environment_prod
[host::staging*]
TRANSFORMS-environment = add_environment_staging
transforms.conf:
[add_environment_prod]
REGEX = .*
FORMAT = environment::production
DEST_KEY = _meta
[add_environment_staging]
REGEX = .*
FORMAT = environment::staging
DEST_KEY = _meta
Verification:
Search for enriched events:
index=main | stats count by environment
props.conf:
[app_logs]
TRANSFORMS-routing = route_by_severity
transforms.conf:
[route_by_severity]
REGEX = SEVERITY=(ERROR|WARN)
DEST_KEY = _MetaData:Index
FORMAT = error_logs
Verification:
Search the appropriate indexes for events:
index=error_logs sourcetype=app_logs
Avoid greedy leading and trailing wildcards. Instead of:
.*User:\s+(\w+).*Action:\s+(\w+).*
Use:
User:\s+(\w+)\s+Action:\s+(\w+)
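As a quick check (illustrative Python; the sample line is an assumption), both patterns return the same groups on well-formed input; the anchored version simply avoids the backtracking the wildcards cause:

```python
import re

line = "2024-05-01 User: alice Action: login status=ok"

# Greedy version: the surrounding .* forces extra backtracking work.
greedy = re.search(r".*User:\s+(\w+).*Action:\s+(\w+).*", line)
# Tight version: anchored to the literal labels, no wildcards needed.
tight = re.search(r"User:\s+(\w+)\s+Action:\s+(\w+)", line)

print(greedy.groups(), tight.groups())
# -> ('alice', 'login') ('alice', 'login')
```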
Apply transformations only to specific sourcetypes or hosts in props.conf:
[host::critical_server*]
TRANSFORMS-critical_only = enrich_critical_events
Use the _internal index to monitor transformation performance and identify bottlenecks.

Analyze parsing latency:
index=_internal source=*metrics.log group=parsing
Validate configurations in props.conf and transforms.conf:
splunk cmd btool props list --debug
splunk cmd btool transforms list --debug
Test regular expressions outside Splunk, for example with an online tester or Python's re module.

Check the _internal index for errors:
index=_internal sourcetype=splunkd component=TRANSFORMS
In Splunk, raw data manipulation occurs primarily during the parsing phase and is configured through props.conf and transforms.conf. These manipulations include masking sensitive data, routing events, renaming or enriching fields, and controlling metadata.
This guide expands on advanced configuration use cases, specifically the use of DEST_KEY, the distinction between TRANSFORMS vs. REPORT, and how to prevent field collisions when chaining multiple transformation rules.
DEST_KEY: Behavior and Use Cases

The DEST_KEY setting in transforms.conf defines where the output of a transformation is applied. It is a critical control point that determines whether the manipulation affects raw data, metadata, or field-level structures.
DEST_KEY Options:

| DEST_KEY | Purpose | Use Case |
|---|---|---|
| _raw | Replaces the raw event data | Masking credit card numbers, emails, etc. |
| _meta | Injects key-value fields at index time | Static enrichment like region::US |
| _MetaData:Index | Routes data to a specific index | Route "ERROR" logs to the error_logs index |
| _MetaData:Host | Overrides the host field | Rewrite host from filename or regex |
| _MetaData:Sourcetype | Changes the sourcetype | Dynamically assign sourcetype based on content |
| _MetaData:Source | Overrides the source field | Rewrite source path or label |
Important Notes:
_raw manipulations overwrite original data and are irreversible.
_meta is less commonly known but allows hidden index-time enrichment (useful for tagging); setting WRITE_META = true appends to _meta instead of overwriting it.
_MetaData:* keys are for routing or reclassification, executed before indexing.
Though both TRANSFORMS and REPORT are used in props.conf to reference stanzas in transforms.conf, they serve very different purposes and are executed in different phases of the data processing pipeline.
| Attribute | TRANSFORMS-* | REPORT-* |
|---|---|---|
| Execution Phase | Parsing / Index-time | Search-time |
| Effect Scope | Can modify _raw, metadata, or drop events | Only extracts fields for search usage |
| Common Use Cases | Masking, routing, host/source override | Extracting fields from structured logs |
| Output Destination | Affects data ingestion or storage | Affects search results only |
| DEST_KEY Required | Yes | No |
TRANSFORMS (masking):
[mask_email]
REGEX = (.*)\w+@\w+\.\w+(.*)
FORMAT = $1masked@example.com$2
DEST_KEY = _raw
REPORT (extracting fields):
[json_field_extract]
REGEX = "user":"(?P<user>\w+)"
FORMAT = user::$1
Exam Tip: If the question involves modifying how the data is indexed, think TRANSFORMS. If it’s about extracting fields for search/display, think REPORT.
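The distinction can be sketched outside Splunk (illustrative Python, not Splunk internals): the TRANSFORMS step permanently changes the stored copy, while the REPORT step only derives a field when the data is searched.

```python
import re

raw = "login Email: alice@example.com ok"

# TRANSFORMS (index time): the stored copy is altered permanently.
stored = re.sub(r"\w+@\w+\.\w+", "masked@example.com", raw)

# REPORT (search time): the stored copy is untouched; a field is
# derived on the fly from whatever was indexed.
m = re.search(r"Email:\s(\S+)", stored)
user_email = m.group(1) if m else None

print(stored)       # what lives on disk
print(user_email)   # what the search sees as a field
```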
When chaining multiple TRANSFORMS-* stanzas in a single props.conf entry, the order matters, and field collision is a common risk if you don’t properly namespace or structure your extraction logic.
Field overwrites: Multiple transforms might extract or assign the same field with different values.
Metadata clash: Two stanzas try to override host or index based on different criteria.
Unpredictable behavior: If order is not respected, later transforms may negate earlier ones.
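A minimal sketch of the overwrite risk (the transform names in the comments are hypothetical): when two chained transforms write the same key, the one that runs last wins.

```python
# Simulate two chained transforms writing into the event's metadata.
meta = {}
meta.update({"user": "alice"})       # first transform (e.g. extract_web_user)
meta.update({"user": "svc-backup"})  # second transform silently overwrites it
print(meta)
# -> {'user': 'svc-backup'}
```

Namespacing the fields (web_user, os_user) would have preserved both values.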
Use distinct field names:
Avoid generic names like user or msg in custom extractions; use src_user, web_user, etc.
Order TRANSFORMS-* correctly in props.conf:
Splunk evaluates transforms from left to right:
TRANSFORMS-all = extract_email, mask_sensitive, route_by_severity
Use conditional transforms selectively:
Apply transforms based on host, sourcetype, or filename where applicable to scope logic tightly.
Use _meta for enrichment and avoid search-time confusion:
Example:
FORMAT = log_source::app1 region::APAC
DEST_KEY = _meta
Validate stanza resolution and ordering with btool:
splunk cmd btool transforms list --debug
| DEST_KEY Value | Phase | Used For |
|---|---|---|
| _raw | Index-time | Masking/rewriting raw event |
| _meta | Index-time | Hidden field enrichment |
| _MetaData:Index | Index-time | Routing to a specific index |
| _MetaData:Host | Index-time | Overriding host field |
| _MetaData:Sourcetype | Index-time | Overriding sourcetype |
Which configuration file defines data transformation rules used during indexing?
transforms.conf.
The transforms.conf file contains rules that define how Splunk modifies or routes data during indexing. These rules may include event filtering, field extraction, index routing, or data masking operations. Each transformation is defined as a stanza containing parameters such as regular expressions and destination settings. These transformations are invoked by referencing them in the props.conf file. Understanding how transforms.conf works is essential when administrators need to manipulate incoming data before it is indexed.
Demand Score: 86
Exam Relevance Score: 93
Which configuration file is used to apply transformation rules to specific sourcetypes?
props.conf.
While transforms.conf defines transformation rules, the props.conf file specifies when those transformations should be applied. Administrators reference transformation stanzas from transforms.conf within props.conf using parameters such as TRANSFORMS-<class>. This linkage allows Splunk to apply specific transformations based on sourcetype or other conditions during data processing. This separation of rule definition and rule application allows for flexible configuration management.
Demand Score: 83
Exam Relevance Score: 92
How can Splunk prevent certain events from being indexed?
By configuring transformation rules that drop events.
Splunk can filter out unwanted events during the indexing process by using transformation rules defined in transforms.conf. These rules use regular expressions to identify events that match specific patterns and then apply actions such as dropping the events entirely. The transformation is referenced in props.conf so that it applies to the appropriate sourcetype or data source. This mechanism helps reduce unnecessary data ingestion and can lower storage usage and licensing costs.
Demand Score: 80
Exam Relevance Score: 91
What is the purpose of the SEDCMD setting in Splunk?
To modify raw event data using a regular-expression-based replacement.
SEDCMD allows administrators to perform search-and-replace operations on raw event data before it is indexed. This feature uses a syntax similar to the Unix sed command to match patterns and replace them with alternative values. It is commonly used to mask sensitive information such as passwords or personal identifiers within logs. Because SEDCMD modifies raw data before indexing, the changes affect all future searches involving the affected events.
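As a sketch, a hypothetical props.conf stanza using SEDCMD might look like this (the stanza name and pattern are illustrative; the capture group preserves the last four digits):

```ini
[payment_logs]
# sed-style s/<regex>/<replacement>/g: mask all but the last card group
SEDCMD-mask_cc = s/\d{4}-\d{4}-\d{4}-(\d{4})/XXXX-XXXX-XXXX-\1/g
```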
Demand Score: 78
Exam Relevance Score: 90
How can Splunk route events to different indexes based on event content?
By using transformation rules with regular expressions.
Administrators can configure Splunk to route events to different indexes depending on their content. This is achieved using transformation rules defined in transforms.conf that match specific patterns within the raw event data. When a rule matches, it assigns the event to a specified index. These transformations are applied during the indexing pipeline and referenced through props.conf. This mechanism is commonly used to separate data sources or categorize events into different indexes for organizational or performance purposes.
Demand Score: 76
Exam Relevance Score: 91