Monitoring inputs is one of the most essential tasks in Splunk, as it allows you to collect and index data from a variety of sources. This guide covers monitorable sources, configuration tips, and performance tuning strategies to optimize data ingestion.
Splunk can monitor different types of data sources, from file directories to network ports and applications.
Splunk can monitor individual files, entire directories, and subdirectories for logs and structured data.
Common formats include plain-text logs (.log), CSVs, JSON files, and configuration files. These inputs are defined in inputs.conf.
Monitor a Single File:
[monitor:///var/log/syslog]
disabled = false
sourcetype = syslog
index = main
Monitor a Directory:
[monitor:///var/log/app/]
disabled = false
sourcetype = app_logs
index = app_index
recursive = true
Filter Files in a Directory:
Use whitelist and blacklist to include or exclude specific files:
[monitor:///var/log/app/]
disabled = false
sourcetype = app_logs
index = app_index
whitelist = \.log$
blacklist = error\.log$
Splunk can listen on TCP and UDP ports for real-time data, such as Syslog messages or HTTP requests.
Configure these in inputs.conf.
Syslog Over UDP:
[udp://514]
disabled = false
sourcetype = syslog
index = syslog_index
Custom Data Over TCP:
[tcp://10514]
disabled = false
sourcetype = custom_data
index = custom_index
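To confirm the TCP input is listening, you can push a test event with netcat (a quick sketch; the server address is a placeholder and nc must be installed):
echo "test event over tcp" | nc <splunk_server_ip> 10514
Then search index=custom_index sourcetype=custom_data to confirm the event arrived.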
Splunk can collect logs from third-party applications, either directly or via a forwarder.
Apache Access Logs:
[monitor:///var/log/apache2/access.log]
disabled = false
sourcetype = access_combined
index = web_logs
MySQL Logs:
[monitor:///var/log/mysql/error.log]
disabled = false
sourcetype = mysql_error_log
index = db_logs
Proper configuration ensures that data is classified correctly and only relevant information is ingested.
Purpose:
Restrict an input to the files you actually need so that only relevant data is ingested.
Example:
Monitor only .log files and exclude any file ending in debug.log:
[monitor:///var/log/app/]
disabled = false
sourcetype = app_logs
index = app_index
whitelist = \.log$
blacklist = debug\.log$
Metadata Fields:
Override default metadata such as host, source, and sourcetype so events are tagged correctly at ingestion.
Example:
[monitor:///var/log/app/server.log]
disabled = false
sourcetype = app_server_log
index = app_index
host = app_server_01
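When a single input covers files from many hosts, the host value can also be derived from the path itself (a sketch under an assumed layout of /var/log/remote_hosts/<hostname>/...; host_segment counts path components from the left, so the fourth segment is the hostname here):
[monitor:///var/log/remote_hosts/]
disabled = false
sourcetype = app_server_log
index = app_index
host_segment = 4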
btool
Purpose:
Display the merged, effective inputs.conf settings across all apps.
Command:
splunk cmd btool inputs list --debug
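btool can also check for syntax errors rather than just listing merged settings (a sketch; run from $SPLUNK_HOME/bin):
splunk btool check --debug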
Efficient monitoring ensures that Splunk can handle large volumes of data without being overwhelmed.
Purpose:
Reduce polling overhead by controlling how frequently Splunk checks monitored sources.
Example:
Monitor a file every 5 minutes:
[monitor:///var/log/large_file.log]
interval = 300
Limit the number of monitored files with whitelist and blacklist.
Tips:
Enable Compression:
Reduce bandwidth for remote monitoring by enabling compression in outputs.conf on the forwarder:
[tcpout]
compressed = true
Monitor Internal Logs:
Identify resource-intensive inputs:
index=_internal source=*metrics.log group=per_host_thruput
Test Configurations in Staging:
Validate inputs.conf settings in a staging environment before applying them to production.
Use Modular Inputs:
Split large inputs.conf configurations into separate apps for better scalability and management.
Regularly Audit Inputs:
Periodically review monitored sources and remove inputs that are no longer needed.
Leverage Forwarders:
Use universal forwarders to collect data close to the source instead of monitoring remote files directly from indexers, as sketched below.
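A minimal sketch of the forwarder pattern, assuming an illustrative path, index, and indexer address: the universal forwarder's inputs.conf declares what to monitor, and its outputs.conf declares where to send it (9997 is the conventional receiving port).
# inputs.conf on the universal forwarder
[monitor:///var/log/app/]
disabled = false
sourcetype = app_logs
index = app_index

# outputs.conf on the universal forwarder
[tcpout:primary_indexers]
server = indexer1.example.com:9997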
Goal: Monitor a directory containing logs in different formats and assign appropriate sourcetypes to each file.
Set Up inputs.conf:
Give each pattern its own stanza with a distinct wildcard path (identically named stanzas merge into one, so repeating [monitor:///var/log/app/] would not work):
[monitor:///var/log/app/*.log]
disabled = false
sourcetype = app_logs
index = app_logs
blacklist = error\.log$

[monitor:///var/log/app/*.json]
disabled = false
sourcetype = app_json_logs
index = app_logs

[monitor:///var/log/app/error.log]
disabled = false
sourcetype = app_error_logs
index = app_logs
Validate Metadata Assignment:
Run a search to verify that logs are categorized correctly:
index=app_logs | stats count by sourcetype
Goal: Configure Splunk to collect real-time Syslog messages from network devices.
Configure inputs.conf for Syslog:
[udp://514]
disabled = false
sourcetype = syslog
index = network_logs
Configure Network Devices:
Point each device's syslog output at the Splunk server's IP address on UDP port 514; the rsyslog sketch below shows the Linux equivalent.
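For a Linux device running rsyslog, a one-line forwarding rule is typically enough (a sketch; the file name is arbitrary, a single @ means UDP, and the address is a placeholder):
# /etc/rsyslog.d/99-splunk.conf -- forward all facilities and severities over UDP
*.* @<splunk_server_ip>:514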
Test the Setup:
Use a Syslog generator (e.g., logger) to send test messages:
logger -n <splunk_server_ip> -P 514 "Test syslog message"
Verify Data:
Search for the test message in Splunk:
index=network_logs sourcetype=syslog
Goal: Monitor application logs while excluding debug logs to reduce noise.
Use blacklist in inputs.conf:
[monitor:///var/log/app/]
disabled = false
sourcetype = app_logs
index = app_index
blacklist = debug\.log$
Verify Exclusion:
Run a search to confirm no debug logs are ingested:
index=app_index NOT source=*debug.log
Goal: Set up monitoring for a log file that rotates periodically (e.g., access.log, access.log.1).
Configure inputs.conf:
[monitor:///var/log/httpd/access.log]
disabled = false
sourcetype = apache_access
index = web_logs
followTail = true
Restart Splunk:
Apply the configuration:
./splunk restart
Test Rotation:
Simulate log rotation:
mv /var/log/httpd/access.log /var/log/httpd/access.log.1
echo "New log entry" >> /var/log/httpd/access.log
Verify Data:
Search for the new entry in Splunk:
index=web_logs sourcetype=apache_access
Goal: Monitor a directory containing CSV files and ensure Splunk parses the fields correctly.
Configure inputs.conf:
[monitor:///data/csv/]
disabled = false
sourcetype = csv
index = data_index
Define Field Parsing in props.conf:
[csv]
INDEXED_EXTRACTIONS = csv
HEADER_FIELD_DELIMITER = ,
Ingest Sample Data:
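For example, a hypothetical sample.csv whose header row matches the fields searched below (names and values are illustrative):
field1,field2,field3
alpha,42,2024-01-15
beta,17,2024-01-16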
Copy the sample file into /data/csv/.
Verify Field Extraction:
Run a search and display extracted fields:
index=data_index | table field1, field2, field3
Cause:
Incorrect inputs.conf settings or insufficient file permissions.
Solution:
Check if the monitored file is accessible:
ls -l /path/to/file.log
Validate inputs.conf using btool:
splunk cmd btool inputs list --debug
Cause:
The CRC check matches rotated or renamed files against already-indexed content, so files are skipped (or old files are unexpectedly re-read).
Solution:
Add a crcSalt value in inputs.conf:
[monitor:///var/log/app/]
crcSalt = <SOURCE>
Use ignoreOlderThan to avoid re-ingesting old logs:
[monitor:///var/log/app/]
ignoreOlderThan = 7d
Cause:
Monitoring very large or very numerous files overwhelms the indexing queues.
Solution:
Adjust the polling interval for large files:
[monitor:///path/to/large/file.log]
interval = 300
Monitor system performance:
index=_internal source=*metrics.log group=queue
Cause:
An incorrect TIME_FORMAT in props.conf.
Solution:
Update props.conf with the correct format:
[custom_sourcetype]
TIME_FORMAT = %d/%b/%Y:%H:%M:%S
TIME_PREFIX = \[
Test the parsing by ingesting sample data and checking _time values.
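One way to spot-check the result (a sketch; the sourcetype matches the props.conf stanza above, and strftime simply renders _time for comparison against the raw event):
sourcetype=custom_sourcetype
| eval parsed_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| table _raw, parsed_time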
Regularly Audit Inputs:
Review configured inputs periodically and retire sources that are no longer needed.
Validate Configurations:
Test inputs.conf settings in a staging environment before deploying them to production.
Use Filters Effectively:
Cut noise and ingestion volume with whitelist and blacklist filters.
Monitor Resource Usage:
Watch internal metrics (index=_internal source=*metrics.log) to catch resource-intensive inputs early.
Monitor inputs in Splunk are used to continuously ingest data from files or directories. They are one of the most commonly used data input methods and provide powerful control over how files are read, indexed, and re-read.
This section elaborates on advanced behaviors of monitor inputs, particularly the CRC (Cyclic Redundancy Check) mechanism, initCrcLength, and file permission requirements—which are often misunderstood but important for exam and production success.
CRC (Cyclic Redundancy Check) is a checksum-based mechanism used by Splunk to determine whether a file has already been ingested.
Splunk computes a hash from the first 256 bytes of a file by default.
This hash is used to track ingestion history and avoid re-indexing the same file unintentionally.
If a file is renamed but its content remains the same, Splunk will not ingest it again, since the CRC is unchanged.
This behavior avoids duplicate indexing but can be problematic when content changes slightly.
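As a rough analogy only (not Splunk's actual implementation), hashing just the first 256 bytes shows why a rename or copy goes unnoticed:
cp app.log app.log.1                 # simulate rotation by copying
head -c 256 app.log | md5sum         # fingerprint of the original
head -c 256 app.log.1 | md5sum       # identical fingerprint -> treated as already indexed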
crcSalt – Forcing Re-ingestion of Renamed Files
The crcSalt setting in inputs.conf modifies how the CRC is calculated by adding entropy (a salt) to the calculation:
[monitor:///var/log/myapp/]
crcSalt = <SOURCE>
<SOURCE> uses the file path as part of the hash calculation, so:
When rotated logs are renamed (e.g., app.log → app.log.1), but content remains the same and needs re-ingestion.
When you copy the same file to a new path and want Splunk to treat it as new data.
initCrcLength – Handling Large Files with Varying Headers
This setting controls how much of a file is considered when calculating the CRC.
By default, Splunk uses the first 256 bytes.
If many files share identical opening bytes (e.g., a common banner or boilerplate header) while the rest differs, the default 256-byte CRC may match across distinct files and cause Splunk to skip content it should ingest.
[monitor:///data/archive/]
initCrcLength = 4096
Use it when ingesting large log files whose first few lines are identical across files, so the CRC must read deeper into the file to tell them apart.
This helps Splunk more accurately detect file uniqueness and avoid false negatives in duplication checks.
Splunk runs under a specific OS user (splunk by default).
That user must have read access to all directories and files it needs to monitor.
Splunk cannot read rotated logs due to:
File ownership by root or app users
Lack of read permissions
The symptom: Splunk Web shows the monitor input as "enabled," but no data is ingested.
Use this command to verify permissions:
sudo -u splunk ls -l /var/log/myapp/
Ensure the Splunk process user can:
Traverse directories (x permission)
Read files (r permission)
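A common remediation sketch (the path is illustrative; setfacl requires ACL support on the filesystem, and chmod/chown-based fixes work too):
# Grant the splunk user read on files and traverse (X) on directories, recursively
sudo setfacl -R -m u:splunk:rX /var/log/myapp
# Re-check as the splunk user
sudo -u splunk ls -l /var/log/myapp/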
| Feature | Summary |
|---|---|
| CRC (default) | Uses the first 256 bytes of the file to avoid duplicate ingestion. |
| crcSalt = <SOURCE> | Forces the CRC calculation to include the file path; useful for rotated or copied logs. |
| initCrcLength | Expands the CRC range to better differentiate files with similar headers. |
| Permissions | Always ensure the Splunk user has proper access to input directories and files. |
Application writes logs to /opt/logs/app.log and rotates to /opt/logs/app.log.1, /opt/logs/app.log.2.
Contents are mostly the same.
Splunk is skipping app.log.1.
[monitor:///opt/logs/]
crcSalt = <SOURCE>
initCrcLength = 2048
sourcetype = app_logs
index = prod_index
This forces Splunk to re-ingest rotated logs based on file path and longer CRC range.
To verify, list the tailing processor's file status:
splunk _internal call /services/admin/inputstatus/TailingProcessor:FileStatus
This shows files being monitored and their CRC status.
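The same endpoint can be queried from the search bar with the rest command (a sketch; requires permission to read the endpoint):
| rest /services/admin/inputstatus/TailingProcessor:FileStatus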
What does a monitor input do in Splunk?
It continuously monitors files or directories for new data to ingest.
A monitor input allows Splunk to watch specified files or directories and ingest new data as it is written. When configured, Splunk tracks the file position using internal metadata so that only new content is indexed. This makes monitor inputs suitable for log files that grow over time, such as application logs or system logs. Splunk periodically checks monitored files for updates and processes any newly appended data. This mechanism ensures efficient ingestion without repeatedly indexing the same content.
Demand Score: 84
Exam Relevance Score: 92
Which configuration file is used to define monitor inputs?
inputs.conf.
Monitor inputs are configured within the inputs.conf configuration file. Each monitored file or directory is defined using a stanza beginning with monitor:// followed by the file path. Administrators can specify additional parameters such as the target index, sourcetype, and host values. These settings determine how the data is categorized and stored once ingested. Proper configuration ensures that Splunk collects the correct data sources while maintaining accurate metadata for searching and analysis.
Demand Score: 80
Exam Relevance Score: 93
What is the primary difference between a monitor input and a batch input?
Monitor inputs track new data continuously, while batch inputs ingest files once and then stop monitoring them.
Monitor inputs are designed for continuously growing files such as logs. Splunk keeps track of the file position and only indexes newly appended data. Batch inputs, on the other hand, process an entire file once and then move or delete it depending on configuration. Batch inputs are typically used for one-time data ingestion scenarios such as importing historical log archives. Understanding this distinction helps administrators select the correct input method based on whether the data source is continuously updated or static.
Demand Score: 74
Exam Relevance Score: 90
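For contrast with the monitor stanzas above, a minimal batch input sketch (the path, sourcetype, and index are illustrative; move_policy = sinkhole tells Splunk to delete each file after indexing it):
[batch:///data/archive/import/]
move_policy = sinkhole
sourcetype = archived_logs
index = history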
How does Splunk avoid re-indexing the same data from monitored files?
By tracking file position information in the fishbucket.
When Splunk monitors files, it records metadata such as file signatures and read offsets in an internal tracking system known as the fishbucket. This information allows Splunk to determine which portions of a file have already been indexed. If Splunk restarts or the file continues to grow, it resumes ingestion from the last recorded position rather than reprocessing the entire file. This mechanism prevents duplicate indexing and ensures efficient log ingestion across restarts and file rotations.
Demand Score: 72
Exam Relevance Score: 91
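To see what the fishbucket has recorded for a specific file, Splunk ships a btprobe utility (a sketch; the paths are illustrative, and the tool is meant for troubleshooting rather than routine use):
splunk cmd btprobe -d $SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file /var/log/app.log --validate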