Splunk’s ability to handle network and scripted inputs allows you to collect real-time data from network devices, APIs, and custom scripts. This guide will cover network inputs, scripted inputs, their configurations, and best practices.
Network inputs allow Splunk to collect real-time data from devices and applications over protocols like Syslog and HTTP Event Collector (HEC).
Edit inputs.conf:
Define a Syslog input over UDP:
[udp://514]
disabled = false
sourcetype = syslog
index = network_logs
Define a Syslog input over TCP:
[tcp://10514]
disabled = false
sourcetype = syslog
index = network_logs
Configure Syslog Devices:
Verify Data:
Use a search query to check for incoming data:
index=network_logs sourcetype=syslog
Enable HEC:
Create a New Token:
Name the token (for example, api_logs), assign the index api_index, and set the sourcetype to api_data.
Send Data to HEC:
Use curl or a similar tool to send data to HEC:
curl -k "https://splunk-server:8088/services/collector/event" \
-H "Authorization: Splunk <hec_token>" \
-d '{"event": "test_event", "sourcetype": "api_data", "index": "api_index"}'
Verify Data:
Search for the event in Splunk:
index=api_index sourcetype=api_data
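The curl example above can also be scripted. Below is a minimal Python sketch, using only the standard library, that builds and posts a single HEC event; the server name, token, index, and sourcetype are placeholders to replace with your own values.

```python
import json
import ssl
import urllib.request

def build_hec_event(event, sourcetype, index):
    """Build the JSON payload for a single HEC event."""
    return json.dumps({"event": event, "sourcetype": sourcetype, "index": index})

def send_hec_event(server, token, payload):
    """POST one event payload to the HEC endpoint (port 8088 by default)."""
    req = urllib.request.Request(
        f"https://{server}:8088/services/collector/event",
        data=payload.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
    )
    # Unverified context mirrors curl's -k for self-signed test certs;
    # use a properly verified context in production.
    ctx = ssl._create_unverified_context()
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status

if __name__ == "__main__":
    payload = build_hec_event("test_event", "api_data", "api_index")
    print(payload)
    # send_hec_event("splunk-server", "<hec_token>", payload)  # uncomment with real values
```

The payload builder is separated from the network call so the format can be checked before any data is sent.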
Scripted inputs allow Splunk to run custom scripts for collecting data from APIs, metrics, or other external systems.
Create the Script:
Save the script in $SPLUNK_HOME/bin/scripts/.
Example: fetch_metrics.py:
import time
import json
# Simulate metric data
metrics = {
"cpu": 55.2,
"memory": 72.5,
"disk": 43.1,
"timestamp": time.time()
}
# Print data in Splunk-compatible format
print(json.dumps(metrics))
Configure inputs.conf:
Define the scripted input:
[script://./bin/scripts/fetch_metrics.py]
disabled = false
interval = 60
sourcetype = system_metrics
index = metrics_index
Restart Splunk:
Apply the configuration:
./splunk restart
Verify Data:
Search for the collected metrics in Splunk:
index=metrics_index sourcetype=system_metrics
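A scheduling note on the interval setting used above: for scripted inputs, Splunk accepts either a number of seconds or a cron expression, which helps when collection should follow a calendar schedule rather than a fixed period. A sketch of the same stanza with a cron schedule (every five minutes):

```
[script://./bin/scripts/fetch_metrics.py]
disabled = false
interval = */5 * * * *
sourcetype = system_metrics
index = metrics_index
```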
Cause: The Syslog device is not sending to the right destination, or Splunk is not listening on the configured port.
Solution:
Verify the Syslog device configuration.
Check if Splunk is listening on the specified port:
netstat -tuln | grep 514
Cause: Script errors or missing execute permissions.
Solution:
Verify script permissions:
chmod +x $SPLUNK_HOME/bin/scripts/fetch_metrics.py
Test the script manually:
python $SPLUNK_HOME/bin/scripts/fetch_metrics.py
Cause: An invalid or disabled HEC token, or a misconfigured HEC input.
Solution:
Validate the HEC token in Splunk Web.
Check HEC logs for errors:
index=_internal sourcetype=splunkd component=HttpEventCollector
Goal: Collect and centralize Syslog messages from devices across multiple branches using a dedicated Syslog server.
Set Up a Syslog Server:
Forward Logs to Splunk:
Configure the Syslog server to forward messages to Splunk:
Example for rsyslog:
*.* @@splunk-server-ip:514 # @@ forwards over TCP; a single @ would use UDP
Configure Splunk to Receive Syslog:
Add the following to inputs.conf on the Splunk server:
[tcp://514]
disabled = false
sourcetype = syslog
index = branch_logs
Verify Data:
Run a query to validate ingestion:
index=branch_logs sourcetype=syslog
Monitor and Scale:
Goal: Collect logs from a cloud-based application using HEC in Splunk.
Enable HEC in Splunk:
Generate an HEC Token:
Name the token (for example, cloud_app_logs), assign the index cloud_logs, and set the sourcetype to json_logs.
Configure the Application:
Update the cloud application to send logs to Splunk:
Example using curl:
curl -X POST -H "Authorization: Splunk <hec_token>" \
-d '{"event": "User login", "user": "jdoe"}' \
"https://splunk-server:8088/services/collector/event"
Verify Logs:
Search for events in Splunk:
index=cloud_logs sourcetype=json_logs
Goal: Use a script to collect system performance metrics and ingest them into Splunk.
Write a Python Script:
Save the script as fetch_metrics.py:
import psutil
import json
# Collect metrics
metrics = {
"cpu_percent": psutil.cpu_percent(),
"memory_percent": psutil.virtual_memory().percent,
"disk_usage_percent": psutil.disk_usage('/').percent
}
# Print metrics in JSON format
print(json.dumps(metrics))
Configure inputs.conf:
Add a scripted input:
[script://./bin/scripts/fetch_metrics.py]
disabled = false
interval = 60
sourcetype = system_metrics
index = metrics_index
Test the Script:
Manually run the script and verify output:
python ./bin/scripts/fetch_metrics.py
Restart Splunk:
Restart to activate the input:
./splunk restart
Search for Metrics:
Query collected data in Splunk:
index=metrics_index sourcetype=system_metrics
Goal: Simulate Syslog traffic using logger and verify ingestion in Splunk.
Enable Syslog Input:
Configure inputs.conf:
[udp://514]
disabled = false
sourcetype = syslog
index = syslog_index
Send Test Syslog Messages:
Use logger to send messages:
logger -n <splunk-server-ip> -P 514 "Test message from logger"
Verify Data:
Search for the test message:
index=syslog_index sourcetype=syslog
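The logger test can also be reproduced in Python, which is handy on hosts without the util-linux logger utility. This sketch sends a minimal RFC 3164-style message over UDP; <splunk-server-ip> remains a placeholder, and the facility/severity defaults are illustrative.

```python
import socket

def format_syslog(message, facility=1, severity=6, tag="test"):
    """Build a minimal RFC 3164-style syslog line: <PRI>TAG: message."""
    pri = facility * 8 + severity  # PRI = facility * 8 + severity
    return f"<{pri}>{tag}: {message}"

def send_syslog(host, message, port=514):
    """Fire one syslog datagram over UDP (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(format_syslog(message).encode("utf-8"), (host, port))

if __name__ == "__main__":
    print(format_syslog("Test message from Python"))
    # send_syslog("<splunk-server-ip>", "Test message from Python")  # uncomment with a real host
```

Because UDP is connectionless, a successful send does not prove ingestion; the search in the step above remains the real verification.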
Goal: Use a script to fetch data from a public API and ingest it into Splunk.
Write the Script:
Example: Fetch weather data from OpenWeatherMap:
import requests
import json

# Fetch weather data (your_api_key is a placeholder for a real API key)
response = requests.get(
    "https://api.openweathermap.org/data/2.5/weather?q=London&appid=your_api_key",
    timeout=10,
)
response.raise_for_status()  # stop with an error on non-2xx responses
data = response.json()

# Print data as a single-line JSON event for Splunk
print(json.dumps(data))
Configure Scripted Input:
Add the following to inputs.conf:
[script://./bin/scripts/fetch_weather.py]
disabled = false
interval = 600
sourcetype = weather_data
index = api_logs
Restart Splunk:
Apply the configuration:
./splunk restart
Verify Data:
Query the ingested weather data:
index=api_logs sourcetype=weather_data
Cause: Token configuration issues or incorrect permissions.
Solution:
Verify the token configuration in Splunk Web.
Check HEC-related logs:
index=_internal sourcetype=splunkd component=HttpEventCollector
Cause: Script errors or incorrect permissions.
Solution:
Check script execution manually:
python ./bin/scripts/fetch_metrics.py
Ensure the script is executable:
chmod +x ./bin/scripts/fetch_metrics.py
Cause: Port issues or firewall blocking traffic.
Solution:
Verify if Splunk is listening on the port:
netstat -tuln | grep 514
Check firewall rules on both sender and receiver.
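As a cross-check alongside netstat, a short Python probe can confirm from the sender's side whether a TCP port accepts connections (UDP cannot be verified this way, since datagrams involve no handshake). The host name below is a placeholder.

```python
import socket

def tcp_port_open(host, port, timeout=3):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False

if __name__ == "__main__":
    print(tcp_port_open("splunk-server", 514))
```

A False result narrows the problem to the listener, routing, or a firewall rather than to Splunk's parsing pipeline.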
Splunk supports ingesting data from external systems via network-based inputs (such as Syslog or HTTP Event Collector) and scripted inputs (custom scripts that emit events). These methods enable real-time or scheduled data collection from systems where installing a forwarder is not feasible.
This guide expands on key considerations and system-level nuances that affect reliability and correctness, especially in production.
Splunk can ingest data directly from network sources such as firewalls, routers, applications, or custom tools.
Syslog messages are commonly sent over:
UDP 514
TCP 514
Privileged ports (<1024) on Linux/Unix systems require root privileges to bind.
Implications:
If you're attempting to configure Splunk to directly listen on UDP 514 or TCP 514:
You must run Splunk as root, which is not recommended for security reasons.
Alternatively, use a Syslog forwarder like rsyslog or syslog-ng to:
Receive data on 514.
Write to a log file such as /var/log/syslog_data.log.
Let Splunk monitor the file instead (via [monitor://] input).
Best Practice:
Have the forwarder write received messages to a file instead of relaying them:
*.* /var/log/syslog_data.log # rsyslog: write all messages to a file
Then configure:
[monitor:///var/log/syslog_data.log]
sourcetype = syslog
index = syslog_index
HEC allows event-based data ingestion over HTTP/HTTPS and is widely used for:
Cloud application logs
CI/CD pipeline events
API integrations
HEC supports batch mode, which groups multiple events into a single HTTP request to optimize ingestion performance.
Recommended Endpoint:
POST /services/collector/event
Payload Example (Batch Format):
{ "event": "event_1" }
{ "event": "event_2" }
{ "event": "event_3" }
Note: Each JSON object must be on a separate line (newline-delimited), not wrapped in an array.
This format reduces HTTP overhead, especially beneficial in high-throughput environments or API-based data ingestion.
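The batch payload can be assembled by joining single-event JSON objects with newlines and sending them in one request. A sketch using only the standard library; server and token are placeholders.

```python
import json
import urllib.request

def build_batch(events):
    """Newline-delimited JSON: one object per event, never a wrapping array."""
    return "\n".join(json.dumps({"event": e}) for e in events)

def post_batch(server, token, events):
    """Send all events in a single HTTP request to reduce per-request overhead."""
    req = urllib.request.Request(
        f"https://{server}:8088/services/collector/event",
        data=build_batch(events).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    print(build_batch(["event_1", "event_2", "event_3"]))
```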
Scripted inputs allow you to collect data by executing external scripts on a schedule.
Supported languages: Python, Bash, PowerShell
Use cases:
API polling
System metric collection
Custom data extraction
Splunk treats each line of script output as a separate event.
Therefore:
Your script should print each event as a separate JSON object on one line.
Avoid multi-line output unless using special parsing configurations.
Avoid (a multi-line object is split into separate, broken events):
{
"user": "alice",
"action": "login"
}
Prefer (one complete event per line):
{ "user": "alice", "action": "login" }
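Applying this rule, a script that gathers several readings per run should print each one as its own single-line JSON object. A sketch with simulated values (the host names and fields are illustrative):

```python
import json
import time

def sample_hosts():
    """Simulated per-host readings; in practice these would be polled live."""
    return [
        {"host": "web-01", "cpu": 41.0},
        {"host": "web-02", "cpu": 63.5},
    ]

def format_events(samples):
    """One single-line JSON event per sample, each stamped with the poll time."""
    ts = time.time()
    return [json.dumps({**s, "timestamp": ts}) for s in samples]

if __name__ == "__main__":
    for line in format_events(sample_hosts()):
        print(line)  # each printed line becomes one Splunk event
```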
[script://./bin/scripts/fetch_api_data.py]
interval = 300
sourcetype = api_data
index = api_logs
disabled = false
Ensure the script has execution permissions:
chmod +x fetch_api_data.py
Use Splunk search to validate:
index=api_logs sourcetype=api_data | table _time, user, action
For HEC:
index=hec_index sourcetype=custom_api | stats count by host
| Topic | Recommendation |
|---|---|
| Privileged Ports | Use a Syslog forwarder to avoid running Splunk as root |
| HEC Performance | Batch events into newline-delimited JSON in a single request |
| Script Output Format | Each line = one JSON event; avoid multiline output |
| Permissions | Ensure read/execute permissions for script and log files |
| Monitoring | Use internal logs: index=_internal source=*splunkd.log |
Which network protocol is commonly used by Splunk to receive syslog data?
UDP.
Syslog messages are commonly transmitted using the UDP protocol because it is lightweight and does not require connection establishment. Splunk can listen on specific UDP ports to receive syslog events from network devices such as routers, firewalls, and switches. When configuring a UDP input, administrators specify the listening port and optionally define the sourcetype and index for the incoming data. Although UDP is widely used for syslog, some environments use TCP to provide more reliable delivery and avoid packet loss.
Demand Score: 82
Exam Relevance Score: 91
Which configuration file is used to define network inputs in Splunk?
inputs.conf.
Network inputs are configured within the inputs.conf file. Administrators define stanzas such as [tcp://<port>] or [udp://<port>] to specify the ports that Splunk should listen on for incoming data. Additional parameters such as sourcetype and index determine how the received events are categorized and stored. Proper configuration ensures that incoming network data is correctly processed and searchable once ingested by the indexer.
Demand Score: 79
Exam Relevance Score: 92
What is a scripted input in Splunk?
A data input that runs a script to collect and ingest output as events.
Scripted inputs allow Splunk to execute scripts or commands that produce data output. The script runs at defined intervals and the output generated by the script is indexed as events in Splunk. This method is useful for collecting data from systems that do not provide standard log files or network streams. For example, administrators might use scripts to query APIs, retrieve system metrics, or collect custom application data. Scripted inputs provide flexibility for integrating non-standard data sources into Splunk.
Demand Score: 75
Exam Relevance Score: 90
Where are scripts for scripted inputs typically stored in a Splunk app?
$SPLUNK_HOME/etc/apps/<app_name>/bin.
Scripts used for scripted inputs are typically placed in the bin directory of a Splunk app. This directory is designed to store executable scripts that Splunk can run for data collection or custom processing tasks. When a scripted input is configured in inputs.conf, Splunk executes the script from this directory at the defined interval and indexes the script output. Organizing scripts within the app’s bin directory ensures they are included when the app is deployed or distributed through deployment mechanisms.
Demand Score: 73
Exam Relevance Score: 89