SPLK-1003 Network and Scripted Inputs

Network and Scripted Inputs Detailed Explanation

Splunk’s ability to handle network and scripted inputs allows you to collect real-time data from network devices, APIs, and custom scripts. This guide will cover network inputs, scripted inputs, their configurations, and best practices.

1. Network Inputs

Network inputs allow Splunk to collect real-time data from devices and applications over protocols like Syslog and HTTP Event Collector (HEC).

1.1 Syslog Inputs

Overview:
  • Syslog is a standard protocol used by devices like routers, switches, firewalls, and servers to send log messages.
  • Splunk can receive Syslog messages over UDP or TCP.
Steps to Configure Syslog Inputs:
  1. Edit inputs.conf:

    • Define a Syslog input over UDP:

      [udp://514]
      disabled = false
      sourcetype = syslog
      index = network_logs
      
    • Define a Syslog input over TCP:

      [tcp://10514]
      disabled = false
      sourcetype = syslog
      index = network_logs
      
  2. Configure Syslog Devices:

    • Point Syslog devices (e.g., firewalls) to the Splunk server’s IP address and port.
  3. Verify Data:

    • Use a search query to check for incoming data:

      index=network_logs sourcetype=syslog
      
Best Practices for Syslog Inputs:
  1. Use TCP for Reliability:
    • UDP does not guarantee message delivery, so use TCP for critical data.
  2. Set Up a Dedicated Syslog Server:
    • Use a separate Syslog server (e.g., rsyslog or syslog-ng) to collect and forward logs to Splunk for better reliability and control.
  3. Enable High Availability:
    • Use load balancers to distribute Syslog traffic across multiple Splunk instances.
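The Syslog input above can be smoke-tested without a real network device. The following is a minimal sketch that builds an RFC 3164-style message and sends it over UDP; the host, port, priority value, and tag are assumptions for illustration:

```python
import socket
import time

def format_syslog(pri, tag, message):
    # RFC 3164-style line: <PRI>TIMESTAMP TAG: MESSAGE
    ts = time.strftime("%b %d %H:%M:%S")
    return f"<{pri}>{ts} {tag}: {message}"

def send_udp(host, port, line):
    # UDP is fire-and-forget: no delivery guarantee (hence the TCP advice above)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))

if __name__ == "__main__":
    send_udp("127.0.0.1", 514, format_syslog(134, "testapp", "hello from script"))
```

After sending, the message should appear under the search shown in the verification step.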

1.2 HTTP Event Collector (HEC)

Overview:
  • The HTTP Event Collector (HEC) allows Splunk to receive data via HTTP or HTTPS.
  • It is ideal for collecting logs from cloud services, custom applications, and APIs.
Steps to Configure HEC:
  1. Enable HEC:

    • Navigate to Settings > Data Inputs > HTTP Event Collector in Splunk Web.
    • Click Global Settings and enable HEC.
    • Set the port (default: 8088).
  2. Create a New Token:

    • Click New Token and configure:
      • Name: api_logs
      • Index: api_index
      • Sourcetype: api_data
  3. Send Data to HEC:

    • Use curl or a similar tool to send data to HEC:

      curl -k "https://splunk-server:8088/services/collector/event" \
      -H "Authorization: Splunk <hec_token>" \
      -d '{"event": "test_event", "sourcetype": "api_data", "index": "api_index"}'
      
  4. Verify Data:

    • Search for the event in Splunk:

      index=api_index sourcetype=api_data
      
Best Practices for HEC:
  1. Secure HEC with HTTPS:
    • Use SSL/TLS for secure data transmission.
  2. Monitor HEC Tokens:
    • Periodically review and rotate tokens for security.
  3. Enable Throttling:
    • Use rate limits to prevent overload during spikes in data volume.
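The curl call above maps directly onto Python's standard library. Below is a sketch that builds the same HEC request; the URL and token are placeholders from the example, not real credentials:

```python
import json
import urllib.request

def build_hec_request(url, token, event, sourcetype, index):
    # Same shape as the curl example: JSON body plus a "Splunk <token>" auth header
    body = json.dumps({"event": event, "sourcetype": sourcetype, "index": index})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )

req = build_hec_request(
    "https://splunk-server:8088/services/collector/event",
    "<hec_token>", "test_event", "api_data", "api_index",
)
# urllib.request.urlopen(req)  # uncomment only against a live HEC endpoint
```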

2. Scripted Inputs

Scripted inputs allow Splunk to run custom scripts for collecting data from APIs, metrics, or other external systems.

2.1 Use Cases

  • External API Data:
    • Fetch data from services like weather APIs or stock prices.
  • Custom Logs:
    • Generate logs dynamically based on specific application logic.
  • System Metrics:
    • Collect CPU, memory, and disk usage from servers.

2.2 Steps to Configure Scripted Inputs

  1. Create the Script:

    • Save the script in $SPLUNK_HOME/bin/scripts/.

    • Example: fetch_metrics.py:

      import time
      import json
      
      # Simulate metric data
      metrics = {
         "cpu": 55.2,
         "memory": 72.5,
         "disk": 43.1,
         "timestamp": time.time()
      }
      
      # Print data in Splunk-compatible format
      print(json.dumps(metrics))
      
  2. Configure inputs.conf:

    • Define the scripted input:

      [script://./bin/scripts/fetch_metrics.py]
      disabled = false
      interval = 60
      sourcetype = system_metrics
      index = metrics_index
      
  3. Restart Splunk:

    • Apply the configuration:

      ./splunk restart
      
  4. Verify Data:

    • Search for the collected metrics in Splunk:

      index=metrics_index sourcetype=system_metrics
      

2.3 Best Practices for Scripted Inputs

  1. Optimize Script Efficiency:
    • Ensure scripts are lightweight to avoid high CPU or memory usage.
  2. Set Appropriate Intervals:
    • Use longer intervals for scripts that collect static or low-frequency data.
  3. Log Script Errors:
    • Redirect script errors to a log file for troubleshooting.
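The three practices above can be combined in one small pattern: print one compact JSON event per line on stdout, and send diagnostics to stderr (Splunk logs scripted-input stderr in splunkd.log). A sketch with simulated readings:

```python
import json
import sys
import time

def collect():
    # Simulated metrics; a real script would read psutil or /proc here
    return [{"metric": "cpu", "value": 55.2}, {"metric": "memory", "value": 72.5}]

def emit(events):
    lines = []
    for event in events:
        event["timestamp"] = time.time()
        lines.append(json.dumps(event))  # compact: one event per line
    return lines

try:
    for line in emit(collect()):
        print(line)
except Exception as exc:
    # stderr output surfaces in splunkd.log for troubleshooting
    print(f"metric collection failed: {exc}", file=sys.stderr)
```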

3. Advanced Troubleshooting

3.1 Issue: No Data from Syslog Input

  • Cause:

    • Incorrect port or firewall blocking traffic.
  • Solution:

    1. Verify the Syslog device configuration.

    2. Check if Splunk is listening on the specified port:

      netstat -tuln | grep 514
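For TCP inputs, the same listening check can be scripted. The sketch below attempts a TCP connection; the host and port are assumptions, and this only applies to [tcp://...] stanzas, since UDP listeners do not accept connections:

```python
import socket

def port_open(host, port, timeout=2.0):
    # True if a TCP connect to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("127.0.0.1", 10514))
```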
      

3.2 Issue: Scripted Input Not Running

  • Cause:

    • Incorrect file permissions or syntax errors in the script.
  • Solution:

    1. Verify script permissions:

      chmod +x $SPLUNK_HOME/bin/scripts/fetch_metrics.py
      
    2. Test the script manually:

      python $SPLUNK_HOME/bin/scripts/fetch_metrics.py
      

3.3 Issue: HEC Data Not Ingested

  • Cause:

    • Incorrect HEC token or server settings.
  • Solution:

    1. Validate the HEC token in Splunk Web.

    2. Check HEC logs for errors:

      index=_internal sourcetype=splunkd component=HttpEventCollector
      

4. Best Practices Recap

  1. Syslog Inputs:
    • Use TCP for reliability and consider a dedicated Syslog server for large-scale deployments.
  2. HEC:
    • Secure data transmission with HTTPS and rotate tokens regularly.
  3. Scripted Inputs:
    • Keep scripts efficient, log errors, and test in staging before production.
  4. Monitor Performance:
    • Use the Monitoring Console to track input performance and resource usage.

Real-World Scenarios

Scenario 1: Centralized Syslog Collection for a Multi-Branch Organization

Goal: Collect and centralize Syslog messages from devices across multiple branches using a dedicated Syslog server.

Approach:
  1. Set Up a Syslog Server:

    • Use a dedicated Syslog server like rsyslog or syslog-ng to collect logs from branch devices.
  2. Forward Logs to Splunk:

    • Configure the Syslog server to forward messages to Splunk:

      • Example for rsyslog:

        *.* @@splunk-server-ip:514
        
  3. Configure Splunk to Receive Syslog:

    • Add the following to inputs.conf on the Splunk server:

      [tcp://514]
      disabled = false
      sourcetype = syslog
      index = branch_logs
      
  4. Verify Data:

    • Run a query to validate ingestion:

      index=branch_logs sourcetype=syslog
      
  5. Monitor and Scale:

    • Use a load balancer if log volume grows:
      • Configure the load balancer to distribute traffic across multiple Splunk servers.

Scenario 2: Using HEC for Cloud-Based Application Logs

Goal: Collect logs from a cloud-based application using HEC in Splunk.

Approach:
  1. Enable HEC in Splunk:

    • Go to Settings > Data Inputs > HTTP Event Collector.
    • Enable HEC and set the port (default: 8088).
  2. Generate an HEC Token:

    • Create a token for the cloud application:
      • Name: cloud_app_logs
      • Index: cloud_logs
      • Sourcetype: json_logs
  3. Configure the Application:

    • Update the cloud application to send logs to Splunk:

      • Example using curl:

        curl -X POST -H "Authorization: Splunk <hec_token>" \
        -d '{"event": "User login", "user": "jdoe"}' \
        "https://splunk-server:8088/services/collector/event"
        
  4. Verify Logs:

    • Search for events in Splunk:

      index=cloud_logs sourcetype=json_logs
      

Scenario 3: Using Scripted Inputs for Custom Metrics

Goal: Use a script to collect system performance metrics and ingest them into Splunk.

Approach:
  1. Write a Python Script:

    • Save the script as fetch_metrics.py:

      import psutil
      import json
      
      # Collect metrics
      metrics = {
         "cpu_percent": psutil.cpu_percent(),
         "memory_percent": psutil.virtual_memory().percent,
         "disk_usage_percent": psutil.disk_usage('/').percent
      }
      
      # Print metrics in JSON format
      print(json.dumps(metrics))
      
  2. Configure inputs.conf:

    • Add a scripted input:

      [script://./bin/scripts/fetch_metrics.py]
      disabled = false
      interval = 60
      sourcetype = system_metrics
      index = metrics_index
      
  3. Test the Script:

    • Manually run the script and verify output:

      python ./bin/scripts/fetch_metrics.py
      
  4. Restart Splunk:

    • Restart to activate the input:

      ./splunk restart
      
  5. Search for Metrics:

    • Query collected data in Splunk:

      index=metrics_index sourcetype=system_metrics
      

Hands-On Exercises

Exercise 1: Simulate Syslog Traffic

Goal: Simulate Syslog traffic using logger and verify ingestion in Splunk.

Steps:
  1. Enable Syslog Input:

    • Configure inputs.conf:

      [udp://514]
      disabled = false
      sourcetype = syslog
      index = syslog_index
      
  2. Send Test Syslog Messages:

    • Use logger to send messages:

      logger -n <splunk-server-ip> -P 514 "Test message from logger"
      
  3. Verify Data:

    • Search for the test message:

      index=syslog_index sourcetype=syslog
      

Exercise 2: Collect Data from a Public API

Goal: Use a script to fetch data from a public API and ingest it into Splunk.

Steps:
  1. Write the Script:

    • Example: Fetch weather data from OpenWeatherMap:

      import requests
      import json
      
      # Fetch weather data
      response = requests.get("https://api.openweathermap.org/data/2.5/weather?q=London&appid=your_api_key")
      data = response.json()
      
      # Print data in JSON format
      print(json.dumps(data))
      
  2. Configure Scripted Input:

    • Add the following to inputs.conf:

      [script://./bin/scripts/fetch_weather.py]
      disabled = false
      interval = 600
      sourcetype = weather_data
      index = api_logs
      
  3. Restart Splunk:

    • Apply the configuration:

      ./splunk restart
      
  4. Verify Data:

    • Query the ingested weather data:

      index=api_logs sourcetype=weather_data
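API-polling scripts like the one in this exercise benefit from a timeout and explicit error handling, so a slow or failing endpoint cannot hang the input. A sketch using only the standard library; the URL and API key are the placeholders from the exercise:

```python
import json
import sys
import urllib.request

URL = ("https://api.openweathermap.org/data/2.5/weather"
       "?q=London&appid=your_api_key")

def fetch_json(url, timeout=10):
    # Return parsed JSON, or None on any network/HTTP error
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except Exception as exc:
        print(f"fetch failed: {exc}", file=sys.stderr)  # surfaces in splunkd.log
        return None

data = fetch_json(URL, timeout=5)
if data is not None:
    print(json.dumps(data))  # one compact line = one Splunk event
```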
      

Advanced Troubleshooting

Issue: HEC Token Not Working

  • Cause: Token configuration issues or incorrect permissions.

  • Solution:

    1. Verify the token configuration in Splunk Web.

    2. Check HEC-related logs:

      index=_internal sourcetype=splunkd component=HttpEventCollector
      

Issue: Scripted Input Fails to Execute

  • Cause: Script errors or incorrect permissions.

  • Solution:

    1. Check script execution manually:

      python ./bin/scripts/fetch_metrics.py
      
    2. Ensure the script is executable:

      chmod +x ./bin/scripts/fetch_metrics.py
      

Issue: Syslog Traffic Not Received

  • Cause: Port issues or firewall blocking traffic.

  • Solution:

    1. Verify if Splunk is listening on the port:

      netstat -tuln | grep 514
      
    2. Check firewall rules on both sender and receiver.

Best Practices Recap

  1. Syslog Inputs:
    • Use TCP for critical logs and implement load balancers for high availability.
  2. HEC:
    • Secure HEC endpoints with SSL/TLS and monitor token usage.
  3. Scripted Inputs:
    • Optimize scripts for efficiency, and use longer intervals for low-frequency data.

Network and Scripted Inputs (Additional Content)

Splunk supports ingesting data from external systems via network-based inputs (such as Syslog or HTTP Event Collector) and scripted inputs (custom scripts that emit events). These methods enable real-time or scheduled data collection from systems where installing a forwarder is not feasible.

This guide expands on key considerations and system-level nuances that affect reliability and correctness, especially in production.

1. Network Inputs

Splunk can ingest data directly from network sources such as firewalls, routers, applications, or custom tools.

1.1 Syslog Inputs – Port Privilege Considerations

Syslog messages are commonly sent over:

  • UDP 514

  • TCP 514

Important Note – Port Binding Requirements:

Privileged ports (<1024) on Linux/Unix systems require root privileges to bind.

Implications:

  • If you're attempting to configure Splunk to directly listen on UDP 514 or TCP 514:

    • You must run Splunk as root, which is not recommended for security reasons.

    • Alternatively, use a Syslog forwarder like rsyslog or syslog-ng to:

      • Receive data on 514.

      • Write to a log file such as /var/log/syslog_data.log.

      • Let Splunk monitor the file instead (via [monitor://] input).

Best Practice:

*.* /var/log/syslog_data.log  # rsyslog: write received messages to a local file

Then configure:

[monitor:///var/log/syslog_data.log]
sourcetype = syslog
index = syslog_index

1.2 HTTP Event Collector (HEC)

HEC allows event-based data ingestion over HTTP/HTTPS and is widely used for:

  • Cloud application logs

  • CI/CD pipeline events

  • API integrations

Batch Event Submission – Performance Tip

HEC supports batch mode, which groups multiple events into a single HTTP request to optimize ingestion performance.

Recommended Endpoint:

POST /services/collector/event

Payload Example (Batch Format):

{ "event": "event_1" }
{ "event": "event_2" }
{ "event": "event_3" }

Note: Each JSON object must be on a separate line (newline-delimited), not wrapped in an array.

This format reduces HTTP overhead, especially beneficial in high-throughput environments or API-based data ingestion.
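Building that newline-delimited body programmatically is straightforward; a sketch with illustrative event values:

```python
import json

def batch_body(events):
    # Newline-delimited JSON objects: no surrounding array, one object per line
    return "\n".join(json.dumps({"event": e}) for e in events)

body = batch_body(["event_1", "event_2", "event_3"])
print(body)
```

The whole string is then sent as the body of a single POST to /services/collector/event.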

2. Scripted Inputs

Scripted inputs allow you to collect data by executing external scripts on a schedule.

  • Supported languages: Python, Bash, PowerShell

  • Use cases:

    • API polling

    • System metric collection

    • Custom data extraction

2.1 Output Format – One Event per Line (Important)

Splunk treats each line of script output as a separate event.

Therefore:

  • Your script should print each event as a separate JSON object on one line.

  • Avoid multi-line output unless using special parsing configurations.

Incorrect Output (pretty-printed JSON is split into one partial event per line):
{
  "user": "alice",
  "action": "login"
}
Correct Output (one line per event):
{ "user": "alice", "action": "login" }
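In Python, json.dumps produces the compact single-line form by default; the usual way to end up with the incorrect multi-line form is passing indent. A sketch:

```python
import json

event = {"user": "alice", "action": "login"}

compact = json.dumps(event)            # one line -> one Splunk event
pretty = json.dumps(event, indent=2)   # multiple lines -> split into partial events

print(compact)
```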

2.2 Script Configuration in inputs.conf

[script://./bin/scripts/fetch_api_data.py]
interval = 300
sourcetype = api_data
index = api_logs
disabled = false

Ensure the script has execution permissions:

chmod +x fetch_api_data.py

3. Verification and Troubleshooting

Use Splunk search to validate:

index=api_logs sourcetype=api_data | table _time, user, action

For HEC:

index=hec_index sourcetype=custom_api | stats count by host

4. Best Practices Summary

  • Privileged Ports: Use a Syslog forwarder to avoid running Splunk as root.
  • HEC Performance: Batch events into newline-delimited JSON in a single request.
  • Script Output Format: Each line = one JSON event; avoid multiline output.
  • Permissions: Ensure read/execute permissions for scripts and log files.
  • Monitoring: Use internal logs: index=_internal source=*splunkd.log

Frequently Asked Questions

Which network protocol is commonly used by Splunk to receive syslog data?

Answer:

UDP.

Explanation:

Syslog messages are commonly transmitted using the UDP protocol because it is lightweight and does not require connection establishment. Splunk can listen on specific UDP ports to receive syslog events from network devices such as routers, firewalls, and switches. When configuring a UDP input, administrators specify the listening port and optionally define the sourcetype and index for the incoming data. Although UDP is widely used for syslog, some environments use TCP to provide more reliable delivery and avoid packet loss.


Which configuration file is used to define network inputs in Splunk?

Answer:

inputs.conf.

Explanation:

Network inputs are configured within the inputs.conf file. Administrators define stanzas such as [tcp://<port>] or [udp://<port>] to specify the ports that Splunk should listen on for incoming data. Additional parameters such as sourcetype and index determine how the received events are categorized and stored. Proper configuration ensures that incoming network data is correctly processed and searchable once ingested by the indexer.


What is a scripted input in Splunk?

Answer:

A data input that runs a script to collect and ingest output as events.

Explanation:

Scripted inputs allow Splunk to execute scripts or commands that produce data output. The script runs at defined intervals and the output generated by the script is indexed as events in Splunk. This method is useful for collecting data from systems that do not provide standard log files or network streams. For example, administrators might use scripts to query APIs, retrieve system metrics, or collect custom application data. Scripted inputs provide flexibility for integrating non-standard data sources into Splunk.


Where are scripts for scripted inputs typically stored in a Splunk app?

Answer:

$SPLUNK_HOME/etc/apps/<app_name>/bin.

Explanation:

Scripts used for scripted inputs are typically placed in the bin directory of a Splunk app. This directory is designed to store executable scripts that Splunk can run for data collection or custom processing tasks. When a scripted input is configured in inputs.conf, Splunk executes the script from this directory at the defined interval and indexes the script output. Organizing scripts within the app’s bin directory ensures they are included when the app is deployed or distributed through deployment mechanisms.

