SPLK-1003 Network and Scripted Inputs

Network and Scripted Inputs Detailed Explanation

Splunk’s ability to handle network and scripted inputs allows you to collect real-time data from network devices, APIs, and custom scripts. This guide will cover network inputs, scripted inputs, their configurations, and best practices.

1. Network Inputs

Network inputs allow Splunk to collect real-time data from devices and applications over protocols like Syslog and HTTP Event Collector (HEC).

1.1 Syslog Inputs

Overview:
  • Syslog is a standard protocol used by devices like routers, switches, firewalls, and servers to send log messages.
  • Splunk can receive Syslog messages over UDP or TCP.
Steps to Configure Syslog Inputs:
  1. Edit inputs.conf:

    • Define a Syslog input over UDP:

      [udp://514]
      disabled = false
      sourcetype = syslog
      index = network_logs
      
    • Define a Syslog input over TCP:

      [tcp://10514]
      disabled = false
      sourcetype = syslog
      index = network_logs
      
  2. Configure Syslog Devices:

    • Point Syslog devices (e.g., firewalls) to the Splunk server’s IP address and port.
  3. Verify Data:

    • Use a search query to check for incoming data:

      index=network_logs sourcetype=syslog
      
Best Practices for Syslog Inputs:
  1. Use TCP for Reliability:
    • UDP does not guarantee message delivery, so use TCP for critical data.
  2. Set Up a Dedicated Syslog Server:
    • Use a separate Syslog server (e.g., rsyslog or syslog-ng) to collect and forward logs to Splunk for better reliability and control.
  3. Enable High Availability:
    • Use load balancers to distribute Syslog traffic across multiple Splunk instances.
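The Syslog input above can be smoke-tested without a real network device. The following is a minimal sketch that builds an RFC 3164-style message and sends it over UDP; the host, port, priority value, and tag are assumptions for illustration:

```python
import socket
import time

def format_syslog(pri, tag, message):
    # RFC 3164-style line: <PRI>TIMESTAMP TAG: MESSAGE
    ts = time.strftime("%b %d %H:%M:%S")
    return f"<{pri}>{ts} {tag}: {message}"

def send_udp(host, port, line):
    # UDP is fire-and-forget: no delivery guarantee (hence the TCP advice above)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))

if __name__ == "__main__":
    send_udp("127.0.0.1", 514, format_syslog(134, "testapp", "hello from script"))
```

After sending, the message should appear under the search shown in the verification step.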

1.2 HTTP Event Collector (HEC)

Overview:
  • The HTTP Event Collector (HEC) allows Splunk to receive data via HTTP or HTTPS.
  • It is ideal for collecting logs from cloud services, custom applications, and APIs.
Steps to Configure HEC:
  1. Enable HEC:

    • Navigate to Settings > Data Inputs > HTTP Event Collector in Splunk Web.
    • Click Global Settings and enable HEC.
    • Set the port (default: 8088).
  2. Create a New Token:

    • Click New Token and configure:
      • Name: api_logs
      • Index: api_index
      • Sourcetype: api_data
  3. Send Data to HEC:

    • Use curl or a similar tool to send data to HEC:

      curl -k "https://splunk-server:8088/services/collector/event" \
      -H "Authorization: Splunk <hec_token>" \
      -d '{"event": "test_event", "sourcetype": "api_data", "index": "api_index"}'
      
  4. Verify Data:

    • Search for the event in Splunk:

      index=api_index sourcetype=api_data
      
Best Practices for HEC:
  1. Secure HEC with HTTPS:
    • Use SSL/TLS for secure data transmission.
  2. Monitor HEC Tokens:
    • Periodically review and rotate tokens for security.
  3. Enable Throttling:
    • Use rate limits to prevent overload during spikes in data volume.
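The curl call above maps directly onto Python's standard library. Below is a sketch that builds the same HEC request; the URL and token are placeholders from the example, not real credentials:

```python
import json
import urllib.request

def build_hec_request(url, token, event, sourcetype, index):
    # Same shape as the curl example: JSON body plus a "Splunk <token>" auth header
    body = json.dumps({"event": event, "sourcetype": sourcetype, "index": index})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )

req = build_hec_request(
    "https://splunk-server:8088/services/collector/event",
    "<hec_token>", "test_event", "api_data", "api_index",
)
# urllib.request.urlopen(req)  # uncomment only against a live HEC endpoint
```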

2. Scripted Inputs

Scripted inputs allow Splunk to run custom scripts for collecting data from APIs, metrics, or other external systems.

2.1 Use Cases

  • External API Data:
    • Fetch data from services like weather APIs or stock prices.
  • Custom Logs:
    • Generate logs dynamically based on specific application logic.
  • System Metrics:
    • Collect CPU, memory, and disk usage from servers.

2.2 Steps to Configure Scripted Inputs

  1. Create the Script:

    • Save the script in $SPLUNK_HOME/bin/scripts/.

    • Example: fetch_metrics.py:

      import time
      import json
      
      # Simulate metric data
      metrics = {
         "cpu": 55.2,
         "memory": 72.5,
         "disk": 43.1,
         "timestamp": time.time()
      }
      
      # Print data in Splunk-compatible format
      print(json.dumps(metrics))
      
  2. Configure inputs.conf:

    • Define the scripted input:

      [script://./bin/scripts/fetch_metrics.py]
      disabled = false
      interval = 60
      sourcetype = system_metrics
      index = metrics_index
      
  3. Restart Splunk:

    • Apply the configuration:

      ./splunk restart
      
  4. Verify Data:

    • Search for the collected metrics in Splunk:

      index=metrics_index sourcetype=system_metrics
      

2.3 Best Practices for Scripted Inputs

  1. Optimize Script Efficiency:
    • Ensure scripts are lightweight to avoid high CPU or memory usage.
  2. Set Appropriate Intervals:
    • Use longer intervals for scripts that collect static or low-frequency data.
  3. Log Script Errors:
    • Redirect script errors to a log file for troubleshooting.
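The three practices above can be combined in one small pattern: print one compact JSON event per line on stdout, and send diagnostics to stderr (Splunk logs scripted-input stderr in splunkd.log). A sketch with simulated readings:

```python
import json
import sys
import time

def collect():
    # Simulated metrics; a real script would read psutil or /proc here
    return [{"metric": "cpu", "value": 55.2}, {"metric": "memory", "value": 72.5}]

def emit(events):
    lines = []
    for event in events:
        event["timestamp"] = time.time()
        lines.append(json.dumps(event))  # compact: one event per line
    return lines

try:
    for line in emit(collect()):
        print(line)
except Exception as exc:
    # stderr output surfaces in splunkd.log for troubleshooting
    print(f"metric collection failed: {exc}", file=sys.stderr)
```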

3. Advanced Troubleshooting

3.1 Issue: No Data from Syslog Input

  • Cause:

    • Incorrect port or firewall blocking traffic.
  • Solution:

    1. Verify the Syslog device configuration.

    2. Check if Splunk is listening on the specified port:

      netstat -tuln | grep 514
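For TCP inputs, the same listening check can be scripted. The sketch below attempts a TCP connection; the host and port are assumptions, and this only applies to [tcp://...] stanzas, since UDP listeners do not accept connections:

```python
import socket

def port_open(host, port, timeout=2.0):
    # True if a TCP connect to host:port succeeds within the timeout
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("127.0.0.1", 10514))
```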
      

3.2 Issue: Scripted Input Not Running

  • Cause:

    • Incorrect file permissions or syntax errors in the script.
  • Solution:

    1. Verify script permissions:

      chmod +x $SPLUNK_HOME/bin/scripts/fetch_metrics.py
      
    2. Test the script manually:

      python $SPLUNK_HOME/bin/scripts/fetch_metrics.py
      

3.3 Issue: HEC Data Not Ingested

  • Cause:

    • Incorrect HEC token or server settings.
  • Solution:

    1. Validate the HEC token in Splunk Web.

    2. Check HEC logs for errors:

      index=_internal sourcetype=splunkd component=HttpEventCollector
      

4. Best Practices Recap

  1. Syslog Inputs:
    • Use TCP for reliability and consider a dedicated Syslog server for large-scale deployments.
  2. HEC:
    • Secure data transmission with HTTPS and rotate tokens regularly.
  3. Scripted Inputs:
    • Keep scripts efficient, log errors, and test in staging before production.
  4. Monitor Performance:
    • Use the Monitoring Console to track input performance and resource usage.

Real-World Scenarios

Scenario 1: Centralized Syslog Collection for a Multi-Branch Organization

Goal: Collect and centralize Syslog messages from devices across multiple branches using a dedicated Syslog server.

Approach:
  1. Set Up a Syslog Server:

    • Use a dedicated Syslog server like rsyslog or syslog-ng to collect logs from branch devices.
  2. Forward Logs to Splunk:

    • Configure the Syslog server to forward messages to Splunk:

      • Example for rsyslog:

        *.* @@splunk-server-ip:514
        
  3. Configure Splunk to Receive Syslog:

    • Add the following to inputs.conf on the Splunk server:

      [tcp://514]
      disabled = false
      sourcetype = syslog
      index = branch_logs
      
  4. Verify Data:

    • Run a query to validate ingestion:

      index=branch_logs sourcetype=syslog
      
  5. Monitor and Scale:

    • Use a load balancer if log volume grows:
      • Configure the load balancer to distribute traffic across multiple Splunk servers.

Scenario 2: Using HEC for Cloud-Based Application Logs

Goal: Collect logs from a cloud-based application using HEC in Splunk.

Approach:
  1. Enable HEC in Splunk:

    • Go to Settings > Data Inputs > HTTP Event Collector.
    • Enable HEC and set the port (default: 8088).
  2. Generate an HEC Token:

    • Create a token for the cloud application:
      • Name: cloud_app_logs
      • Index: cloud_logs
      • Sourcetype: json_logs
  3. Configure the Application:

    • Update the cloud application to send logs to Splunk:

      • Example using curl:

        curl -X POST -H "Authorization: Splunk <hec_token>" \
        -d '{"event": "User login", "user": "jdoe"}' \
        "https://splunk-server:8088/services/collector/event"
        
  4. Verify Logs:

    • Search for events in Splunk:

      index=cloud_logs sourcetype=json_logs
      

Scenario 3: Using Scripted Inputs for Custom Metrics

Goal: Use a script to collect system performance metrics and ingest them into Splunk.

Approach:
  1. Write a Python Script:

    • Save the script as fetch_metrics.py:

      import psutil
      import json
      
      # Collect metrics
      metrics = {
         "cpu_percent": psutil.cpu_percent(),
         "memory_percent": psutil.virtual_memory().percent,
         "disk_usage_percent": psutil.disk_usage('/').percent
      }
      
      # Print metrics in JSON format
      print(json.dumps(metrics))
      
  2. Configure inputs.conf:

    • Add a scripted input:

      [script://./bin/scripts/fetch_metrics.py]
      disabled = false
      interval = 60
      sourcetype = system_metrics
      index = metrics_index
      
  3. Test the Script:

    • Manually run the script and verify output:

      python ./bin/scripts/fetch_metrics.py
      
  4. Restart Splunk:

    • Restart to activate the input:

      ./splunk restart
      
  5. Search for Metrics:

    • Query collected data in Splunk:

      index=metrics_index sourcetype=system_metrics
      

Hands-On Exercises

Exercise 1: Simulate Syslog Traffic

Goal: Simulate Syslog traffic using logger and verify ingestion in Splunk.

Steps:
  1. Enable Syslog Input:

    • Configure inputs.conf:

      [udp://514]
      disabled = false
      sourcetype = syslog
      index = syslog_index
      
  2. Send Test Syslog Messages:

    • Use logger to send messages:

      logger -n <splunk-server-ip> -P 514 "Test message from logger"
      
  3. Verify Data:

    • Search for the test message:

      index=syslog_index sourcetype=syslog
      

Exercise 2: Collect Data from a Public API

Goal: Use a script to fetch data from a public API and ingest it into Splunk.

Steps:
  1. Write the Script:

    • Example: Fetch weather data from OpenWeatherMap:

      import requests
      import json
      
      # Fetch weather data
      response = requests.get("https://api.openweathermap.org/data/2.5/weather?q=London&appid=your_api_key")
      data = response.json()
      
      # Print data in JSON format
      print(json.dumps(data))
      
  2. Configure Scripted Input:

    • Add the following to inputs.conf:

      [script://./bin/scripts/fetch_weather.py]
      disabled = false
      interval = 600
      sourcetype = weather_data
      index = api_logs
      
  3. Restart Splunk:

    • Apply the configuration:

      ./splunk restart
      
  4. Verify Data:

    • Query the ingested weather data:

      index=api_logs sourcetype=weather_data
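API-polling scripts like the one in this exercise benefit from a timeout and explicit error handling, so a slow or failing endpoint cannot hang the input. A sketch using only the standard library; the URL and API key are the placeholders from the exercise:

```python
import json
import sys
import urllib.request

URL = ("https://api.openweathermap.org/data/2.5/weather"
       "?q=London&appid=your_api_key")

def fetch_json(url, timeout=10):
    # Return parsed JSON, or None on any network/HTTP error
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except Exception as exc:
        print(f"fetch failed: {exc}", file=sys.stderr)  # surfaces in splunkd.log
        return None

data = fetch_json(URL, timeout=5)
if data is not None:
    print(json.dumps(data))  # one compact line = one Splunk event
```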
      

Advanced Troubleshooting

Issue: HEC Token Not Working

  • Cause: Token configuration issues or incorrect permissions.

  • Solution:

    1. Verify the token configuration in Splunk Web.

    2. Check HEC-related logs:

      index=_internal sourcetype=splunkd component=HttpEventCollector
      

Issue: Scripted Input Fails to Execute

  • Cause: Script errors or incorrect permissions.

  • Solution:

    1. Check script execution manually:

      python ./bin/scripts/fetch_metrics.py
      
    2. Ensure the script is executable:

      chmod +x ./bin/scripts/fetch_metrics.py
      

Issue: Syslog Traffic Not Received

  • Cause: Port issues or firewall blocking traffic.

  • Solution:

    1. Verify if Splunk is listening on the port:

      netstat -tuln | grep 514
      
    2. Check firewall rules on both sender and receiver.

Best Practices Recap

  1. Syslog Inputs:
    • Use TCP for critical logs and implement load balancers for high availability.
  2. HEC:
    • Secure HEC endpoints with SSL/TLS and monitor token usage.
  3. Scripted Inputs:
    • Optimize scripts for efficiency, and use longer intervals for low-frequency data.

Network and Scripted Inputs (Additional Content)

Splunk supports ingesting data from external systems via network-based inputs (such as Syslog or HTTP Event Collector) and scripted inputs (custom scripts that emit events). These methods enable real-time or scheduled data collection from systems where installing a forwarder is not feasible.

This guide expands on key considerations and system-level nuances that affect reliability and correctness, especially in production.

1. Network Inputs

Splunk can ingest data directly from network sources such as firewalls, routers, applications, or custom tools.

1.1 Syslog Inputs – Port Privilege Considerations

Syslog messages are commonly sent over:

  • UDP 514

  • TCP 514

Important Note – Port Binding Requirements:

Privileged ports (<1024) on Linux/Unix systems require root privileges to bind.

Implications:

  • If you're attempting to configure Splunk to directly listen on UDP 514 or TCP 514:

    • You must run Splunk as root, which is not recommended for security reasons.

    • Alternatively, use a Syslog forwarder like rsyslog or syslog-ng to:

      • Receive data on 514.

      • Write to a log file such as /var/log/syslog_data.log.

      • Let Splunk monitor the file instead (via [monitor://] input).

Best Practice:

*.* /var/log/syslog_data.log  # rsyslog: write received messages to a local file

Then configure:

[monitor:///var/log/syslog_data.log]
sourcetype = syslog
index = syslog_index

1.2 HTTP Event Collector (HEC)

HEC allows event-based data ingestion over HTTP/HTTPS and is widely used for:

  • Cloud application logs

  • CI/CD pipeline events

  • API integrations

Batch Event Submission – Performance Tip

HEC supports batch mode, which groups multiple events into a single HTTP request to optimize ingestion performance.

Recommended Endpoint:

POST /services/collector/event

Payload Example (Batch Format):

{ "event": "event_1" }
{ "event": "event_2" }
{ "event": "event_3" }

Note: Each JSON object must be on a separate line (newline-delimited), not wrapped in an array.

This format reduces HTTP overhead, especially beneficial in high-throughput environments or API-based data ingestion.
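Building that newline-delimited body programmatically is straightforward; a sketch with illustrative event values:

```python
import json

def batch_body(events):
    # Newline-delimited JSON objects: no surrounding array, one object per line
    return "\n".join(json.dumps({"event": e}) for e in events)

body = batch_body(["event_1", "event_2", "event_3"])
print(body)
```

The whole string is then sent as the body of a single POST to /services/collector/event.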

2. Scripted Inputs

Scripted inputs allow you to collect data by executing external scripts on a schedule.

  • Supported languages: Python, Bash, PowerShell

  • Use cases:

    • API polling

    • System metric collection

    • Custom data extraction

2.1 Output Format – One Event per Line (Important)

Splunk treats each line of script output as a separate event.

Therefore:

  • Your script should print each event as a separate JSON object on one line.

  • Avoid multi-line output unless using special parsing configurations.

Incorrect Output (pretty-printed JSON is split into one partial event per line):
{
  "user": "alice",
  "action": "login"
}
Correct Output (one line per event):
{ "user": "alice", "action": "login" }
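In Python, json.dumps produces the compact single-line form by default; the usual way to end up with the incorrect multi-line form is passing indent. A sketch:

```python
import json

event = {"user": "alice", "action": "login"}

compact = json.dumps(event)            # one line -> one Splunk event
pretty = json.dumps(event, indent=2)   # multiple lines -> split into partial events

print(compact)
```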

2.2 Script Configuration in inputs.conf

[script://./bin/scripts/fetch_api_data.py]
interval = 300
sourcetype = api_data
index = api_logs
disabled = false

Ensure the script has execution permissions:

chmod +x fetch_api_data.py

3. Verification and Troubleshooting

Use Splunk search to validate:

index=api_logs sourcetype=api_data | table _time, user, action

For HEC:

index=hec_index sourcetype=custom_api | stats count by host

4. Best Practices Summary

  • Privileged Ports: Use a Syslog forwarder to avoid running Splunk as root.
  • HEC Performance: Batch events into newline-delimited JSON in a single request.
  • Script Output Format: Each line = one JSON event; avoid multiline output.
  • Permissions: Ensure read/execute permissions for scripts and log files.
  • Monitoring: Use internal logs: index=_internal source=*splunkd.log

Frequently Asked Questions

Which network protocol is commonly used by Splunk to receive syslog data?

Answer:

UDP.

Explanation:

Syslog messages are commonly transmitted using the UDP protocol because it is lightweight and does not require connection establishment. Splunk can listen on specific UDP ports to receive syslog events from network devices such as routers, firewalls, and switches. When configuring a UDP input, administrators specify the listening port and optionally define the sourcetype and index for the incoming data. Although UDP is widely used for syslog, some environments use TCP to provide more reliable delivery and avoid packet loss.


Which configuration file is used to define network inputs in Splunk?

Answer:

inputs.conf.

Explanation:

Network inputs are configured within the inputs.conf file. Administrators define stanzas such as [tcp://<port>] or [udp://<port>] to specify the ports that Splunk should listen on for incoming data. Additional parameters such as sourcetype and index determine how the received events are categorized and stored. Proper configuration ensures that incoming network data is correctly processed and searchable once ingested by the indexer.


What is a scripted input in Splunk?

Answer:

A data input that runs a script to collect and ingest output as events.

Explanation:

Scripted inputs allow Splunk to execute scripts or commands that produce data output. The script runs at defined intervals and the output generated by the script is indexed as events in Splunk. This method is useful for collecting data from systems that do not provide standard log files or network streams. For example, administrators might use scripts to query APIs, retrieve system metrics, or collect custom application data. Scripted inputs provide flexibility for integrating non-standard data sources into Splunk.


Where are scripts for scripted inputs typically stored in a Splunk app?

Answer:

$SPLUNK_HOME/etc/apps/<app_name>/bin.

Explanation:

Scripts used for scripted inputs are typically placed in the bin directory of a Splunk app. This directory is designed to store executable scripts that Splunk can run for data collection or custom processing tasks. When a scripted input is configured in inputs.conf, Splunk executes the script from this directory at the defined interval and indexes the script output. Organizing scripts within the app’s bin directory ensures they are included when the app is deployed or distributed through deployment mechanisms.

