Hello everyone, in this blog I am going to talk about the Search Processing Language (SPL), used by Splunk for efficient searching of machine-generated big data. For more information about Splunk, go through my previous blog, “Splunk”.
Search Processing Language (SPL) encompasses all the search commands and their functions, arguments, and clauses. Its syntax was originally based on the UNIX pipeline and SQL.
The scope of SPL includes data searching, filtering, modification, manipulation, insertion, and deletion. Search commands tell Splunk Enterprise what to do to the events retrieved from the indexes (where Splunk stores the data). For example, you can use a command to filter unwanted information, extract more information, evaluate new fields, calculate statistics, re-order the results, or create a chart.
Some search commands have functions and arguments associated with them. These functions and their arguments can be used to specify how the commands act on the results and/or which fields they act upon. For example, use functions to format the data in a chart, describe what kind of statistics to calculate, and specify what fields to evaluate. Some commands also use clauses to specify how to group the search results.
Splunk supports a UNIX-style pipeline for building complex searches. The “search pipeline” refers to the structure of a Splunk search, in which consecutive commands are chained together using a pipe character, “|”. The pipe character tells Splunk to use the output of one command (to the left of the pipe) as the input for the next command (to the right of the pipe).
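For example, a simple search pipeline might look like the following (the index and field names here are hypothetical, for illustration only):

```spl
index=web_logs status=500
| stats count BY host
| sort -count
```

The first segment retrieves matching events from the index, stats counts them per host, and sort orders the hosts by that count in descending order.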
Types of searches
As data is indexed, Splunk recognizes patterns and identifies information that can be useful as searchable fields. Splunk can be configured to recognize new fields, or to create new fields, as new data comes into the indexer. Before you start searching, you should know what you are trying to accomplish. Generally, after getting data into Splunk, you can:
- Investigate to learn more about the data just indexed or to find the root cause of an issue.
- Summarize the search results into a report, whether tabular or other visualization format.
Depending on this, there can be two types of searches: Raw event searches and transforming searches.
Raw event searches
Raw event searches are searches that just retrieve events from an index, and are typically used when analysing a problem. Some examples of these searches include: checking error codes, correlating events, investigating security issues, and analysing failures. These searches do not usually include search commands (except search, itself), and the results are typically a list of raw events.
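A raw event search can be as simple as a set of keyword and field filters, with no piped commands at all. For instance (assuming a hypothetical web_logs index):

```spl
index=web_logs status>=500 "connection timed out"
```

This simply returns the matching raw events for inspection.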
Transforming searches
Transforming searches are searches that perform some type of statistical calculation against a set of results. These searches always require fields and at least one statistical command. Some examples include: getting a daily count of error events, or counting the number of times a specific user has logged in.
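For example, a daily count of error events could be obtained with a transforming search like this (the index and field names are illustrative):

```spl
index=web_logs log_level=ERROR
| timechart span=1d count AS daily_errors
```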
Types of search commands
There are five broad categories of search commands. They are:
1. Distributable streaming commands
A streaming command operates on each event returned by a search. A distributable streaming command runs on the indexer and can be applied to subsets of indexed data in parallel. For example, the rex command is streaming: it extracts fields and adds them to events at search time. Some of the distributable streaming commands are: convert, eval, extract (kv), fields, lookup (if not local=t), mvexpand, multikv, rename, regex, replace, rex, search, strcat, tags, typer, and where.
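As a quick illustration, the following pipeline uses only distributable streaming commands, so each indexer can apply it independently to its own slice of the data (the field names here are hypothetical):

```spl
index=web_logs
| rex field=_raw "user=(?<user>\w+)"
| eval user=lower(user)
| where isnotnull(user)
```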
2. Centralized streaming commands
A centralized streaming command applies a transformation to each event returned by a search but, unlike a distributable streaming command, runs only on the search head. This is also called “stateful streaming”. Centralized streaming commands include: head, streamstats, some modes of dedup, and some modes of cluster.
3. Transforming commands
A transforming command orders the results into a data table. Transforming commands are not streaming. They are required to transform search result data into the data structures needed for visualizations such as column, bar, line, area, and pie charts. Transforming commands include: chart, timechart, stats, top, rare, contingency, highlight, typer, and addtotals.
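For instance, a transforming command such as stats collapses the matching events into a results table that can back a column or pie chart (index and field names are hypothetical):

```spl
index=web_logs
| stats count AS requests, avg(bytes) AS avg_bytes BY status
```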
4. Generating commands
A generating command is one that fetches information without any transformations. Generating commands are either event-generating (distributable or centralized) or report-generating and, depending on which they are, will return an events list or a table of results.
- Distributable event-generating commands include: search.
- Centralized event-generating commands include: loadjob and inputcsv.
- Report-generating commands include: dbinspect, datamodel, metadata, and pivot.
5. Other commands
There are a handful of commands that do not fit into these categories. These commands are non-reporting, not distributable, and not streaming: sort, eventstats, some modes of dedup, and some modes of cluster.
The table below covers the most basic commands in SPL for Splunk search.
Let’s have a look at the usage of some of the basic commands in SPL.
The sort command sorts search results by the specified fields.
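For example, assuming hypothetical response_time and host fields:

```spl
index=web_logs
| sort -response_time, +host
```

The leading minus sorts response_time in descending order, while the plus (also the default) sorts host in ascending order as a tie-breaker.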
The where filtering command evaluates an expression for filtering results. If the evaluation is successful and the result is TRUE, the result is retained; otherwise, the result is discarded.
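For example (field names are hypothetical):

```spl
index=web_logs
| where response_time > 1000 AND status != 200
```

Only events for which the expression evaluates to TRUE survive the pipe.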
The top command returns the most frequently occurring field values, along with their count and percentage.
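For instance, to see the five most requested URIs along with their counts and percentages (uri being a hypothetical field):

```spl
index=web_logs
| top limit=5 uri
```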
The timechart command creates a chart for a statistical aggregation applied to a field against time as the x-axis.
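For example, to chart the hourly average response time per host (again with illustrative names):

```spl
index=web_logs
| timechart span=1h avg(response_time) BY host
```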
As we saw above, SPL provides a rich set of commands to search, investigate, and analyse machine-generated big data. It is an efficient way of getting insights into an unpredictable stream of machine data, and using SPL commands, complex searches can be run in real time.