Using search terms effectively

The key to creating an effective search is to take advantage of the index. The Splunk index is effectively a huge word index, sliced by time. One of the most important factors for the performance of your searches is how many events are pulled from the disk. The following few key points should be committed to memory:

  • Search terms are case insensitive: Searches for error, Error, ERROR, and ErRoR are all the same.
  • Search terms are additive: Given the search mary error, only events that contain both words will be found, as if the terms were joined by AND. There are Boolean and grouping operators to change this behavior; we will discuss these later.
  • Only the time frame specified is queried: This may seem obvious, but it's very different from a database, which would always have a single index across all events in a table. Since each index is sliced into new buckets over time, only the buckets that contain events for the time frame in question need to be queried.
  • Search terms are words, not parts of words: A search for foo will not match foobar. To match both, use a wildcard, as in foo*.
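
The four points above can be illustrated with a toy model. The following Python sketch is not how Splunk is implemented; the class ToyIndex, its bucket_seconds parameter, and the sample events are all hypothetical, and words are split on whitespace only to keep it short. It simply shows a word index sliced by time, where lookups are case insensitive, terms are combined with AND, only buckets overlapping the time frame are consulted, and only whole words match.

```python
from collections import defaultdict


class ToyIndex:
    """Toy word index sliced into fixed-width time buckets (illustration only)."""

    def __init__(self, bucket_seconds=3600):
        self.bucket_seconds = bucket_seconds
        # bucket number -> lowercased word -> set of event ids
        self.buckets = defaultdict(lambda: defaultdict(set))
        self.events = {}  # event id -> (timestamp, raw text)

    def add(self, timestamp, text):
        event_id = len(self.events)
        self.events[event_id] = (timestamp, text)
        bucket = timestamp // self.bucket_seconds
        # Words are stored lowercased, so lookups are case insensitive.
        for word in text.lower().split():
            self.buckets[bucket][word].add(event_id)
        return event_id

    def search(self, terms, earliest, latest):
        """Return raw events in [earliest, latest] that contain every term."""
        wanted = [term.lower() for term in terms.split()]
        if not wanted:
            return []
        hits = []
        first = earliest // self.bucket_seconds
        last = latest // self.bucket_seconds
        # Only buckets that overlap the requested time frame are consulted.
        for bucket in range(first, last + 1):
            if bucket not in self.buckets:
                continue
            words = self.buckets[bucket]
            # Terms are additive: keep only events that contain all of them.
            ids = set.intersection(*(words.get(term, set()) for term in wanted))
            for event_id in sorted(ids):
                timestamp, text = self.events[event_id]
                if earliest <= timestamp <= latest:
                    hits.append(text)
        return hits


index = ToyIndex(bucket_seconds=3600)
index.add(100, "ERROR mary login failed")
index.add(200, "error foobar timeout")
index.add(7300, "Error mary disk full")

print(index.search("ErRoR", 0, 3599))       # both events from the first hour
print(index.search("mary error", 0, 9999))  # only events with both words
print(index.search("foo", 0, 9999))         # whole words only, so no match
```

Narrowing the time frame in the second search to the first hour would drop the third event without its bucket ever being read, which is the reason a tight time range matters so much for performance.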

With just these concepts, you can write fairly effective searches. Let's dig a little deeper, though:

  • A word is anything surrounded by whitespace or punctuation (and sometimes a split of words): For instance, given the log line 2012-02-07T01:03:31.104-0600 INFO AuthClass Hello world. [user=Bobby, ip=1.2.3.3], the words indexed are 2012, 02, 07T01, 03, 31, 104, 0600, INFO, AuthClass, Hello, world, user, Bobby, ip, 1, 2, 3, and 3. This may seem strange, and possibly a bit wasteful, but this is exactly what Splunk's index is good at: dealing with huge numbers of words across huge numbers of events.
  • Splunk is not grep with an interface: One of the most common questions is whether Splunk uses regular expressions for your searches. Technically, the answer is no. Splunk does use regex internally to extract fields, including autogenerated fields, but most of what you would do with regular expressions is available in other ways. Using the index as it is designed is the best way to build fast searches. Regular expressions can then be used to further filter results or extract fields.
  • Numbers are not numbers until after they have been parsed at search time: This means that searching for foo>5 will not use the index, as the value of foo is not known until it has been parsed out of the event at search time. There are different ways to deal with this behavior depending on the question you're trying to answer.
  • Field names are case sensitive: When searching for host=myhost, host must be lowercase. Likewise, any extracted or configured fields have case-sensitive field names, but the values are case insensitive:
    • Host=myhost will not work
    • host=myhost will work
    • host=MyHost will work
  • Fields do not have to be defined before indexing data: An indexed field is a field that is added to the metadata of an event at index time. There are legitimate reasons to define indexed fields, but in the vast majority of cases, it is unnecessary and is actually wasteful. We will discuss this in Chapter 3, Tables, Charts, and Fields.
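
Several of these points can be made concrete with a rough Python sketch. It is an approximation written for this example, not Splunk's actual segmentation or field extraction logic: the helper names index_words, extract_fields, field_matches, and field_greater_than are hypothetical, and the regular expressions are assumptions chosen to reproduce the word list given above for the sample log line.

```python
import re

LOG_LINE = ("2012-02-07T01:03:31.104-0600 INFO AuthClass "
            "Hello world. [user=Bobby, ip=1.2.3.3]")


def index_words(line):
    """Split a raw event into words on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", line)


def extract_fields(line):
    """Pull key=value pairs out of the raw text, as would happen at search time."""
    return dict(re.findall(r"(\w+)=([^\s,\]]+)", line))


def field_matches(fields, name, value):
    """Field names are compared exactly; values are compared ignoring case."""
    return name in fields and fields[name].lower() == value.lower()


def field_greater_than(fields, name, threshold):
    """A numeric comparison is only possible after the value has been extracted."""
    try:
        return float(fields[name]) > threshold
    except (KeyError, ValueError):
        return False


print(index_words(LOG_LINE))
fields = extract_fields(LOG_LINE)
print(fields)
print(field_matches(fields, "user", "bobby"))  # value case is ignored
print(field_matches(fields, "User", "Bobby"))  # field name case is not
print(field_greater_than(extract_fields("foo=7"), "foo", 5))
```

In this model the word list for an event containing foo=7 holds only foo and 7, with nothing to say that foo is a number greater than 5. The comparison has to wait until the field has been extracted, which mirrors why a search such as foo>5 cannot be answered from the index alone.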