It’s about time to change your correlation searches’ timing settings

I wrote about the problem of delayed events in a previous post, so here the focus is on how to overcome that problem when writing a rule or a correlation search (CS).

What’s the problem?

Most, if not all, App/TA developers extract _time from the log generation time. And that’s the best practice, since we all want to track the log generation time, usually set by the device or system generating the log.

If the (regex-based) timestamp extraction goes wrong for whatever reason, _time essentially falls back to _indextime. And that may lead to many other issues which are out of scope here.

The thing is, Splunk’s default behavior is to use _time across the entire system, from the Time Picker to scheduled searches and rules.

When a rule is executed using relative time modifiers, the time reference is the rule engine’s clock, that is, the clock of the search head or the Splunk instance where the Enterprise Security (ES) App is installed.
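
For instance, here’s a minimal sketch of a conventional rule scheduled every 5 minutes, with hypothetical index and sourcetype names; both boundaries resolve against the scheduler’s clock, not against when the events actually arrived:

index=foo sourcetype=bar earliest=-5min@min latest=now
| stats count BY host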

This introduces a few risks in a threat detection context, assuming you rely on a properly extracted _time as the time reference for your searches or rules:

  1. In case there’s a delay or too much latency between the collection (UF) and the indexing of an event, the time window checked by your CS may have already been scanned, hence the event will never be considered (a sketch right after this list shows how to measure that gap). More details here;
  2. In case _time is extracted with a wrong value, there’s simply no integrity in the whole process. Here are just a few scenarios where this may happen:
    1. Wrong clock set on the originating device or system;
    2. Wrong timezone settings;
    3. Wrong regex (lack of precision, picking the wrong epoch from the log, etc);
    4. Attacker changing or tampering with the system clock (EventCode 4616).
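
To get a feel for how exposed you are, here’s a minimal sketch for measuring the gap (in seconds) between _indextime and _time; the index and sourcetype names are hypothetical:

index=foo sourcetype=bar
| eval lag=_indextime-_time
| stats count, avg(lag) AS avg_lag, max(lag) AS max_lag BY sourcetype

A consistently large or negative lag suggests events are either arriving late or carrying a skewed _time.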

Those risks are particularly relevant for “near real time” rules or ones running at a more aggressive interval (e.g., every minute).

Why is that important?

Most customers and users are NOT aware of such risks. And I can confirm that all customers I’ve visited so far, without exception, were not taking this into account.

Basically, that means there’s a gap in detection coverage.

How to overcome or mitigate that?

Even though there’s no way to tell Splunk to ignore _time during searches (it’s always part of the search scope/boundaries), you can change this behavior by using index time as the time reference within a query.

The index time is stored as an internal field called _indextime. And the way to use it from your searches is quite simple:

  • Use index time as the time boundaries for your search. That means using _index_earliest and _index_latest within your CS code;
  • Set the standard time (_time) boundaries (earliest and latest) to a wider window, at least wider than the index time boundaries (see the minimal sketch right after this list).
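
Here’s what that pattern looks like with inline modifiers, again with hypothetical index and sourcetype names; in a saved correlation search you would normally set earliest and latest via the rule’s time settings instead, as shown further below:

index=foo sourcetype=bar _index_earliest=-5min@min earliest=-5h@h latest=+5h@h
| stats count BY host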

More details on time modifiers for your search can be found here.

How does it look in practice?

Below you can find a sample correlation search that leverages this approach. It also builds a dynamic drill-down search query based on exactly the time boundaries used at the rule’s execution time.

Just assume you are stacking multiple critical NIDS signatures per target host every 5 minutes (interval) to raise an alert (notable event).

index=foo sourcetype=bar severity=1 _index_earliest=-5min@min
| stats min(_indextime) AS imin,
  max(_indextime) AS imax,
  values(signature) AS signature
  BY host
| eval dd="index=foo sourcetype=bar severity=1 host=".host
| eval dd=dd." _indextime>=".imin." _indextime<=".imax

Time settings

Earliest: -5h@h
Latest: +5h@h
Cron schedule (interval): */5 * * * *

Set your drill-down search to search $dd$ and voilà! (_time boundaries are automatically inherited via the $info_min_time$ and $info_max_time$ tokens.)

That would consider any matching event indexed within the last 5 minutes, while allowing the event’s _time to be up to 5 hours off or “skewed” (positive/negative) compared to the rule engine’s clock (search head).

Also, note that in the drill-down query the time boundaries are set by comparing against _indextime directly instead of using the search modifiers _index_earliest and _index_latest. The reason is that the latter’s latest boundary is not inclusive, meaning events whose index time sits exactly at that boundary would not match.
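
For illustration, assuming a hypothetical host web01 and example epoch values, the $dd$ token would expand to something like:

index=foo sourcetype=bar severity=1 host=web01 _indextime>=1621345200 _indextime<=1621345500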

Once you are OK with that approach, consider using tags/eventtypes/macros to optimize and build cleaner code.

What about performance?

And before you ask: no, there’s no noticeable impact on performance, since the search engine will detect the narrow index time settings and reduce the search scope, despite the wider window set by the regular time boundaries (-5h, +5h).

Log in to your test environment, set the Time Picker to “All Time” (_time boundaries) and run the following search if you want to check for yourself:

index=_* _index_earliest=-2s@s | stats count

That search query counts the number of events indexed within the last 2 seconds regardless of their _time values. It should be fast despite “All Time”.

In case you want to go deeper into _time vs. _indextime behavior in your environment, this post introduces a tstats based dashboard for tracking that.
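
As a teaser, here’s a minimal tstats sketch (not necessarily the dashboard from that post) for charting indexing lag per sourcetype; the index name is hypothetical:

| tstats max(_indextime) AS it WHERE index=foo BY sourcetype _time span=1m
| eval lag=it-_time
| stats avg(lag) AS avg_lag, max(lag) AS max_lag BY sourcetype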

Feel free to reach out in case you have comments/feedback and happy Splunking!