SIEM use cases development workflow – Agile all the things!

If you are into Splunk rules development, I am pretty sure this post will relate to you. But before entering the main topic, let me quickly define what a SIEM use case is about, which is another trendy, hot topic in the Infosec industry today.

What is a SIEM use case after all?

For answering this question, I will simply promote one slide of a presentation I use in my workshops on Splunk Rules development:

1

In the end of the day, you may have the best, 100% available architecture, ingesting all sorts of logs; but if the platform does not provide value, you fail. That’s all.

And don’t fool yourself, Compliance/Regulation or Forensics use cases are out of this scope. If your use case is storing logs for later usage, it’s better revising the plan.

You likely don’t need Splunk to simply store data and eventually search over it. For that, you rely on on log management solutions which are not only cheaper, but easier to use and ultimately, avoiding wrong expectations (threat detection capability).

OK, now into Agile!

Despite being a bit reluctant in the beginning, the path to adopt the Agile methodology for developing security use cases with Splunk came in naturally.

It comes as no surprise for those who already treat Splunk queries as code, and it is particularly applicable for customers who want to embrace custom development.

In this article, I am going to highlight some of the benefits around that and also propose a workflow for the ones willing to give it a try.

I want it all! #NOT

Just in case you’ve landed here from another planet, here’s a quick summary on “Agile software development” straight from Wikipedia:

It’s a set of principles for software development under which requirements and solutions evolve through the collaborative effort […] It advocates adaptive planning, evolutionary development, early delivery, and continuous improvement.

I’m not here to prove Agile is -the- solution for you nor am I saying you should become a Scrum master or anything like that. But I encourage you to get familiar with the concepts, to go deeper on a particular topic that interests you, try and experiment.

There’s absolutely no need to blindly follow or enforce anything, but to leverage what best suits your development practice.

Nevertheless, some still see that as another heavy process to bring in or something that will put more overhead on developers – which is not true. Agile processes are designed to work and evolve over time, getting tighter and faster.

Here are some of the benefits I’ve noticed over time after employing the Agile approach in my line of work:

  • Transparency, Transparency, Transparency. Some ideas are super cool and sometimes they seem pretty easy as well. Turns out you still need time and resources to make it happen. Following a methodology allows those otherwise blurry requirements to emerge. That’s essential for better planning.
  • The transparency gives you and your team the ability to better handle expectations from stakeholders and management. For instance, to deliver a certain use case, you first need the right data on-boarded. To deliver more rules per development cycle, you need to enable more coders.
  • Easier prioritization. When there’s something actively blocking progress, it’s easy to link blockers/goals together and quickly evaluate the ‘cost’ of an unresolved issue, rapidly realizing the missing value of not tackling certain issues first.
  • Visibility and versatility. The concept of ‘Sprints’ provides the highest impact giving your capabilities (engineering). It’s easier to increase the pace and throughput as well a adapt, once it’s done in small, incremental targets.
  • Better collaboration with easier project tracking. That’s especially needed when working with ‘virtual’ teams located in different timezones. Instead of a big team working together, one or two members working on small tasks.

The list goes on and on but those should give you a hint on what’s possible to achieve.

The workflow

I would call this a draft as you will need to adjust to your own practice or organization giving that many of those boxes can be broke down into multiple sub-processes.

It’s more applicable to rules development (correlation searches) but may be easily adapted for managing more elaborated, long-term use cases.

It’s made with Draw.io which works pretty well (contact me for the XML/VSD version).

In case you are interested in suggestions for ranking or scoring your use case ideas, please refer to the following blog post I wrote on Medium:

Security Analytics: How to rank use cases based on the “Quick Wins” approach?

Feel free to reach out in case you have comments/feedback.

It’s about time to change your correlation searches timing settings

Too late - conceptual alarm clock showing that you are too lateI wrote about the problem of delayed events in a previous post, so here the focus is on how to overcome that problem when writing a rule or a correlation search (CS).

What’s the problem?

Most if not all App/TA developers extract _time from the log generation time. And that’s the best practice since we all want to track the log generation time, usually set by the device or system generating the log.

If the extraction (regex based) goes wrong for whatever reason, basically, _time is set to _indextime. And that may lead to many other issues which are out of the scope here.

The thing is Splunk’s default behavior is to use _time across the entire system, from the Time Picker to scheduled searches and rules.

When a rule is executed using relative time (modifiers), the time reference is the rule engine’s clock, that means, the search head or the Splunk instance where the Enterprise Security (ES) App is installed.

A few risks introduced here, in a threat detection context – if you rely on properly extracted _time as the time reference for your searches or rules:

  1. In case there’s a delay or too much latency between the collection (UF) and the indexing of an event, the time window checked from your CS may have been scanned already, hence the event will never be considered. More details here;
  2. In case _time is extracted with a wrong value, there’s simply no integrity in the whole process. And here just a few scenarios when this may happen:
    1. Wrong clock set on the originating device or system;
    2. Wrong timezone settings;
    3. Wrong regex (lack of precision, picking the wrong epoch from the log, etc);
    4. Attacker changing or tampering with the system clock (Eventcode 4616).

Those are particularly valid when applied to “near real time” based rules or the ones running with a more aggressive interval (ex.: every minute).

Why is that important?

Most customers and users are NOT aware of such risks. And I can confirm that all customers I’ve visit so far, with no exception, were not taking this into account.

Basically, that means there’s a gap in detection coverage.

How to overcome or mitigate that?

Even though there’s no way to tell Splunk to ignore _time during searches (it’s always part of the scope/boundary), you can change this behavior by using index time as your time reference or relative time within a query.

The index time is stored as an internal field called _indextime. And the way to use it from your searches is quite simple:

  • Use index time as the time boundaries for your search. That means using _index_earliest and _index_latest within your CS code;
  • Set the standard time (_time) boundaries (earliest and latest) to a bigger window, at least bigger than the index time boundaries.

More details on time modifiers for your search can be found here.

How does it look in practice?

Below you can find a sample correlation search that leverages this approach. It also provides a dynamic drill down search query based exactly on the time boundaries used during the rule’s execution time.

Just assume you are stacking multiple critical NIDS signatures per target host every 5 minutes (interval) to raise an alert (notable event).

index=foo sourcetype=bar severity=1 _index_earliest=-5min@min
| stats min(_indextime) AS imin,
  max(_indextime) AS imax,
  values(signature) AS signature
  BY host
| eval dd="index=foo sourcetype=bar severity=1 host=".host
| eval dd=dd." _indextime>=".imin." _indextime<=".imax

Time settings

Earliest: -5h@h
Latest: +5h@h
Cron schedule (interval): */5 * * * *

Set your drill down search to search $dd$ and voila! (_time boundaries are automatically inherited via $info_min_time$ and $info_max_time$ tokens).

That would consider any matched event indexed within the last 5 minutes, allowing the event _time to be 5 hours off or “skewed” (positive/negative), as compared to the rule engine’s clock (search head).

Also, note the time boundaries are set by using _indextime instead of search modifiers_index_earliest and _index_latest. Reason for that is because the latter is not inclusive, meaning events having the latest time within the boundaries will not match.

Once you are OK with that approach, consider using tags/eventtypes/macros to optimize and build cleaner code.

What about performance?

And before you ask. No. There’s no noticeable impact in performance since the search engine will detect the narrowed index time settings and will reduce the search scope, despite the bigger window set from the regular time boundaries (-5h, +5h).

Log in to your test environment, try setting the Time Picker to “All Time” (_time boundaries) and running the following search if you want to check by yourself:

index=_* _index_earliest=-2s@s | stats count

That search query counts the number of events indexed within the last 2 seconds regardless of their _time values. It should be fast despite “All Time”.

In case you want to go deeper on _time x _indextime behavior in your environment, this post introduces a tstats based dashboard for tracking that.

Feel free to reach out in case you have comments/feedback and happy Splunking!

SIEM tricks: dealing with delayed events in Splunk

tiSo after bugging the entire IT department and interrogating as many business teams as possible to grant you (the security guy) access to their data, you are finally in the process of developing your dreamed use cases. Lucky you!

Most SIEM projects already fall apart before reaching that stage. Please take the time to read a nicely written article by SIEM GM Anton Chuvakin. In case you don’t have the time, just make sure you check the section on “Mired in Data Collection”.

The process of conceptualizing, developing, deploying and testing use cases is challenging and should be continuous. There are so many things to cover, I bet you can always find out something is missing while reading yet another “X ways to screw up a SIEM” article.

So here’s another idea to prove it once again: how can you make sure the events are safely arriving at your DB or index? Or even beyond: how can you make sure the timestamps are being parsed or extracted appropriately? Why is it important?

Time is what keeps everything from happening at once.

First of all, I’m assuming the Splunk terminology here, so it’s easier to explain by example. Also, let’s make two definitions very clear:

Extracted time:  corresponding to the log generation time, coming from the log event itself. This one is usually stored as _time field in Splunk.

Index time: corresponding to the event indexing time, generated by Splunk indexer itself upon receiving an event. This one is stored as _indextime field in Splunk.

There are infinite reasons why you should make sure timestamps are properly handled during extraction or searching time, but here are just a few examples:

  1. Timezones: this piece of data is not always part of the logs. So time values from different locations may differ – a lot;
  2. Realtime/Batch processing: not all logs are easily collected near realtime. Sometimes they are collected in hourly or daily chunks;
  3. Correlation Searches (Rules) and forensic investigations are pretty much relying on the Extracted Time. Mainly because that’s the default behavior, either from the Time Picker (Search GUI) or the Rule editor.

Have you noticed the risk here?

Going under the radar

In case you haven’t figured out yet, apart from all other effects of not getting events’ time right, there’s a clear risk when it comes to security monitoring (alerting): delayed events may go unnoticed.

If you are another “realtime freak”, running Correlation Searches every 5 minutes, you are even more prone to this situation. Imagine the following: you deploy a rule (R1) that runs every 5 minutes, checking for a particular scenario (S1) within the last 5 minutes, and firing an alert whenever S1 is found.

For testing R1, you intentionally run a procedure or a set of commands that trigger the occurrence of S1. All fine, an alert is generated as expected.

Since correlation searches and, in fact, any search, scheduled or not, runs based on Extracted Time (_time) by default, supposing that S1 events are delayed by 5 minutes, those events will never trigger an alert from R1. Why?

Because the 5-minute window checked by the continuous, scheduled R1 will never re-scan the events from a previous, already checked window. The moment those delayed events are known to exist (indexed), R1 is already set to check another time window, therefore, missing the opportunity to detect S1 behavior from delayed events.

What can be done?

There are many ways to tackle this issue but regardless of which one is chosen, you should make sure the _time field is extracted correctly – it doesn’t matter if the event arrives later or not.

Clock skew monitoring dashboard

The clock skew problem here applies to the difference between Indexed (_indextime) and Extracted (_time) values. Assuming near realtime data collection, those values tend to be very close, which implies it’s completely fine to have them out of sync.

Folks at Telenor CERT were kind enough to allow me to share a slightly simplified version of a dashboard we’ve written that is used to monitor for this kind of issue, we call it “Event Flow Tracker”.

The code is available at Github and is basically a SimpleXML view, based on default fields (metadata). It should render well once it’s deployed to any search head.

Here’s a screenshot:

eft1

Since searches rely on metadata (tstats based), it runs pretty fast, and also tracks the event count (volume) and reporting agents (hosts) over time. Indexes are auto-discovered from a REST endpoint call, but the dashboard can also be extended or customized for specific indexes or source types.

When clicking at “Show charts” link under Violations highlighted in red, the following line charts are displayed:

eft2

So assuming a threshold of one hour (positive/negative), with the visualizations it’s easier to spot scenarios when those time fields are too different from each other.

The first chart shows how many events are actually under/above the threshold. The second chart depicts how many seconds those events are off in average.

How to read the charts?

Basically, assuming median as the key metric, in case the blue line (median) is kept steady above the green line (threshold), it might be related to a recurring, constant issue that should be investigated.

Since the dashboard is based on regular queries, those can be turned into alerts in case you want to systematically investigate specific scenarios. For example, for events that must follow strict time settings.

The dashboard is not yet using the base search feature, so perhaps it’s something you could consider in case you want to use or improve it.

Writing Rules – Best practices

Now, assuming the risk is known, that is, some events may land on the indexers a bit later due to a transport bottleneck (network, processing queue, etc), how to write reliable rules?

Delayed detection?

If data is not there yet, how can you reliably detect anything? This is an obvious decision. You should always consider capturing as much signal as you can in order to trigger a high-quality alert.

If you are into “realtime detection”, I suggest you consider checking how many events you might have missed due to this problem (delayed events). I’m more into detecting something with accuracy, even if a bit delayed, rather than trying to detect something almost immediately risking less accuracy or even the lack of alerting.

Also, depending on your search query (density, constraints, etc), you may gain some extra resource power by increasing the interval and time boundaries from your rules.

As a side note: reports say organizations take days if not months to detect a breach, but some insist on realtime detection. Is that what Mr. Trump tried to convey here?

Time boundaries based on Index Time?

Yes, that’s also an option. You can search based on _indextime. So basically, as soon as the event is indexed, no matter how off the Extracted time (_time) is, it may be consider for an alert.

The downside of it, besides adding more complexity when troubleshooting Throttling/Suppression, is that you need to carefully review all your drilldown searches from another perspective, taking _indextime into account. In other words, the searches should always specify _index_earliest and _index_latest. More info here.

References

Event indexing delay
http://docs.splunk.com/Documentation/Splunk/6.5.1/Troubleshooting/Troubleshootingeventsindexingdelay

Splunk/ES: dynamic drilldown searches

72345577One of the advantages of Splunk is the possibility to customize pretty much anything in terms of UI/Workflow. Below is one example on how to make dynamic drilldown searches based on the output of aggregated results (post-stats).

Even though Enterprise Security (ES) comes with built-in correlation searches (rules), some mature/eager users leverage Splunk’s development appeal and write their own rules based on their use cases and ideas, especially if they are already familiar with SPL.

Likewise, customizing “drilldown searches” is also possible, enabling users to define their own triage workflows, facilitating investigation of notable events (alerts).

Workflow 101: Search > Analytics > Drilldown

Perhaps the simplest way to define a workflow in ES is by generating alerts grouped by victim or host and later being able to quickly evaluate all the details, down to the RAW events related to a particular target scenario.

As expected, there are many ways to define a workflow, here’s a short summary of the stages listed above:

Search: here you define your base search, applying as many filters as possible so that only relevant data is processed down the pipe. Depending on how dense/rare your search is, enrichment and joins can also be done here.

Analytics: at this stage you should get the most out of stats() command. By using it you systematically aggregate and summarize the search results, which is something desirable given that every row returned will turn into a new notable event.

Drilldown: upon generating a notable event, the user should be able to quickly get to the RAW events building up the alert, enabling rapid assessment without exposing too many details for analysis right from the alert itself.

You may also want to craft a landing page (dashboard) from your drilldown search string, enabling advanced workflows such as Search > Analytics > Custom Dashboard (Dataviz, Enrichment) > RAW Events > Escalation (Case Management).

Example: McAfee ePO critical/high events

Taking McAfee’s endpoint security solution as an example (fictitious data, use case), here’s how a simple workflow would be built based on a custom correlation search that looks for high-severity ePO events.

First, the base search:

index=main sourcetype=mcafee:epo (severity=critical OR severity=high)

Next, using stats command to aggregate and summarize data, grouping by host:

| stats values(event_description) AS desc, values(signature) AS signature, values(file_name) AS file_path, count AS result BY dest

The above command is also performing some (quick) normalization to allow proper visualization within ES’ Incident Review dashboard, and also providing some quick statistics to facilitate the alert evaluation (event count, unique file names, etc).

Finally, it’s time for defining the dynamic drilldown search string based on the output of those two commands (search + stats):

| eval dd="index=main sourcetype=mcafee:epo (severity=critical OR severity=high) dest=".dest

Basically, the eval command is creating a new field/column named “dd” to store the exact search query needed to search for ePO events for a given host (dest).

In the end, putting it all together:

out1

Despite having more than 150 matching events (result) from each of those hosts, the maximum number of alerts that can be possibly generated over each correlation search execution is limited to the number of unique hosts affected.

And here’s how that translates into a correlation search definition:

cs1

cs2

Note that the “Drill-down search” value is based on a token expansion: search $dd$. This way, the value of “dd” is used to dynamically build the drilldown link.

Now, once the correlation search generates an alert, a link called “Search for raw events” should become available under “Contributing Events” after expanding the notable event details at the Incident Review dashboard.

By clicking the link, the user is directed to a new search containing all raw events for the specific host, within the same time window used by the correlation search:

cs4

Defining a “dd” field within your code is not only enabling custom dashboards development with easy access to the drilldown search (index=notable) but also standardizing the value for the drilldown search at the correlation search definition.

As always, the same drilldown search may be triggered via a Workflow Actions. Feel free to get in touch in case you are interested in this approach as well.

Happy Splunking!

 

Honing in on the Homeless – the Splunkish way

e9038252f910c840e582818a63dd9908_400x400Have you noticed Splunk just released a new version, including new data visualizations? I had been eager to start playing with one of the new charts when yesterday I came across a blog post by Bob Rudis, who is co-author of the Data-Driven Security Book and former member of the Verizon’s DBIR team.

In that post, @hrbrmstr is presenting readers with a dataviz challenge based on data from U.S. Department of Housing and Urban Development (HUD) related to homeless population estimates. So I’ve decided to give it a go with Splunk.

Even though -we can’t compare- the power of R and other Stats/Dataviz focused programming languages with current Splunk programming language (SPL), this exercise may serve to demonstrate some of the capabilities of Splunk Enterprise.

Sidenote: In case you are into Machine Learning (ML) and Splunk, it’s also worth checking the new ML stuff just released along with Splunk 6.4, including the awesome ML Toolkit showcase app.

The challenge is basically about asking insightful, relevant questions to the HUD data sets and generating visualizations that would help answering those questions.

What the data sets can tell about the homeless population issue?

The following are the questions I try to answer,  considering the one proposed in the challenge post: Which “states” have the worst problem in terms of homeless people?

  1. Which states currently have the largest homeless population per capita?
  2. Which states currently have the largest absolute homeless population?
  3. Which states are being successful or failing on lowering the figures compared to previous years?

I am far from considering myself a data scientist (was looking up standard deviation formula the other day), but love playing with data like many other Infosec folks in our community. So please take it easy with newbies!

Since we are dealing with data points representing estimates and this is a sort of experiment/lab, take them with a grain of salt and consider adding “according to the data sets…here’s what that Splunk guy verified” to the statements found here.

Which states currently have the largest homeless population per capita?

For this one, it’s pretty straightforward to go with a Column chart for quick results. Another approach would be to gather map data and work on a Choropleth chart.

Basically, after calculating the normalized values (homeless/100k population), I filter in only the US states making the top of the list, limiting to 10 values . They are then sorted by values from year 2015 and displayed on the chart below:

homeless-ratio

Homeless per 100k of population – Top 10 US states

The District of Columbia clearly stands out, followed by Hawaii and New York. That’s one  I would never guess. But there seems to be some explanation for it.

Which states currently have the largest absolute homeless population?

In this case, only the homeless figures are considered for extracting the top 10 states. Below are the US states where most homeless population lives based on latest numbers (2015), click to enlarge.

homeless-abs

Homeless by absolute values – Top 10 US states

As many would guess, New York and California are leading here. Those two states along with Florida and Texas are clearly making the top of the list since 2007.

Which states are being successful or failing on lowering the figures compared to previous years?

Here we make use of a new visualization called Horizon chart. In case you are not familiar with this one, I encourage you to check this link where everything you need to know about it is carefully explained.

Basically, it eases the challenge of visualizing multiple (time) series with less space (height) by using layered bands with different color codes to represent relative positive/negative values, and different color shades (intensity) to represent the actual measured values (data points).

After crafting the SPL query, here’s the result (3 bands, smoothed edges) for all 50 states plus DC, present in the data sets:

horizon-chart

So how to read this visualization? Keep in mind the chart is based on the same prepared data used in the first chart (homeless/100k population).

The red color means the data point is higher when compared to the previous measurement (more homeless/capita), whereas the blue represents a negative difference when comparing current and last measurements (less homeless/capita). This way, the chart also conveys trending, possibly uncovering the change in direction over time.

The more intense the color is, the higher the (absolute) value. You can also picture it as a stacked area chart without needing extra height for rendering.

The numbers listed at the right hand side represent the difference between immediate data points point in the timeline (current/previous). For instance, last year’s ratio (2015) for Washington decreased by ~96 as compared to the previous year (2014).

On a Splunk dashboard or from the search query interface (Web GUI), there’s also an interactive line that displays the relative values as the user hovers over a point in the timeline, which is really handy (seen below).

horizon_crop

The original data files are provided below and also referenced from the challenge’s blog and GitHub pages. I used a xlsx2csv one-liner before handling the data at Splunk (many other ways to do it though).

HUD’s homeless population figures (per State)
US Population (per State)

The Splunk query used to generate the data used as input for the Horizon chart is listed below. It seems a bit hacky, but does the job well without too much effort.

| inputlookup 2007-2015-PIT-Counts-by-State.csv
| streamstats last(eval(case(match(Total_Homeless, "Total"), Total_Homeless))) as _time_Homeless
| where NOT State_Homeless="State"
| rex mode=sed field=_time_Homeless "s|(^[^\d]+)(\d+)|\2-01-01|"
| rename *_Homeless AS *
| join max=0 type=inner _time State [
  | inputlookup uspop.csv
  | table iso_3166_2 name
  | map maxsearches=51 search="
    | inputlookup uspop.csv WHERE iso_3166_2=\"$iso_3166_2$\"
    | table X*
    | transpose column_name=\"_time\"
    | rename \"row 1\" AS \"Population\"
    | eval State=\"$iso_3166_2$\"
    | eval Name=\"$name$\"
  "
  | rex mode=sed field=_time "s|(^[^\d]+)(\d+)|\2-01-01|"
]
| eval _time=strptime(_time, "%Y-%m-%d&amp")
| eval ratio=round((100000*Total)/Population)
| chart useother=f limit=51 values(ratio) AS ratio over _time by Name

Want to check out more of those write-ups? I did one in Portuguese related to Brazil’s Federal Budget application (also based on Splunk charts). Perhaps I will update this one soon with new charts and a short English version.

My TOP 5 Security (and techie) talks from Splunk .conf 2015

indexIf you are into Security and didn’t have an opportunity to attend the Splunk conference in Las Vegas this year (maybe you’re busy playing Blackjack instead?), here’s what you can not miss.

The list is not sorted in any particular order and, whenever possible, entries include presenters’ Twitter handles as well as takeaways or comments that might help you choose where to start.

  1. Security Operations Use Cases at Bechtel (recording / slides)
    That’s the coolest customer talk from the ones I could watch. The presenters (@ltawfall / @rj_chap) discussed some interesting use cases and provided a lot of input for those willing to make Splunk their nerve center for security.
  2. Finding Advanced Attacks and Malware with Only 6 Windows EventIDs (recording / slides)
    This presentation is a must for those willing to monitor Windows events either via native or 3rd party endpoint solutions. @HackerHurricane really knows his stuff, which is not a surprise for someone calling himself a Malware Archaeologist.
  3. Hunting the Known Unknowns (with DNS) (recording / slides)
    If you are looking for concrete security use case ideas to build based on DNS data, that’s a gold. Don’t forget to provide feedback to Ryan Kovar and Steve Brant, I’m sure they will like it.
  4. Building a Cyber Security Program with Splunk App for Enterprise Security (recording / slides)
    Enterprise Security (ES) app relies heavily on accelerated data models, so besides interesting tips on how to leverage ES, Jeff Campbell provides ways to optimize your setup, showing what goes under the hood.
  5. Build A Sample App to Streamline Security Operations – And Put It to Use Immediately (recording)
    This talk was delivered by Splunkers @dimitrimckay and @daveherrald. They presented an example on how to build custom content on top of ES to enhance the context around an asset, which is packed to an app available at GitHub.

Now, in case you are not into Security but also enjoy watching hardcore, techie talks, here’s my TOP 5 list:

  1. Optimizing Splunk Knowledge Objects – A Tale of Unintended Consequences (recording / slides)
    Martin gives an a-w-e-s-o-m-e presentation on Knowledge Objects, unraveling what happens under the hood when using tags and eventtypes. Want to provide him feedback? Martin is often found at IRC, join #splunk and say ‘Hi’!
  2. Machine Learning and Analytics in Splunk (recording / slides)
    If you are into ML and the likes of R programming, the app presented here will definitely catch your attention. Just have a quick look on the slides to see what I mean. A lot of use cases for Security here as well.
  3. Beyond the Lookup Glass: Stepping Beyond Basic Lookups (recording)
    Wanna know about the challenges with CSV Lookups and KV store in big deployments? Stop here. Kudos to Duane Waddle and @georgestarcher!
  4. Splunk Search Pro Tips (recording / slides)
    Just do the following: browse the video recording and skip to around 30′ (magic!). Now, try not watching the entire presentation and thank Dan Aiello.
  5. Building Your App on an Accelerated Data Model (recording / slides)
    In this presentation, the creator of the ubberAgent@HelgeKlein – describes how to make the most of data models in great detail.

Still eager for more security related Splunk .conf stuff? Simply pick one below (recordings only).

For all presentations (recordings and slides), please visit the conference website.

Splunk > Self-Learning Path & The Community Factor

Splunk is gaining tremendous traction in the market due to its ability to harness the value of machine data. The idea here is to highlight a few reasons for such success: free-access and community driven approaches.

Being familiar with the ways in which knowledge can be freely attained is a great advantage. Coupled with your curiosity, pretty much nothing more is needed to become an independent learner these days.

Below you will find the main references I’ve been using to learn Splunk and get up to speed with this great technology.

Splunk Platform: Free, Easy Access

Splunk provides free access to its flagship product, Splunk Enterprise. Users evaluating the product can also get a free, perpetual license. That means no initial costs for installing and evaluating most of its primary capabilities.

For developers, there is also a developer license which enables up to 10GB a day for data indexing.

TLDR? Just hit Play!

Besides the excellent Just Ask campaign, the following short videos help showing Splunk’s benefits:

Are you looking for more technical stuff, easy to follow and digest? Below is a YouTube playlist with demo-like lessons available from Splunk’s channel:

Besides, if you are an Infosec pro, don’t forget to check the current Security related apps at the portal. Aside from that, below you will find a few videos that might trigger inspiration for further research and ideas:

Q&A Forum, IRC and Wiki

The Splunk Answers forum is really an important knowledge base, and here’s why:

  • The discussions are around questions and answers, so entries tend to be clear and narrowed to a specific topic, often times matching an issue you are currently facing;
  • Not only Splunk team members provide answers. It’s common to get responses from partners and, of course, the whole Splunk community, including end-users;
  • Script/Code as well as images are allowed for easier understanding of a question or an answer. Top contributors are also awarded with points and badges to promote users interaction;
  • There is a sort of rating to answers, so users can also rely on that for choosing where to start.

I was also surprised when I joined the IRC channel as several Splunk staff members (PS, Devel, Support) take part in the discussions there. Sometimes the answer not found via documentation, or a bug report might well be the subject of a quick chat.

Besides that, there is, of course, a Splunk Wiki! As it applies to other examples listed here, it’s also community driven so anyone is able to add and edit content.

Documentation Portal

Splunk provides a well organized documentation portal, which serves as a quick reference guide (e.g., search commands) and also enables you to learn about more advanced topics such as Distributed Deployment, or the Common Information Model Add-on Manual.

Also, there are some dedicated tutorials available such as the Search Tutorial. I am listing below some doc bookmarks that I am constantly querying on:

It’s worth noting most areas from the documentation portal are provided with a Comments section, from which the answer for your issue might be found, so always keep an eye on that.

UPDATE 9-Mar-15: Also, don’t forget to bookmark Splexicon, a documentation reference that defines technical terms that are specific to Splunk. Definitions include links to related information from the Splunk documentation.

Cheatsheets

For those Splunk Ninjas pros out there who love having those neat docs around, there are some cool versions available for Splunk as well. Some of them are listed below:

The Community Factor: BIG Win!

The community engagement is a huge win in respect to knowledge sharing and as a business strength. Simply setting up a web forum doesn’t enable community integration. In my opinion, here are some of the great initiatives Splunk has been carrying out to accomplish that:

Missing something? Just let me know so I can add them here as well.