Repurposing Logs

Published: 2015-03-24. Last Updated: 2015-03-24 16:11:42 UTC
by Kevin Liston (Version: 1)

Keeping an eye on your logs is critical (really, it's number 14 on the SANS critical list of controls: https://www.sans.org/critical-security-controls/control/14 .)  Earlier, Rob VandenBrink shared some techniques for finding nuggets hiding in your logs (https://isc.sans.edu/forums/diary/Syslog+Skeet+Shooting+Targetting+Real+Problems+in+Event+Logs/19449/ .)  Today I'm going to share some tricks to squeeze every last bit out of your logs by repurposing them.  I mean repurposing log files, not this: https://www.pinterest.com/dawnreneedavis/repurposed-logs/ .

A log's original purpose is set when the programmer decides when and how each entry gets recorded.  Today I want to discuss "unintended value": how to get more out of your logs than the programmers intended, and how to recover value that is easily overlooked.

Let's start with an example.  Suppose you work in a large, siloed environment and you don't have access to the logs from every group.  You're in a security or investigative function, and you have access to the AV logs.  The obvious use of those logs is to record the alerts generated by the endpoints, or to find machines that aren't updating signatures properly or have detection engines that are out of date.  A bit that you might be overlooking is the value of the check-in message itself.  I've found it very useful to keep the check-ins for a long period of time, which gives you a history of which IP a machine had and which user was logged in each time it checked in.  It doesn't have the resolution and accuracy that you would get from your AD authentication logs or your DHCP logs, but you might not have easy access to those.  This small investment in disk space, or a simple database, can give you quick snapshot views of machine and user mobility.  You can easily see whether a desktop consistently has the same IP, or whether a laptop moves around your campus.  You can get the same feel for your user accounts too, without having to invasively dig through badge access logs.

This is the first technique that I want to share: extract a daily event out of your logs and store it over time.  This creates an additional product that keeping a rolling history of logs can't provide.  
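Here is a minimal sketch of that first technique, assuming a hypothetical CSV export of AV check-ins with columns timestamp,machine,user,ip (your product's export format will differ; adjust the parsing accordingly).  It keeps one record per machine per day in a small SQLite file:

```python
#!/usr/bin/env python3
"""Sketch of technique 1: keep one AV check-in record per machine per day.

Assumes a hypothetical CSV export with columns timestamp,machine,user,ip --
adjust the parsing to whatever your AV console actually produces.
"""
import csv
import sqlite3
import sys

DB = "checkin_history.db"

def main(checkin_csv):
    con = sqlite3.connect(DB)
    con.execute("""CREATE TABLE IF NOT EXISTS daily_checkin (
                       day TEXT, machine TEXT, user TEXT, ip TEXT,
                       PRIMARY KEY (day, machine))""")
    with open(checkin_csv, newline="") as fh:
        for row in csv.DictReader(fh):
            day = row["timestamp"][:10]          # keep only YYYY-MM-DD
            # INSERT OR REPLACE keeps the last check-in seen for that day
            con.execute("INSERT OR REPLACE INTO daily_checkin VALUES (?,?,?,?)",
                        (day, row["machine"], row["user"], row["ip"]))
    con.commit()
    con.close()

if __name__ == "__main__":
    main(sys.argv[1])
```

Run it once a day against that day's check-in export and the table grows by only one row per machine per day, which is what makes keeping years of history cheap.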

Now consider what hidden and unexpected information might be lurking in your web proxy logs.  Take a look at the W3C standard fields.  If you reduce the displayed fields down to just timestamp, c-ip, r-host, and r-ip, you've got yourself a quick passive-DNS feed.  Granted, it only covers web traffic, but a good chunk of your network's mischief travels through that channel at least once.

Trick number two: look for unexpectedly-useful combinations of columns in your log entries.
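A rough sketch of that column combination, assuming a W3C-style log with a '#Fields:' header line and field names date, time, c-ip, r-host, and r-ip (rename these to match what your proxy actually writes):

```python
#!/usr/bin/env python3
"""Sketch of technique 2: turn W3C-style proxy logs into a poor-man's passive-DNS feed.

Assumes a '#Fields:' header naming the columns and the field names
date, time, c-ip, r-host, r-ip -- adjust to your proxy's export.
"""
import sys

def passive_dns(logfile):
    fields = []
    first_last = {}                       # (r-host, r-ip) -> [first_seen, last_seen]
    with open(logfile) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if line.startswith("#Fields:"):
                fields = line.split()[1:]
                continue
            if not line or line.startswith("#"):
                continue
            rec = dict(zip(fields, line.split()))
            when = rec.get("date", "") + " " + rec.get("time", "")
            key = (rec.get("r-host"), rec.get("r-ip"))
            if None in key:
                continue
            seen = first_last.setdefault(key, [when, when])
            seen[1] = when                # update last-seen timestamp
    return first_last

if __name__ == "__main__":
    for (host, ip), (first, last) in sorted(passive_dns(sys.argv[1]).items()):
        print(f"{host}\t{ip}\tfirst={first}\tlast={last}")
```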

On to number three: data reduction and indexing.  Logs are big, and logs are noisy.  While I recommend that you keep the raw logs for as long as you can, I understand that isn't always possible and that you have to make tough choices about what you store and for how long.  One way to squeeze more time out of your logs is to reduce the number of columns that you keep in your archives.  Using the web proxy logs as an example, you might not be able to keep every log entry for 24 months, but keeping just the c-ip,r-host,r-ip columns can be very helpful when you're looking back through an old, previously undiscovered compromise, or when you're dealing with an information request like "has any system on your network interacted with one of these IPs?"
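A minimal reduction sketch, again assuming the same hypothetical W3C-style '#Fields:' header as above; the output is a small, de-duplicated file per day that is cheap to keep for years:

```python
#!/usr/bin/env python3
"""Sketch of technique 3: reduce a day's proxy log to unique c-ip,r-host,r-ip triples."""
import sys

def reduce_log(logfile):
    fields, triples = [], set()
    with open(logfile) as fh:
        for line in fh:
            if line.startswith("#Fields:"):
                fields = line.split()[1:]
                continue
            if not line.strip() or line.startswith("#"):
                continue
            rec = dict(zip(fields, line.split()))
            # keep only the three columns worth archiving, de-duplicated
            triples.add((rec.get("c-ip", "-"), rec.get("r-host", "-"), rec.get("r-ip", "-")))
    return triples

if __name__ == "__main__":
    for c_ip, r_host, r_ip in sorted(reduce_log(sys.argv[1])):
        print(f"{c_ip},{r_host},{r_ip}")
```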

Years ago I would have recommended further daily reduction and indexing of these files, but these days you probably have a Splunk instance or an ELK stack (https://digital-forensics.sans.org/summit-archives/dfirprague14/Finding_the_Needle_in_the_Haystack_with_FLK_Christophe_Vandeplas.pdf) and you just dump logs in there and hope that magic happens.  There's still value in examining and repurposing logs in these days of map-reduce.  The reduced files that you create from the logs are easy to drop into your Hadoop cluster and build a Hive table from.


So, let's tie this all together.  You've received your list of IPs from your intelligence vendor and you're tasked with finding any activity on your network over the past 2 years.  In your web proxy index you see that you had a hit 8 months ago.  Now you've got an IP and a date; what machine had that IP then?  Next you search through your AV check-in data and get the machine name.  But the AV check-in logs are daily, not logged by the minute, so you look at the IP history of that machine in the AV logs and hopefully you see it consistently checking in from that IP and not moving around a lot.  If you're not so lucky, well, it's time to open up request tickets and hope you can get at the DHCP logs from back then.
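As a sketch of that workflow, here is how the two home-grown products from the earlier examples could be joined.  The file names, the three-column reduced format, and the daily_checkin schema are the assumptions carried over from the sketches above, not anything your tools produce out of the box; in practice you would run this against the reduced file for the day in question.

```python
#!/usr/bin/env python3
"""Sketch tying it together: check reduced proxy triples for intel-feed IPs,
then pivot into the daily AV check-in table built earlier."""
import sqlite3
import sys

def find_hits(indicator_file, reduced_csv, db="checkin_history.db"):
    bad_ips = {line.strip() for line in open(indicator_file) if line.strip()}
    con = sqlite3.connect(db)
    with open(reduced_csv) as fh:
        for line in fh:
            c_ip, r_host, r_ip = line.rstrip("\n").split(",")
            if r_ip in bad_ips:
                print(f"hit: client {c_ip} talked to {r_host} ({r_ip})")
                # which machine held that client IP? (daily resolution only)
                for day, machine, user in con.execute(
                        "SELECT day, machine, user FROM daily_checkin WHERE ip=? ORDER BY day",
                        (c_ip,)):
                    print(f"    {day}: {machine} (user {user})")
    con.close()

if __name__ == "__main__":
    find_hits(sys.argv[1], sys.argv[2])
```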

One last parting thought: do you have waste or seemingly useless logs?  By applying one or more of these techniques, can you find a way to process them into something useful?

-KL


Comments

Thank you for sharing these good ideas.
I fully get the operational benefit but, on the other hand, please keep in mind that some countries require storing unedited legacy logs for legal reasons, and that custom, truncated, or edited logs may not be compliant with those countries' laws.
A good point, and there's been much internal debate on whether feeding the logs into something like Splunk or Elasticsearch qualifies as unedited or not. In these cases, I recommend sending raw copies off to your compliance department to handle properly. Let the security folks focus on security.

-KL
[quote=comment#33769]A good point, and there's been much internal debate on whether feeding the logs into something like Splunk or Elasticsearch qualifies as unedited or not. [/quote]

And there's no reason you can't do both. We're currently keeping gzip'd raw logs for anywhere from 90 days to a few years (depending on the log type). But we also send the raw log data to be indexed and databased so we can quickly search the last 30 days' worth of data.

And as Kevin mentioned, there's real value in picking apart the logs and finding ways to reduce that data to more usable forms. For instance, I have a Java program I banged together to sift postfix/sendmail logs from multiple servers and save them in a database in such a fashion that junior admins can easily find an email, then trace every hop: how it got from point A to point B, who received it when, etc. And I've got a Perl tool I made to sift VPN/DNS/DHCP logs to track who's on which machines on what IPs at what times.
