Malicious RTF Files
About a year ago I received RTF samples that I could not analyze with RTFScan or rtfobj (FYI: Philippe Lagadec has improved rtfobj.py significantly since then). So I started to write my own RTF analysis tool (rtfdump), but I was not satisfied enough with the way I presented the analysis result to warrant a release of my tool. Last week, I started analyzing new samples and updating my tool. I released it, and show how I analyze sample 07884483f95ae891845caf0d50ce507f in this diary entry.
This sample is a heavily obfuscated RTF file. RTF files are essentially sets of nested strings that start with { and end with }, like this (strongly simplified):
{\rtf {data {more data}}}.
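To make the nesting concrete, here is a minimal Python sketch that walks a string and records every brace-delimited group with its nesting level and start/end offsets. It is only an illustration and not how rtfdump is implemented: the function name list_groups is hypothetical, counting levels from 1 is my assumption, and it ignores \bin data, which a real parser has to handle.

```python
def list_groups(rtf: str):
    """Return (level, start, end) for each {...} group, in order of opening."""
    groups = []   # one [level, start, end] entry per group
    stack = []    # indexes into groups for the groups that are still open
    i = 0
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\" and i + 1 < len(rtf):
            # Skip the character after a backslash, so that the escaped
            # braces \{ and \} are not counted as group delimiters.
            i += 2
            continue
        if ch == "{":
            stack.append(len(groups))
            groups.append([len(stack), i, None])
        elif ch == "}" and stack:
            groups[stack.pop()][2] = i
        i += 1
    return [tuple(g) for g in groups]


for level, start, end in list_groups(r"{\rtf {data {more data}}}"):
    print(level, start, end)
```

Each tuple corresponds to one nested string; a heavily obfuscated file simply produces a very long list.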
Malicious RTF files contain a payload. Objects in RTF files are embedded in hexadecimal, like this (strongly simplified):
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E616D6500000000000000...
}}}
Malicious RTF files obfuscate the hexadecimal data in many ways. One of them is to put extra control strings inside the hexadecimal data, like this:
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E61{\obj}6D6500000000000000...
}}}
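A rough way to undo this kind of obfuscation is to remove the control words first (they can contain letters like a to f, which would otherwise be mistaken for hexadecimal digits) and then keep only the hexadecimal digits. The Python sketch below does that for the simplified example; the name extract_hex_bytes is hypothetical, and the regular expression is a simplification that does not handle \bin data or every parsing quirk of Word.

```python
import re

# A control word (\letters, optional numeric parameter, optional trailing
# space) or a control symbol (\ followed by one non-letter character).
CONTROL = re.compile(r"\\(?:[a-zA-Z]+-?\d* ?|[^a-zA-Z])")


def extract_hex_bytes(objdata_group: str) -> bytes:
    """Strip control strings, braces and whitespace, then decode the hex."""
    without_controls = CONTROL.sub("", objdata_group)
    digits = re.sub(r"[^0-9a-fA-F]", "", without_controls)
    if len(digits) % 2:
        # Drop a dangling digit so the remaining data can still be decoded.
        digits = digits[:-1]
    return bytes.fromhex(digits)


sample = r"{\*\objdata 01050000 02000000 08000000 46696C656E61{\obj}6D65}"
print(extract_hex_bytes(sample))
```

Stripping the control words before filtering for digits matters: in \objdata and \obj, the letters b, d and a are valid hexadecimal digits.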
The sample I analyzed takes this to the extreme. After each hexadecimal digit, extra control strings and whitespace are inserted:
(I removed a lot of whitespace to be able to put several hexadecimal digits on the screen).
The hexadecimal digits (highlighted in red) are 01050…
My tool outputs a line of analysis data for each nested string. In this sample, because of the obfuscation, there are a lot of them (22956, which is gigantic for an RTF file).
But you can reduce the output by filtering for entries that (potentially) contain an embedded object using option -f O:
Entry 165 is the one we will take a closer look at first. The information presented for entry 165 is the following: the nesting level is 4, it has 1 child (c=), starts at position 2ae5 in the file (p=), is 1194952 bytes long (l=), has 11429 hexadecimal digits (h=), has no \bin entries (b=), contains an embedded object (O), has 1 unknown character (u=) and is named \*\objdata133765.
We can select entry 165 for closer analysis:
I highlighted the hexadecimal digits in red.
To decode the hexadecimal data, we use option -H:
You can see the hex data clearly now: 01 05 00 ...
Since this is an embedded object, we use option -i to get more info on the object:
From the magic header, we see that the embedded object is an OLE file (FYI: if we analyze it with oledump, we get parsing errors).
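For readers who want to check this by hand: OLE compound files start with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1. The Python sketch below looks for that signature in hex-decoded data; the helper name find_ole_header is hypothetical, and this shows only the signature check, not what option -i does internally.

```python
# Signature found at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole_header(data: bytes) -> int:
    """Return the offset of the OLE signature in data, or -1 if absent."""
    return data.find(OLE_MAGIC)


# Hypothetical decoded data: a few header bytes followed by the signature.
decoded = bytes.fromhex("0105000002000000") + OLE_MAGIC + b"..."
print(find_ole_header(decoded))
```

A result of -1 means the signature is not present in the decoded data.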
Looking further into the data (-H), we see stream entries in the output:
And a bit further, we even find a URL:
Taking a closer look, I see not only a URL but also hex data that looks like shellcode.
We can select this shellcode by cutting it out of the stream (option -c):
And of course also dump it to a file (option -d), so that we can analyze it with the shellcode analyzer from libemu:
So this RTF file is a downloader.
The presence of shellcode in an RTF file is often an indication of an exploit. rtfdump supports YARA (like many of my *dump tools):
The first YARA search doesn't find anything. But the second search with option -H (to decode the hexadecimal content to binary) has hits for my RTF_ListView2_CLSID YARA rule. This indicates that entry 165 contains a byte sequence for the ListView2 classid, so this is very likely an exploit for vulnerability CVE-2012-0158 in this ListView.
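The difference between the two searches can be illustrated in a few lines of Python. A CLSID is stored in binary with its first three fields in little-endian order, so it only appears in the file after the hexadecimal text has been decoded. In this sketch the helper names are hypothetical, and the CLSID value is the one I believe belongs to the MSCOMCTL ListView control; treat it as an assumption and verify it before relying on it.

```python
import uuid

# Assumption: CLSID commonly cited for the MSCOMCTL ListView control.
LISTVIEW_CLSID = "BDD1F04B-858B-11D1-B16A-00C0F0283628"


def clsid_to_bytes(clsid: str) -> bytes:
    """Convert a CLSID string to its binary (mixed-endian) layout."""
    return uuid.UUID(clsid).bytes_le


def contains_clsid(data: bytes, clsid: str) -> bool:
    """Report whether the binary form of the CLSID occurs in data."""
    return clsid_to_bytes(clsid) in data


# The CLSID as it would appear in the RTF file: as hexadecimal text.
hex_text = clsid_to_bytes(LISTVIEW_CLSID).hex().encode("ascii")

print(contains_clsid(hex_text, LISTVIEW_CLSID))                          # raw text
print(contains_clsid(bytes.fromhex(hex_text.decode()), LISTVIEW_CLSID))  # decoded
```

Searching the raw hexadecimal text misses the byte sequence, while searching the decoded bytes finds it, which mirrors the behaviour with and without option -H.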
The samples I looked at last week share the following properties:
they start with {\rtfMETAX
they end with this:
If you have interesting tools or techniques to analyze RTF files, please post a comment.
Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com