Getting the EXE out of the RTF
Recently, when the targeted attack with malicious RTF attachments was making the rounds, I wondered how to best get the embedded EXE extracted from the RTF for further analysis. On a Windows system, you would most likely simply copy/paste the embedded object from within RTF to an Explorer window, and end up with the original file. Since I do my malware analysis on Unix, this wasn't an option. Looking at the file, it appeared as if RTF was using some sort of hexadecimal encoding:
Now, as a command line Perl addict, hex is something I know how to deal with :-).
$cat detail.rtf | sed -e '1,3d' | perl -ne 's/(..)/print chr(hex($1))/ge' > detail.bin
$cat detail.bin | hexdump -C | more
00000000 02 00 00 00 08 00 00 00 50 61 63 6b 61 67 65 00 |........Package.|
00000010 00 00 00 00 00 00 00 00 1c e4 00 00 02 00 4d 69 |.........ä....Mi|
00000020 63 72 6f 73 6f 66 74 20 57 6f 72 64 20 20 45 6e |crosoft Word En|
00000030 64 4e 6f 74 65 20 78 32 20 65 72 72 6f 72 2e 20 |dNote x2 error. |
00000040 50 6c 65 61 73 65 20 64 6f 75 62 6c 65 20 63 6c |Please double cl|
00000050 69 63 6b 20 68 65 72 65 20 74 6f 20 76 69 65 77 |ick here to view|
00000060 20 74 68 65 20 6f 72 69 67 69 6e 61 6c 20 63 6f | the original co|
00000070 6e 74 65 6e 74 2e 73 63 72 00 43 3a 5c 55 73 65 |ntent.scr.C:Use|
Sweet, we get something printable! The “sed” command deletes the first three lines, because they don't contain hex and would confuse the Perl statement that follows. The Perl code eats up two digits at once and converts them to the corresponding ASCII character, iterating through the entire file. I'm using “perl -ne” combined with “print” instead of “perl -pe” because the former makes it easier to ignore the pesky CR/LF line end markers that make Windows text so annoying on Unix. The output gets piped into “hexdump -C”, because we expect this content to be an embedded EXE file, and thus it likely contains a lot of non-printable characters that would not be fun to look at in “vi” or “more”.
A bit further down in the output, there was indeed the tell tale “MZ” marker of the beginning of a MSDOS PE header.
00000170 6c 20 63 6f 6e 74 65 6e 74 2e 73 63 72 00 00 e0 |l content.scr..à|
00000180 00 00 4d 5a 50 00 02 00 00 00 04 00 0f 00 ff ff |..MZP.........ÿÿ|
00000190 00 00 b8 00 00 00 00 00 00 00 40 00 1a 00 00 00 |..¸.......@.....|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
Easy, I thought. Let's carve out the file beginning with the MZ and we should have the EXE:
$ dd if=detail.bin of=detail.exe bs=1 skip=386
61870+0 records in
61870+0 records out
61870 bytes (62 kB) copied, 0.269451 s, 230 kB/s
$ file detail.exe
detail.exe: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
“if” and “of” are the input and output files of the “dd” command. “bs=1” sets the step size to one byte, and “skip”, well, skips the given number of bytes at the beginning of the file. 386 is the decimal equivalent of 0x182, the offset of MZ visible in the hexdump above.
While the “file” command confirmed that I had indeed carved out an executable, something was wrong – the file didn't want to run in the emulator, and when I uploaded it to threatexpert.com, their service called it “invalid”. I quickly figured out that the RTF has a lot of crud at the end as well, which also needs to be cut off, but I still couldn't reliably determine the correct length, and hence didn't know where the last byte of the embedded executable was.
Well, time for the malware reverse engineering equivalent of the “known plaintext attack”. I used a Windows PC to embed a copy of notepad.exe into an otherwise empty RTF document of my own, and then went about analyzing this RTF until I was able to carve out the original notepad.exe. The main “AHA!” moment was when I realized that the bytes between the filename and the “MZ” header actually are the length of the embedded file. If we use our hexdump from before
00000170 6c 20 63 6f 6e 74 65 6e 74 2e 73 63 72 00 00 e0 |l content.scr..à|
00000180 00 00 4d 5a 50 00 02 00 00 00 04 00 0f 00 ff ff |..MZP.........ÿÿ|
00000190 00 00 b8 00 00 00 00 00 00 00 40 00 1a 00 00 00 |..¸.......@.....|
000001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
the length of the file in this case is 0x00E000, which is 57344 in decimal. Back to the sample:
$ dd if=detail.exe of=detail-fixed.exe bs=1 count=57344
57344+0 records in
57344+0 records out
57344 bytes (57 kB) copied, 0.268809 s, 213 kB/s
$ file detail-fixed.exe
detail-fixed.exe: PE32 executable for MS Windows (GUI) Intel 80386 32-bit
$ md5sum detail-fixed.exe
82a44254c1ce2019936a8428c93f5354 detail-fixed.exe
This time, the emulator, ThreatExpert and VirusTotal were all happy with the file, and while anti-virus coverage at the time was poor, the EXE/SCR embedded within the RTF attachment was quickly confirmed as unfriendly.
Comments