Analyzing PDF Streams
Occasionally, Xavier and Jim ask me specific questions from students about my tools when they teach FOR610: Reverse-Engineering Malware.
Recently, a student wanted to know if my pdf-parser.py tool can extract all the PDF streams with a single command.
Since version 0.7.9, it can.
A stream is (binary) data that is an optional part of an object, and it can be compressed or otherwise transformed. To view a single stream with pdf-parser, one selects the object of interest and uses option -f to apply the filters (like zlib decompression) to the stream:
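To illustrate what applying a filter means, here is a minimal Python sketch. It is not pdf-parser's own code: it assumes a stream that was compressed with zlib (the kind of stream a PDF marks with /FlateDecode), and the sample content is made up for the example.

```python
import zlib

# Hypothetical stream content; a real PDF stream would come from an object in the file.
original = b"BT /F1 12 Tf (Hello) Tj ET"

# The raw stream is what is stored in the PDF: here, zlib-compressed bytes.
raw_stream = zlib.compress(original)

# Applying the filter undoes the transformation: here, zlib decompression.
filtered_stream = zlib.decompress(raw_stream)

print(len(raw_stream), "raw bytes ->", len(filtered_stream), "filtered bytes")
print(filtered_stream)
```

A stream can name more than one filter, in which case each one has to be undone in turn. This sketch only covers the single zlib case.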
I added a feature that is present in several of my tools, like oledump.py and zipdump.py: extract all of the "stored items" into a single JSON document.
When you use pdf-parser's option -j (--jsonoutput), the raw (i.e., unfiltered) data of every object with a stream is extracted and put into a JSON document that is sent to stdout:
To get the filtered (e.g., decompressed) data, use option -f together with option -j:
What can you do with this JSON data? It depends on what your goals are. I have several tools that can take this JSON data as input, like file-magic.py and strings.py.
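If you prefer to process the JSON document with your own code, a minimal Python sketch could look like the one below. The layout it expects (a top-level "items" list whose entries have "id", "name" and base64-encoded "content" fields) is an assumption made for this example, so check it against the actual output of your version of the tool. The sample document is made up.

```python
import base64
import json

def decode_items(json_text):
    """Return a list of (id, name, data) tuples from a JSON document.

    Assumes each item has 'id', 'name' and base64-encoded 'content' fields.
    """
    document = json.loads(json_text)
    return [
        (item["id"], item["name"], base64.b64decode(item["content"]))
        for item in document["items"]
    ]

# Hypothetical sample document standing in for real tool output.
sample = json.dumps({
    "items": [
        {
            "id": 143,
            "name": "143",
            "content": base64.b64encode(b"\xff\xd8\xff\xe0").decode("ascii"),
        },
    ],
})

for item_id, name, data in decode_items(sample):
    print(item_id, name, len(data), "bytes")
```

Passing sys.stdin.read() instead of the sample string would let the same function consume a document piped in from pdf-parser.py -j.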
Here I use file-magic.py to identify the type of each raw data stream:
From this we can learn, for example, that object 143's stream contains a JPEG image.
And here I use file-magic.py to identify the type of each filtered data stream:
From this we can learn, for example, that object 881's stream contains a compressed TrueType Font file.
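The idea behind this kind of identification is to compare the first bytes of the data with known signatures. The Python sketch below shows that principle with a tiny, hand-picked signature table. It is an illustration only, not how file-magic.py is implemented, and the table and descriptions are my own assumptions.

```python
# Small, hand-picked signature table (an assumption for illustration only).
SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\x00\x01\x00\x00", "TrueType font"),
    (b"\x78\x9c", "zlib compressed data"),
]

def identify(data):
    """Return a description for data based on its leading bytes."""
    for magic, description in SIGNATURES:
        if data.startswith(magic):
            return description
    return "data"  # Fallback when no signature matches.

print(identify(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(identify(b"no known signature"))
```

A real file type database holds far more signatures and also looks deeper into the data than the first few bytes.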
What if you want to write all stream data to disk, in individual files, for further analysis (that's what the student wanted to do, I guess)?
Then you can use my tool myjson-filter.py. It's a tool designed to filter JSON data produced by my tools, but it can also write items to disk.
When you use option -l, this tool will just produce a listing of the items contained in the JSON data:
And you can use option -W to write the streams to disk. -W takes a value that specifies which naming convention must be used when writing the files to disk. vir will write items to disk with their sanitized name and extension .vir:
hashvir will write items to disk with their sha256 value as name and extension .vir:
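To make the two naming conventions concrete, here is a minimal Python sketch of the same idea. It is not the code of myjson-filter.py: the function names are hypothetical, and the sanitization rule (replace anything other than letters, digits, dot, underscore and hyphen with an underscore) is an assumption, since the tool's exact rule is not described here.

```python
import hashlib
import os
import re
import tempfile

def item_filename(name, data, naming):
    """Build a filename for an item using the 'vir' or 'hashvir' convention."""
    if naming == "hashvir":
        # SHA256 of the content as name, extension .vir.
        return hashlib.sha256(data).hexdigest() + ".vir"
    if naming == "vir":
        # Assumed sanitization rule, for illustration only.
        return re.sub(r"[^A-Za-z0-9._-]", "_", name) + ".vir"
    raise ValueError("unknown naming convention: " + naming)

def write_items(items, directory, naming="hashvir"):
    """Write (name, data) pairs to directory and return the paths written."""
    written = []
    for name, data in items:
        path = os.path.join(directory, item_filename(name, data, naming))
        with open(path, "wb") as handle:
            handle.write(data)
        written.append(path)
    return written

# Demo with a made-up item, written to a temporary directory.
with tempfile.TemporaryDirectory() as directory:
    paths = write_items([("143", b"\xff\xd8\xff\xe0")], directory, "hashvir")
    names = [os.path.basename(path) for path in paths]
    print(names)
```

Naming files after their hash means identical streams end up in a single file, and the .vir extension keeps the operating system from treating the extracted data as something to open or execute.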
Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com