Quickie: Parsing XLSB Documents
Inspired by Xavier's diary entry "XLSB Files: Because Binary is Stealthier Than XML", I took a look at Microsoft's XLSB specification.
This confirmed my hopes: the binary format of XLSB files is a sequence of TLV records, just like BIFF. At least for sheets and shared string tables, I haven't looked at the other file formats yet.
The type and length of each TLV record is a variable length integer: from 1 to 2 bytes (type) and from 1 to 4 bytes (length). It's stored in little-endian format, and the least significant bytes have all their most significant bit set. The most significant byte has its most significant bit cleared. 7 least significant bits are used to encode the integer value. This implies that the highest value for a type integer is number 16383.
I wrote a simple parser, it is still in beta: xlsbdump.py.
Didier Stevens
Senior handler
Microsoft MVP
blog.DidierStevens.com
Comments