Binary Analysis with Jupyter and Radare2
Jupyter has become very popular within the data science community, as it is an easy way of working interactively with Python, R and other languages. Within Jupyter you create a notebook, which contains (live) code, visualisations and markdown. It is used for data processing, numerical simulations, modelling, data visualisation and machine learning, so let's add reverse engineering to this list.
If you combine radare2 with Jupyter, you get an interactive way of working with your binaries. You can execute individual steps, change them and re-execute them, which helps your analysis flow. What I really like about working with radare2 from within a notebook is that all steps are documented and recorded, and can be changed and re-run easily. Combining the capabilities of radare2 with everything that comes with Jupyter makes for a very powerful combination.
There's a Docker image that can be used, named nl5887/radare2-notebook, which contains jupyter-notebook with radare2 built on top. Cutter, the GUI frontend of radare2, also has Jupyter support built in, which can be used as well.
To start the image, run the following command, which will start Jupyter while exposing ports 8888 (Jupyter) and 6006 (TensorBoard).
docker run -p 8888:8888 -p 6006:6006 -v $(pwd)/notebooks/:/home/jovyan/ nl5887/radare2-notebook
The output will show the URL that needs to be used to connect to the notebook. This URL contains a token that is used to authenticate to Jupyter.
Let's start with a simple notebook that will extract (potentially) interesting IOCs from a Linux malware binary. Notebooks consist of cells of different types, such as markdown or code. We'll use a Python kernel with Jupyter, though many other languages are supported. Every code block below is created as a separate cell.
try:
    # If using Jupyter within Cutter, use the following. This will use the currently active binary.
    import cutter
    # We'll assign cutter to the variable r2 to be consistent with r2pipe.
    r2 = cutter
except ModuleNotFoundError:
    # Otherwise, use r2pipe to open a binary.
    import r2pipe
    r2 = r2pipe.open("/home/jovyan/radare2/malware/vv")
Now that we've created an r2pipe session with the binary, we'll start with a basic analysis. We can use Jupyter magic commands, like %time, to get information about timings etc.
%time r2.cmd('aaa')
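The %time magic only exists inside IPython and Jupyter. If the same steps are later moved into a plain Python script, a rough equivalent can be sketched with the standard library. The helper name `timed` and the stand-in function `fake_cmd` below are made up for illustration and are not part of r2pipe; in a real session you would pass `r2.cmd` instead.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds).

    Hypothetical helper, not part of r2pipe or Jupyter.
    """
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def fake_cmd(command):
    """Stand-in for r2.cmd so the sketch runs without radare2 installed."""
    return "ran " + command

result, elapsed = timed(fake_cmd, "aaa")
print("{!r} took {:.6f}s".format(result, elapsed))
```

With a live session this would become `timed(r2.cmd, 'aaa')`.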
The binary has been analysed; now we can output information about it.
print(r2.cmd('i'))
If you append the character j to the command, radare2 will output JSON. The code below parses the JSON information, pretty-prints it and extracts the arch from the structure.
import json
from pprint import pprint
r = json.loads(r2.cmd('ij'))
pprint(r)
print(r.get('bin').get('arch'))
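Chained `.get()` calls raise an AttributeError as soon as an intermediate key is missing, because the first `.get()` returns None. A small defensive accessor avoids that. This is only a sketch: the helper name `dig` is hypothetical, and `SAMPLE_IJ` is an invented, heavily shortened stand-in for real `ij` output so the cell runs without radare2.

```python
import json

# Invented sample shaped like `ij` output; real output has many more fields.
SAMPLE_IJ = '{"core": {"file": "vv", "size": 1234}, "bin": {"arch": "x86", "bits": 64}}'

def dig(data, *keys, default=None):
    """Walk nested dicts, returning default as soon as a key is missing.

    Hypothetical helper, not part of r2pipe.
    """
    for key in keys:
        if not isinstance(data, dict) or key not in data:
            return default
        data = data[key]
    return data

info = json.loads(SAMPLE_IJ)
print(dig(info, 'bin', 'arch'))
print(dig(info, 'bin', 'missing', default='n/a'))
```

With r2pipe, `r2.cmdj('ij')` runs the command and parses the JSON in one step, so `dig(r2.cmdj('ij'), 'bin', 'arch')` would be the equivalent call in a live session.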
This is all we need to know to build a simple IOC extractor. This cell walks through all strings found in the binary and checks them against some matchers. If it identifies IP addresses, URLs, ANSI output or email addresses, they are printed.
import r2pipe
import json
import struct
import re
import base64
from pprint import pprint, pformat
# Raw strings keep the regex backslashes intact.
IP_MATCHER = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:[:]\d+)?)")
URL_MATCHER = re.compile(r'(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]', re.IGNORECASE)
EMAIL_MATCHER = re.compile(r'([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})', re.IGNORECASE)
def regex_matcher(matcher):
    # Returns a function that yields all regex matches in a string.
    return lambda st: matcher.findall(st)
def contains_matcher(s):
    # Returns a function that yields the whole string if it contains s.
    return lambda st: [st] if s in st else []
matchers = [regex_matcher(IP_MATCHER), regex_matcher(URL_MATCHER), regex_matcher(EMAIL_MATCHER), contains_matcher('\\e['), contains_matcher('HTTP')]
def print_s(s, r):
    print('0x{:08x} 0x{:08x} {:10} {:4} {:10} {}'.format(s.get('paddr'), s.get('vaddr'), s.get('type'), s.get('length'), s.get('section'), r))
strings = json.loads(r2.cmd('izj'))
for s in strings:
    try:
        # The string is base64-encoded; 'type' (e.g. ascii) is used as the encoding.
        st = base64.b64decode(s.get('string')).decode(s.get('type'))
        for matcher in matchers:
            matches = matcher(st)
            for match in matches:
                print_s(s, match)
    except ValueError as e:
        # Invalid base64 or undecodable bytes: skip this string.
        # print(e)
        continue
    except LookupError as e:
        # Unknown encoding name: skip this string.
        # print(e)
        continue
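To see what the three regular expressions pick up without having a binary at hand, the matching step can be tried on a few invented strings. In this sketch the helper names `find_iocs` and `fake_record` are hypothetical, and the records only imitate the two fields the loop above relies on (a base64 `string` and a `type` used as the encoding).

```python
import base64
import re

IP_MATCHER = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?:[:]\d+)?)")
URL_MATCHER = re.compile(r"(?:(?:https?|ftp|file)://|www\.|ftp\.)[-A-Z0-9+&@#/%=~_|$?!:,.]*[A-Z0-9+&@#/%=~_|$]", re.IGNORECASE)
EMAIL_MATCHER = re.compile(r"([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4})", re.IGNORECASE)

def find_iocs(text):
    """Return every IP, URL and email match found in text (hypothetical helper)."""
    found = []
    for matcher in (IP_MATCHER, URL_MATCHER, EMAIL_MATCHER):
        found.extend(matcher.findall(text))
    return found

def fake_record(text):
    """Build an invented record with a base64 'string' and a 'type' encoding."""
    encoded = base64.b64encode(text.encode('ascii')).decode('ascii')
    return {'string': encoded, 'type': 'ascii'}

# Invented test strings using documentation-only addresses and domains.
records = [
    fake_record('connect to 192.0.2.10:3333'),
    fake_record('see http://example.com/a.sh'),
    fake_record('mail admin@example.org'),
    fake_record('plain text with no indicators'),
]

results = []
for rec in records:
    # Same decode step as in the extractor cell.
    decoded = base64.b64decode(rec['string']).decode(rec['type'])
    results.append(find_iocs(decoded))

for rec, found in zip(records, results):
    print(rec['string'], '->', found)
```

Note that the IP expression also captures an optional port, so an address and its port are reported as a single match.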
Giving this output:
0x0010c3be 0x0050c3be ascii 15 .rodata \e[01;32mresumed
0x0010c3f0 0x0050c3f0 ascii 49 .rodata \e[01;33mpaused\e[0m, press \e[01;35mr\e[0m to resume
0x0010c4e0 0x0050c4e0 ascii 71 .rodata \e[1;32m * \e[0m\e[1;37mPOOL #%-7zu\e[0m\e[1;%dm%s\e[0m variant \e[1;37m%s\e[0m
0x0010c528 0x0050c528 ascii 60 .rodata \e[1;32m * \e[0m\e[1;37m%-13s\e[0m\e[1;36m%s/%s\e[0m\e[1;37m %s\e[0m
0x0010c568 0x0050c568 ascii 41 .rodata \e[1;32m * \e[0m\e[1;37m%-13slibuv/%s %s\e[0m
0x0010f8b0 0x0050f8b0 ascii 5 .rodata \e[0m\n
0x0010f8b6 0x0050f8b6 ascii 7 .rodata \e[0;31m
0x0010f8be 0x0050f8be ascii 7 .rodata \e[0;33m
0x0010f8c6 0x0050f8c6 ascii 7 .rodata \e[1;37m
0x0010f8ce 0x0050f8ce ascii 5 .rodata \e[90m
0x0011031d 0x0051031d ascii 7 .rodata \e[1;30m
0x00110388 0x00510388 ascii 61 .rodata \e[1;37muse pool \e[0m\e[1;36m%s:%d \e[0m\e[1;32m%s\e[0m \e[1;30m%s
0x001103c8 0x005103c8 ascii 81 .rodata \e[01;31mrejected\e[0m (%ld/%ld) diff \e[01;37m%u\e[0m \e[31m"%s"\e[0m \e[01;30m(%lu ms)
0x00110450 0x00510450 ascii 67 .rodata \e[01;32maccepted\e[0m (%ld/%ld) diff \e[01;37m%u\e[0m \e[01;30m(%lu ms)
0x001104c0 0x005104c0 ascii 78 .rodata \e[1;35mnew job\e[0m from \e[1;37m%s:%d\e[0m diff \e[1;37m%d\e[0m algo \e[1;37m%s\e[0m
0x001106c4 0x005106c4 ascii 8 .rodata \e[1;31m-
0x001106cd 0x005106cd ascii 7 .rodata \e[1;31m
0x0011076e 0x0051076e ascii 15 .rodata \e[1;31mnone\e[0m
0x0011077e 0x0051077e ascii 16 .rodata \e[1;32mintel\e[0m
0x0011078f 0x0051078f ascii 16 .rodata \e[1;32mryzen\e[0m
0x001107a0 0x005107a0 ascii 93 .rodata \e[1;32m * \e[0m\e[1;37m%-13s\e[0m\e[1;36m%d\e[0m\e[1;37m, %s, av=%d, %sdonate=%d%%\e[0m\e[1;37m%s\e[0m
0x00110828 0x00510828 ascii 73 .rodata \e[1;32m * \e[0m\e[1;37m%-13s\e[0m\e[1;36m%d\e[0m\e[1;37m, %s, %sdonate=%d%%\e[0m
0x00110878 0x00510878 ascii 37 .rodata \e[1;32m * \e[0m\e[1;37m%-13sauto:%s\e[0m
0x001108a0 0x005108a0 ascii 32 .rodata \e[1;32m * \e[0m\e[1;37m%-13s%s\e[0m
0x001108c8 0x005108c8 ascii 49 .rodata \e[1;32m * \e[0m\e[1;37m%-13s%s (%d)\e[0m %sx64 %sAES
0x00110900 0x00510900 ascii 45 .rodata \e[1;32m * \e[0m\e[1;37m%-13s%.1f MB/%.1f MB\e[0m
0x00110930 0x00510930 ascii 127 .rodata \e[1;32m * \e[0m\e[1;37mCOMMANDS \e[0m\e[1;35mh\e[0m\e[1;37mashrate, \e[0m\e[1;35mp\e[0m\e[1;37mause, \e[0m\e[1;35mr\e[0m\e[1;37mesume\e[0m
0x001124d0 0x005124d0 ascii 96 .rodata \e[1;37mspeed\e[0m 10s/60s/15m \e[1;36m%s\e[0m\e[0;36m %s %s \e[0m\e[1;36mH/s\e[0m max \e[1;36m%s H/s\e[0m
0x001131c8 0x005131c8 ascii 7 .rodata \e[1;33m
0x00113230 0x00513230 ascii 110 .rodata \e[1;32mREADY (CPU)\e[0m threads \e[1;36m%zu(%zu)\e[0m huge pages %s%zu/%zu %1.0f%%\e[0m memory \e[1;36m%zu.0 MB\e[0m
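The colour codes make these strings hard to read. A possible follow-up cell, sketched here under the assumption that the codes always appear as the literal text `\e[`, followed by digits and semicolons, followed by `m` (as in most of the listing above), strips them out. The name `strip_colour_codes` is made up for this example, and codes that embed a format specifier, such as `\e[1;%dm`, are left untouched by this pattern.

```python
import re

# Matches the literal text "\e[" + digits/semicolons + "m" as printed in the listing.
ESCAPE_TEXT = re.compile(r'\\e\[[0-9;]*m')

def strip_colour_codes(text):
    """Remove textual colour escape codes from a string (hypothetical helper)."""
    return ESCAPE_TEXT.sub('', text)

# Two strings copied from the output above.
paused = r'\e[01;33mpaused\e[0m, press \e[01;35mr\e[0m to resume'
accepted = r'\e[01;32maccepted\e[0m (%ld/%ld) diff \e[01;37m%u\e[0m \e[01;30m(%lu ms)'

print(strip_colour_codes(paused))
print(strip_colour_codes(accepted))
```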
This is just a basic example of what you can do with radare2 together with Jupyter. You can find the complete notebook here; GitHub also supports notebooks, giving a nice rendered view of it. Please share your ideas, comments and/or insights with me via social media (@remco_verhoef) or email (remco.verhoef at dutchsec dot com).
Remco Verhoef (@remco_verhoef)
ISC Handler - Founder of DutchSec