Triage is Key! Python to the Rescue!

Published: 2025-07-29. Last Updated: 2025-07-29 09:29:53 UTC
by Xavier Mertens (Version: 1)

When you need to quickly analyze a lot of data, there is one critical step to perform: Triage. In forensic investigations, this step is critical because it allows investigators to quickly identify, prioritize, and isolate the most relevant or high value evidence from large volumes of data, ensuring that limited time and resources are focused on artifacts most likely to reveal key facts about an incident. Sometimes, a quick script will be enough to speed up this task.

Today, I'm working on a case where I have a directory containing +20.000 mixed files. Amongst them, a lot of ZIP archives (mainly Office documents), containing also lot of files. The idea is to scan all those files (including the ZIP archives) for some keywords. I wrote a quick Python script that will scan all files against the embedded YARA rule and, if a match is found, copy the original file into a destination directory.

Here is the script:

#
# Quick Python triage script
# Copy files matching a YARA rule to another directory
#
import yara
import os
import shutil
import zipfile
import io

# YARA rule
yara_rule = """
rule case_xxxxxx_search_1
{
    strings:
        $s1 = "string1" nocase wide ascii
        $s2 = "string2" nocase wide ascii
        $s3 = "string3" nocase wide ascii
        $s4 = "string4" nocase wide ascii
        $s5 = "string5" nocase wide ascii
    condition:
        any of ($s*)
}
"""

source_dir = "Triage"
dest_dir = "MatchedFiles"
os.makedirs(dest_dir, exist_ok=True)
rules = yara.compile(source=yara_rule)

def is_zip_file(filepath):
    """
    Check ZIP archive magic bytes.
    """
    try:
        with open(filepath, "rb") as f:
            sig = f.read(4)
            return sig in (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
    except Exception:
        return False

def safe_extract_path(member_name):
    """
    Returns a safe relative path inside the destination folder (Prevent .. in paths).
    """
    return os.path.normpath(member_name).replace("..", "_")

def scan_file(filepath, file_bytes=None, inside_zip=False, zip_name=None, member_name=None):
    """
    Scan a file with YARA.
    """
    try:
        if file_bytes is not None:
            matches = rules.match(data=file_bytes)
        else:
            matches = rules.match(filepath)

        if matches:
            if inside_zip:
                print("[MATCH] {member_name} (inside {zip_name})")
                rel_path = os.path.relpath(zip_name, source_dir)
                filepath = os.path.join(source_dir, rel_path)
                dest_path = os.path.join(dest_dir, rel_path)
            else:
                print("[MATCH] {filepath}")
                rel_path = os.path.relpath(filepath, source_dir)
                dest_path = os.path.join(dest_dir, rel_path)
            
            # Save a copy
            os.makedirs(os.path.dirname(dest_path), exist_ok=True)
            shutil.copy2(filepath, dest_path)
    except Exception as e:
        print(e)
        pass

# Main
for root, dirs, files in os.walk(source_dir):
    for name in files:
        filepath = os.path.join(root, name)
        if is_zip_file(filepath):
            try:
                with zipfile.ZipFile(filepath, 'r') as z:
                    for member in z.namelist():
                        if member.endswith("/"):  # Skip directories
                            continue
                        try:
                            file_data = z.read(member)
                            scan_file(member, file_bytes=file_data, inside_zip=True, zip_name=filepath, member_name=member)
                        except Exception:
                            pass
            except zipfile.BadZipFile:
                pass
        else:
            scan_file(filepath)

Now, you can enjoy some coffee while the script does the job:

[MATCH] docProps/app.xml (inside Triage\xxxxxxx.xlsx)
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxx.xlsx)
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxxxxxxxxxxxxxx.xlsx)
[MATCH] ppt/slides/slide3.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] ppt/slides/slide12.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] ppt/slides/slide14.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] ppt/slides/slide15.xml (inside Triage\xxxxxxxxxxxxxxxxxxxxxx.pptx)
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxx.xlsx)
[MATCH] Triage\xxxxxxxxxxxxxxxxxxxxxxx.pdf
[MATCH] Triage\xxxxxxxxxxxxxxxxxxx.xls
[MATCH] xl/sharedStrings.xml (inside Triage\xxxxxxxxxxxxxxxx.xlsx)
[MATCH] Triage\xxxxxxxxxxxxxxxxxxxxxxxxxx.xls

You can see that, with a few lines of Python, you can speedup the triage phase in your investigations. Note that the script is written to handle my current files set and is not ready for broader use (lile to handle password-protected archives or other types of archives)

Xavier Mertens (@xme)
Xameco
Senior ISC Handler - Freelance Cyber Security Consultant
PGP Key

Keywords: DFIR Forensic Investigation Python Script Triage

0 comment(s)

ISC Stormcast For Tuesday, July 29th, 2025 https://isc.sans.edu/podcastdetail/9546

Internet Storm Center

Triage is Key! Python to the Rescue!

Comments