get_unpacked_files
Extract and classify memory-unpacked binaries from a sandbox analysis to identify runtime-decrypted payloads and memory-resident code, with process ID and execution-timing metadata.
Instructions
Retrieve and classify in-memory unpacked binaries from a sandbox analysis.
This tool extracts executable artifacts that were unpacked in memory during the dynamic execution of the submitted sample. These binaries typically reflect runtime-decrypted payloads or memory-resident code generated by the sample or its child processes.
Each extracted file is associated with:
- The process ID (pid) responsible for its memory region.
- A classification that indicates **when** during execution the memory snapshot was taken.
If a custom `save_path` is provided, the files are saved under `{save_path}/{webid}-{run}`. If the path is invalid or inaccessible, a fallback directory under `unpacked_files/{webid}-{run}` is used instead.
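For illustration, here is a minimal sketch of the path logic described above, using hypothetical values for `save_path` and `webid`:

```python
import os

save_path = "/data/joe_artifacts"   # hypothetical user-supplied base directory
webid, run = "1234567", 0           # hypothetical submission ID and run index

output_dir = os.path.join(save_path, f"{webid}-{run}")
# -> /data/joe_artifacts/1234567-0
fallback_dir = os.path.join("unpacked_files", f"{webid}-{run}")
# -> unpacked_files/1234567-0, used when save_path cannot be created
```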
Snapshot types:
- "Snapshot at beginning of execution": Memory captured at process start.
- "Snapshot taken on unpacking (modifying executable sections or adding new ones)": Captured at runtime after self-modifying code or section manipulation.
- "Snapshot at the end of execution": Captured near process termination.
- "Snapshot taken when memory gets freed": Captured when memory regions were released.
Args:
webid (required): The submission ID of the analysis.
run (optional, default = 0): Index of the sandbox run to process (typically 0 for the first run).
save_path (optional): Base directory in which to store the unpacked files. If the path is not valid, a default directory is used.
Returns:
A dictionary containing:
- output_directory: Absolute path where the files were saved.
- files: A list of unpacked file entries, each with:
- unpacked_file: Absolute path to the file on disk.
- pid: ID of the process associated with the memory region.
- type: A human-readable label describing when the snapshot was taken.
- note: A message indicating whether the fallback directory was used.
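An illustrative successful result is shown below; all paths, the PID, and the webid embedded in them are hypothetical.

```python
# Illustrative only; values are hypothetical.
result = {
    "output_directory": "/data/joe_artifacts/1234567-0",
    "files": [
        {
            "unpacked_file": "/data/joe_artifacts/1234567-0/5.1.payload.exe.a1b2c3.unpack",
            "pid": "4712",
            "type": "Snapshot taken on unpacking (modifying executable sections or adding new ones)",
        },
    ],
    "note": "Extraction completed successfully.",
}
```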
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| webid | Yes | Submission ID of the analysis. | (none) |
| run | No | Index of the sandbox run to process. | 0 |
| save_path | No | Base directory for the unpacked files; a fallback directory is used if the path is invalid. | (none) |
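A minimal example of an arguments object that satisfies this schema (all values hypothetical):

```python
arguments = {
    "webid": "1234567",                   # hypothetical submission ID
    "run": 0,                             # first sandbox run
    "save_path": "/data/joe_artifacts",   # optional; omit to use the default directory
}
```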
Implementation Reference
- jbxmcp/tools.py:703-743 (handler): The MCP tool handler for 'get_unpacked_files'. Decorated with @mcp.tool(), it handles the tool invocation and delegates the core logic to download_unpacked_files in core.py, with error handling.

```python
@mcp.tool()
async def get_unpacked_files(webid: str, run: int = 0, save_path: Optional[str] = None) -> Dict[str, Any]:
    """
    Retrieve and classify in-memory unpacked binaries from a sandbox analysis.

    This tool extracts executable artifacts that were unpacked in memory during the dynamic
    execution of the submitted sample. These binaries typically reflect runtime-decrypted
    payloads or memory-resident code generated by the sample or its child processes.

    Each extracted file is associated with:
    - The process ID (pid) responsible for its memory region.
    - A classification that indicates **when** during execution the memory snapshot was taken.

    If a custom `save_path` is provided, the files are saved under `{save_path}/{webid}-{run}`.
    If the path is invalid or inaccessible, a fallback directory under
    `unpacked_files/{webid}-{run}` is used instead.

    Snapshot types:
    - "Snapshot at beginning of execution": Memory captured at process start.
    - "Snapshot taken on unpacking (modifying executable sections or adding new ones)":
      Captured at runtime after self-modifying code or section manipulation.
    - "Snapshot at the end of execution": Captured near process termination.
    - "Snapshot taken when memory gets freed": Captured when memory regions were released.

    Args:
        webid (required): The submission ID of the analysis.
        run (optional, default = 0): Index of the sandbox run to process (typically 0 for the first run).
        save_path (optional): Optional base directory to store the unpacked files. If not valid, a default directory is used.

    Returns:
        A dictionary containing:
        - output_directory: Absolute path where the files were saved.
        - files: A list of unpacked file entries, each with:
            - unpacked_file: Absolute path to the file on disk.
            - pid: ID of the process associated with the memory region.
            - type: A human-readable label describing when the snapshot was taken.
        - note: A message indicating whether the fallback directory was used.
    """
    try:
        return await download_unpacked_files(webid, run, save_path)
    except Exception as e:
        return {
            "error": f"Failed to download unpacked files for submission ID '{webid}' run {run}. "
                     f"Reason: {str(e)}"
        }
```
- jbxmcp/core.py:325-380 (helper): Core helper implementing the logic to download, extract, and classify unpacked PE files from the Joe Sandbox API. Handles directory creation, process-tree mapping for PID association, and metadata extraction from filenames.

```python
async def download_unpacked_files(webid: str, run: Optional[int] = 0, save_path: Optional[str] = None) -> Dict[str, Any]:
    jbx_client = get_client()
    _, data = jbx_client.analysis_download(webid=webid, run=run, type='unpackpe')

    default_output_dir = os.path.join("unpacked_files", f"{webid}-{run}")
    output_dir = default_output_dir
    used_default_path = False

    root = await get_or_fetch_report(webid, run)
    proc_tree = extract_process_tree(root)
    targetid_to_pid = flatten_process_tree(proc_tree)

    if save_path:
        try:
            output_dir = os.path.join(save_path, f"{webid}-{run}")
            os.makedirs(output_dir, exist_ok=True)
        except (OSError, FileNotFoundError):
            output_dir = default_output_dir
            os.makedirs(output_dir, exist_ok=True)
            used_default_path = True
    else:
        os.makedirs(output_dir, exist_ok=True)

    # Extract files and associate them with process IDs and frame stages
    unpacked_files_info = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(path=output_dir, pwd=b"infected")
        for name in zf.namelist():
            if name.endswith('/') or '.raw.' in name:
                continue
            base = os.path.basename(name)
            metadata = extract_unpack_filename_metadata(base)
            if metadata is None:
                continue
            targetid = metadata["targetid"]
            frame_label = metadata["frame_label"]
            pid = targetid_to_pid.get(targetid, "unknown")
            full_path = os.path.abspath(os.path.join(output_dir, name))
            unpacked_files_info.append({
                "unpacked_file": full_path,
                "pid": pid,
                "type": frame_label
            })

    note = (
        "User-provided save_path was invalid. Default directory was used."
        if used_default_path
        else "Extraction completed successfully."
    )
    return {
        "output_directory": os.path.abspath(output_dir),
        "files": unpacked_files_info,
        "note": note
    }
```
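A hedged usage sketch for calling the helper directly, assuming the Joe Sandbox client returned by get_client() is already configured and that the function is importable from jbxmcp.core (the webid is hypothetical):

```python
import asyncio

from jbxmcp.core import download_unpacked_files

async def main() -> None:
    # Hypothetical submission ID and save path; requires a configured Joe Sandbox client.
    result = await download_unpacked_files("1234567", run=0, save_path="/data/joe_artifacts")
    for entry in result["files"]:
        print(entry["pid"], entry["type"], entry["unpacked_file"])

asyncio.run(main())
```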
- jbxmcp/server.py:19-20 (registration): Import of the tools module in the MCP server, which triggers registration of all @mcp.tool() decorated functions, including get_unpacked_files, via the shared mcp instance.

```python
import jbxmcp.tools as tools
```
- jbxmcp/core.py:303-324 (helper): Helper that parses unpacked-file names and maps frame IDs to the human-readable snapshot types used in download_unpacked_files.

```python
def extract_unpack_filename_metadata(filename: str) -> Optional[Dict[str, Any]]:
    """
    Extract the targetid and frame id from the filename pattern:
    e.g., '1.2.filename.exe.abc.unpack' → targetid='1', frame_id=2
    """
    frame_map = {
        -1: "UNKNOWN",
        0: "Snapshot at beginning of execution",
        1: "Snapshot taken on unpacking (modifying executable sections or adding new ones)",
        2: "Snapshot at the end of execution",
        3: "Snapshot taken when memory gets freed"
    }
    match = re.match(r'^(\d+)\.(\d+)\..+\.unpack$', filename)
    if not match:
        return None
    targetid, frame_id = match.groups()
    frame_id = int(frame_id)
    return {
        "targetid": targetid,
        "frame_label": frame_map.get(frame_id, "UNKNOWN")
    }
```
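For example, a filename following the documented pattern parses as follows (the filename itself is made up; only its structure matters):

```python
meta = extract_unpack_filename_metadata("5.1.payload.exe.a1b2c3.unpack")
# meta == {
#     "targetid": "5",
#     "frame_label": "Snapshot taken on unpacking (modifying executable sections or adding new ones)"
# }
```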
- jbxmcp/core.py:264-288 (helper): Helper that extracts and builds the process tree from the XML report, used to map targetids to PIDs for the unpacked-files analysis.

```python
def extract_process_tree(process_elements) -> Dict[str, Any]:
    """
    Reconstructs a process tree as a nested json array from the xml report
    """
    def process_node(proc_elem):
        # Extract key attributes
        attrs = proc_elem.attrib
        node = {
            "name": attrs.get("name"),
            "pid": attrs.get("pid"),
            "cmdline": attrs.get("cmdline"),
            "path": attrs.get("path"),
            "targetid": attrs.get("targetid"),
            "has_exited": attrs.get("hasexited") == "true"
        }
        # Recursively extract children
        children = proc_elem.findall("./process")
        if children:
            node["children"] = [process_node(child) for child in children]
        return node

    process_elements = process_elements.findall("./behavior/system/startupoverview/process")
    return [process_node(p) for p in process_elements]
```
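download_unpacked_files also relies on flatten_process_tree, which is not listed here. As a minimal sketch, assuming it simply walks the nested tree produced by extract_process_tree and maps each targetid to its pid, it could look roughly like this (hypothetical reconstruction, not the actual source):

```python
from typing import Any, Dict, List

def flatten_process_tree(proc_tree: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical sketch: map each process targetid to its pid by walking the nested tree."""
    mapping: Dict[str, Any] = {}

    def walk(node: Dict[str, Any]) -> None:
        if node.get("targetid") is not None:
            mapping[node["targetid"]] = node.get("pid")
        for child in node.get("children", []):
            walk(child)

    for root in proc_tree:
        walk(root)
    return mapping
```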