
REMnux MCP Server

Official
by REMnux

analyze_file

Analyze suspicious files for malware using REMnux tools. Detects file type automatically and runs appropriate analysis tools with configurable depth levels for triage or comprehensive investigation.

Instructions

Auto-analyze a file using REMnux tools appropriate for the detected file type. Runs `file` to detect the type, then executes matching tools (e.g., PE → peframe/capa, PDF → pdfid/pdf-parser, Office → olevba/oleid). Use `depth` to control analysis intensity: 'quick' (triage only), 'standard' (default), or 'deep' (includes expensive tools). Note: 'standard' is sufficient for most files; use 'deep' only when standard doesn't reveal enough.
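As an illustration of the argument shapes and server-side defaulting described above, here is a minimal client-side sketch. The `withDefaults` helper is hypothetical (the real defaulting happens in the server's Zod schema); field names match the tool's input schema.

```typescript
// Illustrative only: the arguments an MCP client would send to analyze_file.
type Depth = "quick" | "standard" | "deep";

interface AnalyzeFileArgs {
  file: string;              // required: path relative to the samples directory
  timeout_per_tool?: number; // seconds; the server defaults this to 60
  depth?: Depth;             // the server defaults this to "standard"
}

// Hypothetical helper mirroring the server's defaulting, so a client can
// predict the effective values before sending the request.
function withDefaults(args: AnalyzeFileArgs): Required<AnalyzeFileArgs> {
  return {
    file: args.file,
    timeout_per_tool: args.timeout_per_tool ?? 60,
    depth: args.depth ?? "standard",
  };
}

const triage = withDefaults({ file: "invoice.doc", depth: "quick" });
const thorough = withDefaults({ file: "dropper.exe" }); // depth falls back to "standard"
```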

Input Schema

  • file (required): Filename relative to samples directory, or absolute path in local mode.
  • timeout_per_tool (optional): Timeout per tool in seconds (default: 60).
  • depth (optional, default: standard): Analysis depth. 'quick' (~5-15s): fast triage. 'standard' (~30-90s): comprehensive analysis. 'deep' (~2-5min): exhaustive. Use 'deep' only when 'standard' isn't enough.

Implementation Reference

  • Main handler function handleAnalyzeFile that implements the analyze_file tool logic. It validates file paths, detects file types using the 'file' command, matches file to category via matchFileType(), runs applicable preprocessing, executes tools from the registry based on category tag and depth tier, parses tool outputs, extracts IOCs, generates triage summary and next steps, and formats the final response.
    export async function handleAnalyzeFile(
      deps: HandlerDeps,
      args: AnalyzeFileArgs,
    ) {
      const startTime = Date.now();
      try {
        const { connector, config } = deps;
        const depth = (args.depth ?? "standard") as DepthTier;

        // Workflow hint for non-default depth
        const workflowHint = depth === "deep"
          ? "TIP: For most files, depth='standard' (the default, ~30-90s) provides sufficient coverage. Use 'deep' only when standard analysis doesn't reveal enough, or for exhaustive investigation."
          : undefined;

        // Validate file path (skip unless --sandbox)
        if (!config.noSandbox) {
          const validation = validateFilePath(args.file, config.samplesDir);
          if (!validation.safe) {
            return formatError("analyze_file", new REMnuxError(
              validation.error || "Invalid file path",
              "INVALID_PATH",
              "validation",
              "Use a relative path within the samples directory",
            ), startTime);
          }
        }

        const filePath = (config.mode === "local" && args.file.startsWith("/"))
          ? args.file
          : `${config.samplesDir}/${args.file}`;
        const perToolTimeout = (args.timeout_per_tool || 60) * 1000;

        // Step 1: Detect file type
        let fileOutput: string;
        try {
          const result = await connector.execute(["file", filePath], { timeout: 30000 });
          fileOutput = result.stdout?.trim() || "";
          if (!fileOutput) {
            return formatError("analyze_file", new REMnuxError(
              "Could not determine file type (empty `file` output)",
              "EMPTY_OUTPUT",
              "tool_failure",
              "Check that the file exists and is readable",
            ), startTime);
          }
        } catch (error) {
          const msg = `Error running file command: ${error instanceof Error ? error.message : "Unknown error"}`;
          return formatError("analyze_file", new REMnuxError(
            msg,
            "EMPTY_OUTPUT",
            "tool_failure",
            "Check that the file exists and is readable",
          ), startTime);
        }

        // Compute the file's own hashes so we can filter them from IOC results
        const ownHashes = new Set<string>();
        try {
          const hashResult = await connector.execute(
            ["sh", "-c", `md5sum '${filePath.replace(/'/g, "'\\''")}' && sha1sum '${filePath.replace(/'/g, "'\\''")}' && sha256sum '${filePath.replace(/'/g, "'\\''")}'`],
            { timeout: 30000 },
          );
          if (hashResult.exitCode === 0) {
            for (const line of hashResult.stdout.split("\n")) {
              const hash = line.trim().split(/\s+/)[0];
              if (hash && /^[a-fA-F0-9]{32,128}$/.test(hash)) {
                ownHashes.add(hash.toLowerCase());
              }
            }
          }
        } catch { /* best effort — if hashing fails, we just skip filtering */ }

        // Step 2: Match to category and get tools from registry by tag + depth
        const category = matchFileType(fileOutput, args.file);
        const tag = CATEGORY_TAG_MAP[category.name] ?? "fallback";
        const tools = toolRegistry.byTagAndTier(tag, depth);

        // Step 2b: Run applicable preprocessors
        let analysisPath = filePath;
        const preprocessResults: Array<{ name: string; description: string; outputPath?: string; error?: string }> = [];
        for (const pp of getPreprocessors(category.name)) {
          try {
            const detect = await connector.executeShell(pp.detectCommand(filePath), {
              timeout: 10000,
              cwd: config.samplesDir,
            });
            if (detect.exitCode !== 0) continue; // Not applicable
            const safeFile = args.file.replace(/[^a-zA-Z0-9._-]/g, "_");
            const outPath = `${config.outputDir}/preprocessed-${pp.name}-${safeFile}`;
            const result = await connector.executeShell(pp.processCommand(filePath, outPath), {
              timeout: pp.timeout,
              cwd: config.samplesDir,
            });
            if (result.exitCode === 0) {
              analysisPath = outPath;
              preprocessResults.push({ name: pp.name, description: pp.description, outputPath: outPath });
            } else {
              preprocessResults.push({
                name: pp.name,
                description: pp.description,
                error: result.stderr?.trim() || `Exit code ${result.exitCode}`,
              });
            }
          } catch (error) {
            preprocessResults.push({
              name: pp.name,
              description: pp.description,
              error: error instanceof Error ? error.message : "Unknown error",
            });
          }
        }

        const toolsRun: ToolRun[] = [];
        const toolsFailed: ToolFailed[] = [];
        const toolsSkipped: ToolSkipped[] = [];
        let totalOutputSize = 0;

        // Step 3: Run each tool
        for (const tool of tools) {
          // Skip tools that require user-supplied arguments (can't auto-run)
          if (tool.requiresUserArgs) {
            toolsSkipped.push({
              name: tool.name,
              command: tool.command,
              reason: "Requires user-supplied arguments (use run_tool manually)",
              skip_type: "requires_user_args",
            });
            continue;
          }

          const cmd = buildCommandFromDefinition(tool, analysisPath, config.outputDir);

          // Ensure output directories exist for tools that write to --output-dir
          if (tool.fixedArgs && config.outputDir) {
            const dirIdx = tool.fixedArgs.indexOf("--output-dir");
            if (dirIdx !== -1 && tool.fixedArgs[dirIdx + 1]) {
              const rawDir = tool.fixedArgs[dirIdx + 1];
              const resolvedDir = rawDir.startsWith("/tmp/")
                ? rawDir.replace("/tmp/", config.outputDir + "/")
                : rawDir;
              try {
                await connector.execute(["mkdir", "-p", resolvedDir], { timeout: 5000 });
              } catch { /* best effort */ }
            }
          }

          // Use the greater of user-specified timeout and tool's own timeout
          const effectiveTimeout = Math.max(perToolTimeout, (tool.timeout ?? 60) * 1000);

          try {
            const result = await connector.executeShell(cmd, {
              timeout: effectiveTimeout,
              cwd: config.samplesDir,
            });
            let stderr = result.stderr || "";
            stderr = filterStderrNoise(stderr);

            // Detect Python TypeError/AttributeError indicating wrong file type for tool
            // Only trigger when stderr contains Python traceback AND output is minimal
            const isPythonTypeError = result.exitCode !== 0 &&
              /^Traceback \(most recent call last\):/m.test(stderr) &&
              /TypeError|AttributeError|ValueError|KeyError|IndexError|ImportError|ModuleNotFoundError|FileNotFoundError|UnicodeDecodeError/i.test(stderr) &&
              !/command not found/i.test(stderr);
            if (isPythonTypeError) {
              const hint = tool.exitCodeHints?.[result.exitCode] ||
                `Tool encountered error on this file type (${stderr.match(/(?:TypeError|AttributeError|ValueError|KeyError|IndexError|ImportError|ModuleNotFoundError|FileNotFoundError|UnicodeDecodeError)[^\n]*/)?.[0] || "see stderr"})`;
              toolsSkipped.push({ name: tool.name, command: cmd, reason: hint, skip_type: "not_applicable" });
              continue;
            }

            // Detect missing tools — only match shell "command not found" or exit code 127,
            // not tool output that happens to contain "not found" (e.g., pescan "section not found")
            const isNotInstalled = result.exitCode === 127 ||
              /command not found/i.test(stderr) ||
              (result.exitCode !== 0 &&
                /^.*: No such file or directory$/m.test(stderr) &&
                stderr.includes(tool.command));
            if (isNotInstalled) {
              toolsSkipped.push({ name: tool.name, command: cmd, reason: "Tool not installed", skip_type: "not_installed" });
              continue;
            }

            // Detect timeout via GNU timeout exit codes (124 = SIGTERM timeout, 137 = SIGKILL)
            const isTimeout = result.exitCode === 124 || result.exitCode === 137;
            if (isTimeout) {
              toolsFailed.push({ name: tool.name, command: cmd, error: "Timed out" });
              continue;
            }

            let output = result.stdout || stderr || "(no output)";
            const fullLen = output.length;

            // Per-tool budget, further reduced if approaching total response budget
            const remainingBudget = Math.max(5 * 1024, TOTAL_RESPONSE_BUDGET - totalOutputSize);
            const budget = Math.min(TOOL_OUTPUT_BUDGETS[tool.name] ?? DEFAULT_OUTPUT_BUDGET, remainingBudget);
            const outputTruncated = output.length > budget;
            let savedOutputFile: string | undefined;

            if (outputTruncated) {
              // Save full output to output dir for later retrieval (if under size limit)
              const safeFile = args.file.replace(/[^a-zA-Z0-9._-]/g, "_");
              const outFilename = `${tool.name}-${safeFile}.txt`;
              if (fullLen <= MAX_SAVED_OUTPUT_SIZE) {
                try {
                  const outPath = `${config.outputDir}/${outFilename}`;
                  await connector.writeFile(outPath, Buffer.from(output, "utf-8"));
                  savedOutputFile = outFilename;
                } catch {
                  // Non-fatal: truncation hint won't include file reference
                }
              }

              // Build truncation message with optional parsing hints
              const hints = PARSING_HINTS[tool.name]?.map(h => h.replace(/<file>/g, outFilename));
              let truncationMsg: string;
              if (savedOutputFile) {
                truncationMsg = `\n\n[Truncated at ${Math.round(budget / 1024)}KB of ${Math.round(fullLen / 1024)}KB total. Full output: output/${savedOutputFile}]`;
                if (hints && hints.length > 0) {
                  truncationMsg += `\n[Query with: ${hints[0]}]`;
                }
              } else if (fullLen > MAX_SAVED_OUTPUT_SIZE) {
                truncationMsg = `\n\n[Output too large (${Math.round(fullLen / 1024)}KB) to save. Re-run tool with filters.]`;
              } else {
                truncationMsg = `\n\n[Truncated at ${Math.round(budget / 1024)}KB of ${Math.round(fullLen / 1024)}KB total]`;
              }
              output = output.slice(0, budget) + truncationMsg;
            }

            totalOutputSize += output.length;
            const parsed = parseToolOutput(tool.name, output);

            // Check for tool-specific exit code hints
            const extraMetadata: Record<string, unknown> = {};
            const hint = tool.exitCodeHints?.[result.exitCode];
            if (hint) {
              extraMetadata.analyst_note = hint;
            }

            // Extract capa summary from JSON output for compact overview
            if (tool.name === "capa" && tool.outputFormat === "json" && result.stdout) {
              try {
                const capaData = JSON.parse(result.stdout);
                if (capaData.rules) {
                  const rules = capaData.rules as Record<string, { attack?: Array<{ technique: string }> }>;
                  const attackTechniques = [...new Set(
                    Object.values(rules)
                      .flatMap((r) => r.attack || [])
                      .map((a) => a.technique)
                  )];
                  extraMetadata.capa_summary = {
                    capability_count: Object.keys(rules).length,
                    attack_techniques: attackTechniques,
                    top_capabilities: Object.keys(rules).slice(0, 10),
                  };
                }
              } catch {
                // JSON parse failed, skip summary extraction
              }
            }

            toolsRun.push({
              name: tool.name,
              command: cmd,
              output,
              exit_code: result.exitCode,
              ...(outputTruncated && { truncated: true, full_output_length: fullLen }),
              ...(parsed.parsed && {
                findings: parsed.findings,
                metadata: { ...parsed.metadata, ...extraMetadata },
              }),
              ...(!parsed.parsed && Object.keys(extraMetadata).length > 0 && { metadata: extraMetadata }),
            });
          } catch (error) {
            const msg = error instanceof Error ? error.message : "Unknown error";
            if (msg.toLowerCase().includes("timeout")) {
              toolsFailed.push({ name: tool.name, command: cmd, error: "Timed out" });
            } else {
              toolsFailed.push({ name: tool.name, command: cmd, error: msg });
            }
          }
        }

        const combinedOutput = toolsRun.map(t => t.output).join("\n\n")
          .replace(/^\s*"command":\s*".*"$/gm, "");

        // Filter metadata lines (author, reference, namespace, etc.) to prevent false IOC extraction
        // from tool/rule metadata (e.g., capa authors, YARA rule references)
        const filteredOutput = filterMetadataLines(combinedOutput);
        const iocResult = extractIOCs(filteredOutput);

        // Filter out the analyzed file's own hashes from IOC results
        if (ownHashes.size > 0) {
          iocResult.iocs = iocResult.iocs.filter((ioc) => !ownHashes.has(ioc.value.toLowerCase()));
        }

        // Generate triage summary and next steps
        const triageSummary = generateTriageSummary(category.name, toolsRun, iocResult.iocs.length);
        const suggestedNextSteps = generateNextSteps(
          category.name, depth, toolsRun, toolsSkipped, iocResult.iocs.length
        );

        // Evaluate cross-tool advisories
        const advisoryContext: AdvisoryContext = {
          toolsRun: toolsRun.map((t) => ({
            name: t.name,
            exit_code: t.exit_code,
            output: t.output,
          })),
          category: category.name,
        };
        const advisories = evaluateAdvisories(advisoryContext);

        const analysisGuidance =
          "IMPORTANT: Many capabilities flagged by analysis tools (API imports like GetProcAddress/VirtualProtect, " +
          "memory operations, TLS sections, anti-debug patterns) are common in BOTH malware and legitimate software. " +
          "Do not assume malicious intent from flagged items alone. For each finding, consider: " +
          "(1) Is this expected for legitimate software of this type? " +
          "(2) Do multiple findings together suggest malicious purpose, or are they individually " +
          "explainable as normal development practices? " +
          "(3) What concrete evidence distinguishes this from a benign program? " +
          "State your confidence level (low/medium/high) and what evidence supports or contradicts a malicious verdict. " +
          "ATTRIBUTION AND CLASSIFICATION: " +
          "YARA family signatures (yara-forge) indicate resemblance to known families, not confirmed identity — " +
          "signatures can match shared code, libraries, or techniques reused across unrelated families. " +
          "YARA behavioral rules and capa detections flag code patterns, not confirmed runtime behaviors — " +
          "a 'keylogger' rule match means keylogging-related code patterns were detected, but static analysis " +
          "alone cannot confirm the sample actually performs keylogging at runtime. " +
          "When multiple tools converge on a classification, this strengthens the hypothesis " +
          "but does not confirm it. Use 'consistent with' or 'matches patterns associated with' rather than " +
          "'confirms' or 'identified as'. State attribution confidence separately from detection confidence.";

        // Check if output exceeds budget - return summary instead of full output
        if (shouldSummarize(toolsRun)) {
          const summary = generateSummary(
            args.file, fileOutput, category.name, depth, triageSummary,
            toolsRun, toolsFailed, toolsSkipped, preprocessResults,
            iocResult.iocs, iocResult.summary, suggestedNextSteps,
            analysisGuidance, workflowHint,
            advisories.length > 0
              ? advisories.map((a) => ({
                  priority: a.priority,
                  issue: a.issue,
                  remediation: a.remediation,
                }))
              : undefined,
          );
          return formatResponse("analyze_file", summary, startTime);
        }

        return formatResponse("analyze_file", {
          ...(advisories.length > 0 && {
            action_required: advisories.map((a) => ({
              priority: a.priority,
              issue: a.issue,
              remediation: a.remediation,
            })),
          }),
          file: args.file,
          detected_type: fileOutput,
          matched_category: category.name,
          depth,
          triage_summary: triageSummary,
          ...(preprocessResults.length > 0 && { preprocessing: preprocessResults }),
          analysis_guidance: analysisGuidance,
          ...(workflowHint && { workflow_hint: workflowHint }),
          ...(tools.length === 0 && {
            warning: `No tools registered for category "${category.name}" at depth "${depth}". Try depth "deep" or use run_tool directly.`,
          }),
          suggested_next_steps: suggestedNextSteps,
          iocs: iocResult.iocs,
          ioc_summary: iocResult.summary,
          tools_run: toolsRun,
          tools_failed: toolsFailed,
          tools_skipped: toolsSkipped,
        }, startTime);
      } catch (error) {
        return formatError("analyze_file", toREMnuxError(error, deps.config.mode), startTime);
      }
    }
  • Zod schema definition for analyze_file input arguments. Defines 'file' (required path), 'timeout_per_tool' (optional per-tool timeout), and 'depth' (optional enum: 'quick'/'standard'/'deep' with 'standard' as default). The schema is used for input validation at tool registration.
    export const analyzeFileSchema = z.object({
      file: z.string().describe("Filename relative to samples directory, or absolute path in local mode"),
      timeout_per_tool: z.number().optional().describe("Timeout per tool in seconds (default: 60)"),
      depth: z.enum(["quick", "standard", "deep"]).optional().default("standard").describe(
        "Analysis depth. 'quick' (~5-15s): fast triage. 'standard' (~30-90s, default): comprehensive analysis. 'deep' (~2-5min): exhaustive. Use 'deep' only when 'standard' isn't enough."
      ),
    });

    export type AnalyzeFileArgs = z.input<typeof analyzeFileSchema>;
  • src/index.ts:179-185 (registration)
    Tool registration for analyze_file. Calls server.tool() with the tool name 'analyze_file', description, the analyzeFileSchema.shape for input validation, and the handler function handleAnalyzeFile(deps, args) as the executor.
    // Tool: analyze_file - Auto-analyze a file using appropriate REMnux tools
    server.tool(
      "analyze_file",
      "Auto-analyze a file using REMnux tools appropriate for the detected file type. Runs `file` to detect type, then executes matching tools (e.g., PE → peframe/capa, PDF → pdfid/pdf-parser, Office → olevba/oleid). Use `depth` to control analysis intensity: 'quick' (triage only), 'standard' (default), 'deep' (includes expensive tools). Note: 'standard' is sufficient for most files; use 'deep' only when standard doesn't reveal enough.",
      analyzeFileSchema.shape,
      (args) => handleAnalyzeFile(deps, args)
    );
  • File type detection function matchFileType() that matches 'file' command output to analysis categories. Contains patterns for PE, PDF, OLE2, OOXML, ELF, APK, PCAP, etc. Handles fallback classification based on file extensions when file command reports 'data' or 'Zip archive'. Returns a FileTypeCategory with name and patterns.
    export function matchFileType(fileOutput: string, filename?: string): FileTypeCategory {
      // Strip "<path>: " prefix from `file` command output
      // (e.g., "/home/remnux/files/samples/foo.img: data" → "data")
      const typeOutput = fileOutput.includes(":")
        ? fileOutput.split(":").slice(1).join(":").trim()
        : fileOutput.trim();

      for (const category of FILE_TYPE_CATEGORIES) {
        for (const pattern of category.patterns) {
          if (pattern.test(typeOutput)) {
            return category;
          }
        }
      }

      // Fallback: extension-based classification for text files that didn't match specific patterns
      if (filename) {
        if (JAVASCRIPT_EXTENSIONS.test(filename)) {
          return FILE_TYPE_CATEGORIES.find((c) => c.name === "JavaScript")!;
        }
        if (SCRIPT_EXTENSIONS.test(filename)) {
          return FILE_TYPE_CATEGORIES.find((c) => c.name === "Script")!;
        }
        if (PYTHON_EXTENSIONS.test(filename)) {
          return FILE_TYPE_CATEGORIES.find((c) => c.name === "Python")!;
        }
      }

      // Fallback: if `file` says "Zip archive" and filename has OOXML extension, classify as OOXML
      if (filename && /zip archive/i.test(typeOutput)) {
        if (OOXML_EXTENSIONS.test(filename)) {
          return FILE_TYPE_CATEGORIES.find((c) => c.name === "OOXML")!;
        }
      }

      // Fallback: if filename has OLE2 extension and `file` output is ambiguous (e.g., "data", "CDF")
      if (filename && OLE2_EXTENSIONS.test(filename) && /^data$|^CDF/i.test(typeOutput)) {
        return FILE_TYPE_CATEGORIES.find((c) => c.name === "OLE2")!;
      }

      // Fallback: PCAP files — `file` may report "data" for some capture formats
      if (filename && PCAP_EXTENSIONS.test(filename)) {
        return FILE_TYPE_CATEGORIES.find((c) => c.name === "PCAP")!;
      }

      // Fallback: memory images — `file` reports "data" for raw memory dumps
      if (filename && MEMORY_EXTENSIONS.test(filename) && /^data$/i.test(typeOutput)) {
        return { name: "Memory", patterns: [] };
      }

      // Fallback: shellcode — `file` reports "data" and filename has shellcode extension
      if (filename && SHELLCODE_EXTENSIONS.test(filename) && /^data$/i.test(typeOutput)) {
        return { name: "Shellcode", patterns: [] };
      }

      // Fallback: PE-like extension but `file` reports "data" — may be raw shellcode, packed, or corrupted
      if (filename && PE_LIKE_EXTENSIONS.test(filename) && /^data$/i.test(typeOutput)) {
        return { name: "DataWithPEExtension", patterns: [] };
      }

      return { name: "Unknown", patterns: [] };
    }
  • Tool registry method byTagAndTier() that filters tools by both tag (file type category) and depth tier. This is used by handleAnalyzeFile to select the appropriate tools to run for each file category at the specified analysis depth. The method ensures only tools at or below the specified tier are included.
    byTagAndTier(tag: string, tier: DepthTier): ToolDefinition[] {
      const maxIndex = DEPTH_TIER_ORDER.indexOf(tier);
      return this.all().filter(
        (t) => t.tags?.includes(tag) && DEPTH_TIER_ORDER.indexOf(t.tier) <= maxIndex
      );
    }
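The tier filter in byTagAndTier can be demonstrated with a self-contained sketch. The registry entries below are made up for illustration (they are not the server's real tool list); the filtering logic mirrors the method above.

```typescript
// Sketch of the tag + tier filter: include a tool only when it carries the
// category tag AND its tier is at or below the requested depth.
type DepthTier = "quick" | "standard" | "deep";
const DEPTH_TIER_ORDER: DepthTier[] = ["quick", "standard", "deep"];

interface ToolDef { name: string; tier: DepthTier; tags?: string[] }

function byTagAndTier(all: ToolDef[], tag: string, tier: DepthTier): ToolDef[] {
  const maxIndex = DEPTH_TIER_ORDER.indexOf(tier);
  return all.filter(
    (t) => t.tags?.includes(tag) && DEPTH_TIER_ORDER.indexOf(t.tier) <= maxIndex
  );
}

// Hypothetical registry entries for illustration only:
const registry: ToolDef[] = [
  { name: "pdfid", tier: "quick", tags: ["pdf"] },
  { name: "pdf-parser", tier: "standard", tags: ["pdf"] },
  { name: "peepdf", tier: "deep", tags: ["pdf"] },
  { name: "capa", tier: "standard", tags: ["pe"] },
];

// depth="standard" for PDFs selects the quick and standard tools only.
const selected = byTagAndTier(registry, "pdf", "standard").map((t) => t.name);
```

Because tiers are cumulative, 'deep' always includes everything 'standard' would run, which is why the docs recommend escalating only when 'standard' comes up short.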
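The extension fallbacks in matchFileType above can be isolated into a small sketch. The regexes here are assumptions standing in for the real *_EXTENSIONS constants, which are defined elsewhere in the codebase.

```typescript
// Sketch of the extension-based fallback for ambiguous `file` output.
// These regexes are illustrative assumptions, not the server's constants.
const OOXML_EXTENSIONS = /\.(docx|docm|xlsx|xlsm|pptx|pptm)$/i;
const OLE2_EXTENSIONS = /\.(doc|xls|ppt|msi)$/i;

function fallbackCategory(typeOutput: string, filename: string): string | undefined {
  // OOXML files are ZIP containers, so `file` reports "Zip archive".
  if (/zip archive/i.test(typeOutput) && OOXML_EXTENSIONS.test(filename)) return "OOXML";
  // Legacy Office files may show up as bare "data" or a CDF header.
  if (OLE2_EXTENSIONS.test(filename) && /^data$|^CDF/i.test(typeOutput)) return "OLE2";
  return undefined;
}
```

This is why a macro-laden .docm that `file` sees only as a ZIP container still gets routed to the OOXML toolchain (oleid/olevba) rather than generic archive handling.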
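The per-tool output budgeting in Step 3 of the handler can also be sketched in isolation. The budget constants below are illustrative assumptions, not the server's actual values; the shrink-to-remaining-budget-with-a-floor logic mirrors the handler.

```typescript
// Minimal sketch of per-tool output truncation under a shared response budget.
// The budget values are assumptions for illustration, not the real constants.
const TOTAL_RESPONSE_BUDGET = 50 * 1024;
const DEFAULT_OUTPUT_BUDGET = 10 * 1024;

function truncateOutput(
  output: string,
  totalSoFar: number,
  perToolBudget: number = DEFAULT_OUTPUT_BUDGET,
): { output: string; truncated: boolean } {
  // Never shrink below a 5 KB floor, even when the total budget is nearly spent.
  const remaining = Math.max(5 * 1024, TOTAL_RESPONSE_BUDGET - totalSoFar);
  const budget = Math.min(perToolBudget, remaining);
  if (output.length <= budget) return { output, truncated: false };
  const msg = `\n\n[Truncated at ${Math.round(budget / 1024)}KB of ${Math.round(output.length / 1024)}KB total]`;
  return { output: output.slice(0, budget) + msg, truncated: true };
}
```

The floor keeps late-running tools from being truncated to nothing, while the shared budget keeps the overall MCP response within what a client can usefully consume.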

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/REMnux/remnux-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.