download_from_url
Download files from URLs for malware analysis, supporting custom headers and JavaScript-enabled sites when needed. Returns file metadata including hashes and type.
Instructions
Download a file from a URL into the samples directory for analysis. Returns file metadata (hashes, type, size). Supports custom HTTP headers and an optional thug mode for sites requiring JavaScript execution.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to download (http or https only) | |
| filename | No | Override filename in samples dir. If omitted, derived from URL path. | |
| headers | No | Custom HTTP headers as 'Name: value' strings. Example: ['User-Agent: Mozilla/5.0', 'X-Auth-Token: abc123'] | |
| method | No | Download method. 'curl' (default) for direct HTTP download. 'thug' for sites requiring JavaScript execution (uses thug honeyclient). | curl |
| overwrite | No | Whether to overwrite if file exists. Default: false | |
| timeout | No | Download timeout in seconds (default: server timeout) |
Implementation Reference
- src/handlers/download-from-url.ts:95-219 (handler)Main handler function handleDownloadFromUrl that implements the download_from_url tool. Validates URL/headers/filename, executes download via curl or thug, gathers file metadata, and returns formatted response. Includes helper functions validateUrl (33-51), validateHeader (57-69), deriveFilename (75-93), handleThugDownload (224-314), and gatherFileInfo (319-365).
export async function handleDownloadFromUrl( deps: HandlerDeps, args: DownloadFromUrlArgs, ) { const startTime = Date.now(); const { connector, config } = deps; const method = args.method ?? "curl"; try { // Validate URL const urlValidation = validateUrl(args.url); if (!urlValidation.valid) { return formatError("download_from_url", new REMnuxError( urlValidation.error!, "INVALID_URL", "validation", "Provide a valid http:// or https:// URL", ), startTime); } // Validate headers if provided const headers = args.headers ?? []; for (const h of headers) { const hv = validateHeader(h); if (!hv.valid) { return formatError("download_from_url", new REMnuxError( hv.error!, "INVALID_HEADER", "validation", "Headers must be 'Name: value' format without control characters or single quotes", ), startTime); } } // Determine filename const filename = args.filename ?? deriveFilename(args.url); const fnValidation = validateFilename(filename); if (!fnValidation.valid) { return formatError("download_from_url", new REMnuxError( fnValidation.error || "Invalid filename", "INVALID_FILENAME", "validation", "Use alphanumeric characters, hyphens, underscores, and dots only", ), startTime); } const filePath = `${config.samplesDir}/${filename}`; // Check if file exists (unless overwrite) if (!args.overwrite) { try { const checkResult = await connector.execute(["test", "-e", filePath], { timeout: 5000 }); if (checkResult.exitCode === 0) { return formatError("download_from_url", new REMnuxError( `File already exists: ${filename}. Use overwrite=true to replace.`, "FILE_EXISTS", "validation", "Set overwrite=true or choose a different filename", ), startTime); } } catch { // test command failed — file doesn't exist, proceed } } // Ensure samples dir exists try { await connector.execute(["mkdir", "-p", config.samplesDir], { timeout: 5000 }); } catch { // ignore — real errors surface below } if (method === "thug") { return await handleThugDownload(deps, args, headers, filename, filePath, startTime); } // ── Curl path ────────────────────────────────────────────────────────── const timeoutSecs = args.timeout ?? config.timeout; // Build curl command const headerFlags = headers.map(h => `-H '${h}'`).join(" "); const curlCmd = [ "curl -sSfL", `--max-filesize ${MAX_FILESIZE}`, `--max-redirs ${MAX_REDIRS}`, `--max-time ${timeoutSecs}`, headerFlags, `-o '${filePath}'`, `'${args.url}'`, ].filter(Boolean).join(" "); const result = await connector.executeShell(curlCmd, { timeout: (timeoutSecs + 10) * 1000, // shell timeout slightly longer than curl timeout cwd: config.samplesDir, }); if (result.exitCode !== 0) { // Clean up partial file left by curl try { await connector.execute(["rm", "-f", filePath], { timeout: 5000 }); } catch { /* ignore cleanup errors */ } const curlError = CURL_ERRORS[result.exitCode] || `curl failed with exit code ${result.exitCode}`; const stderr = result.stderr?.trim() || ""; return formatError("download_from_url", new REMnuxError( `Download failed: ${curlError}${stderr ? ` — ${stderr}` : ""}`, "DOWNLOAD_FAILED", "tool_failure", "Check that the URL is accessible and the server is responding", ), startTime); } // Gather file info (same pattern as get-file-info.ts) const info = await gatherFileInfo(connector, filePath, filename); return formatResponse("download_from_url", { method: "curl", url: args.url, ...info, }, startTime); } catch (error) { return formatError("download_from_url", toREMnuxError(error, deps.config.mode), startTime); } } - src/schemas/tools.ts:64-84 (schema)Zod schema definition for downloadFromUrlSchema with fields: url (required), filename (optional), headers (optional array), method (optional: 'curl' or 'thug'), overwrite (optional boolean), and timeout (optional number). Also exports DownloadFromUrlArgs type.
export const downloadFromUrlSchema = z.object({ url: z.string().url().describe("URL to download (http or https only)"), filename: z.string().optional().describe( "Override filename in samples dir. If omitted, derived from URL path." ), headers: z.array(z.string()).optional().describe( "Custom HTTP headers as 'Name: value' strings. " + "Example: ['User-Agent: Mozilla/5.0', 'X-Auth-Token: abc123']" ), method: z.enum(["curl", "thug"]).optional().default("curl").describe( "Download method. 'curl' (default) for direct HTTP download. " + "'thug' for sites requiring JavaScript execution (uses thug honeyclient)." ), overwrite: z.boolean().optional().default(false).describe( "Whether to overwrite if file exists. Default: false" ), timeout: z.number().optional().describe( "Download timeout in seconds (default: server timeout)" ), }); export type DownloadFromUrlArgs = z.input<typeof downloadFromUrlSchema>; - src/index.ts:158-166 (registration)Tool registration in the MCP server where download_from_url is registered with description, schema shape, and handler function mapping to handleDownloadFromUrl.
// Tool: download_from_url - Download a file from a URL into samples server.tool( "download_from_url", "Download a file from a URL into the samples directory for analysis. " + "Returns file metadata (hashes, type, size). Supports custom HTTP headers " + "and an optional thug mode for sites requiring JavaScript execution.", downloadFromUrlSchema.shape, (args) => handleDownloadFromUrl(deps, args) );