download_from_url
Download files from URLs for malware analysis, supporting custom headers and JavaScript-enabled sites when needed. Returns file metadata including hashes and type.
Instructions
Download a file from a URL into the samples directory for analysis. Returns file metadata (hashes, type, size). Supports custom HTTP headers and an optional thug mode for sites requiring JavaScript execution.
Input Schema
TableJSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to download (http or https only) | |
| filename | No | Override filename in samples dir. If omitted, derived from URL path. | |
| headers | No | Custom HTTP headers as 'Name: value' strings. Example: ['User-Agent: Mozilla/5.0', 'X-Auth-Token: abc123'] | |
| method | No | Download method. 'curl' (default) for direct HTTP download. 'thug' for sites requiring JavaScript execution (uses thug honeyclient). | curl |
| overwrite | No | Whether to overwrite if file exists. Default: false | |
| timeout | No | Download timeout in seconds (default: server timeout) |
Implementation Reference
- src/handlers/download-from-url.ts:95-219 (handler)Main handler function handleDownloadFromUrl that implements the download_from_url tool. Validates URL/headers/filename, executes download via curl or thug, gathers file metadata, and returns formatted response. Includes helper functions validateUrl (33-51), validateHeader (57-69), deriveFilename (75-93), handleThugDownload (224-314), and gatherFileInfo (319-365).export async function handleDownloadFromUrl( deps: HandlerDeps, args: DownloadFromUrlArgs, ) { const startTime = Date.now(); const { connector, config } = deps; const method = args.method ?? "curl"; try { // Validate URL const urlValidation = validateUrl(args.url); if (!urlValidation.valid) { return formatError("download_from_url", new REMnuxError( urlValidation.error!, "INVALID_URL", "validation", "Provide a valid http:// or https:// URL", ), startTime); } // Validate headers if provided const headers = args.headers ?? []; for (const h of headers) { const hv = validateHeader(h); if (!hv.valid) { return formatError("download_from_url", new REMnuxError( hv.error!, "INVALID_HEADER", "validation", "Headers must be 'Name: value' format without control characters or single quotes", ), startTime); } } // Determine filename const filename = args.filename ?? deriveFilename(args.url); const fnValidation = validateFilename(filename); if (!fnValidation.valid) { return formatError("download_from_url", new REMnuxError( fnValidation.error || "Invalid filename", "INVALID_FILENAME", "validation", "Use alphanumeric characters, hyphens, underscores, and dots only", ), startTime); } const filePath = `${config.samplesDir}/${filename}`; // Check if file exists (unless overwrite) if (!args.overwrite) { try { const checkResult = await connector.execute(["test", "-e", filePath], { timeout: 5000 }); if (checkResult.exitCode === 0) { return formatError("download_from_url", new REMnuxError( `File already exists: ${filename}. Use overwrite=true to replace.`, "FILE_EXISTS", "validation", "Set overwrite=true or choose a different filename", ), startTime); } } catch { // test command failed — file doesn't exist, proceed } } // Ensure samples dir exists try { await connector.execute(["mkdir", "-p", config.samplesDir], { timeout: 5000 }); } catch { // ignore — real errors surface below } if (method === "thug") { return await handleThugDownload(deps, args, headers, filename, filePath, startTime); } // ── Curl path ────────────────────────────────────────────────────────── const timeoutSecs = args.timeout ?? config.timeout; // Build curl command const headerFlags = headers.map(h => `-H '${h}'`).join(" "); const curlCmd = [ "curl -sSfL", `--max-filesize ${MAX_FILESIZE}`, `--max-redirs ${MAX_REDIRS}`, `--max-time ${timeoutSecs}`, headerFlags, `-o '${filePath}'`, `'${args.url}'`, ].filter(Boolean).join(" "); const result = await connector.executeShell(curlCmd, { timeout: (timeoutSecs + 10) * 1000, // shell timeout slightly longer than curl timeout cwd: config.samplesDir, }); if (result.exitCode !== 0) { // Clean up partial file left by curl try { await connector.execute(["rm", "-f", filePath], { timeout: 5000 }); } catch { /* ignore cleanup errors */ } const curlError = CURL_ERRORS[result.exitCode] || `curl failed with exit code ${result.exitCode}`; const stderr = result.stderr?.trim() || ""; return formatError("download_from_url", new REMnuxError( `Download failed: ${curlError}${stderr ? ` — ${stderr}` : ""}`, "DOWNLOAD_FAILED", "tool_failure", "Check that the URL is accessible and the server is responding", ), startTime); } // Gather file info (same pattern as get-file-info.ts) const info = await gatherFileInfo(connector, filePath, filename); return formatResponse("download_from_url", { method: "curl", url: args.url, ...info, }, startTime); } catch (error) { return formatError("download_from_url", toREMnuxError(error, deps.config.mode), startTime); } }
- src/schemas/tools.ts:64-84 (schema)Zod schema definition for downloadFromUrlSchema with fields: url (required), filename (optional), headers (optional array), method (optional: 'curl' or 'thug'), overwrite (optional boolean), and timeout (optional number). Also exports DownloadFromUrlArgs type.export const downloadFromUrlSchema = z.object({ url: z.string().url().describe("URL to download (http or https only)"), filename: z.string().optional().describe( "Override filename in samples dir. If omitted, derived from URL path." ), headers: z.array(z.string()).optional().describe( "Custom HTTP headers as 'Name: value' strings. " + "Example: ['User-Agent: Mozilla/5.0', 'X-Auth-Token: abc123']" ), method: z.enum(["curl", "thug"]).optional().default("curl").describe( "Download method. 'curl' (default) for direct HTTP download. " + "'thug' for sites requiring JavaScript execution (uses thug honeyclient)." ), overwrite: z.boolean().optional().default(false).describe( "Whether to overwrite if file exists. Default: false" ), timeout: z.number().optional().describe( "Download timeout in seconds (default: server timeout)" ), }); export type DownloadFromUrlArgs = z.input<typeof downloadFromUrlSchema>;
- src/index.ts:158-166 (registration)Tool registration in the MCP server where download_from_url is registered with description, schema shape, and handler function mapping to handleDownloadFromUrl.// Tool: download_from_url - Download a file from a URL into samples server.tool( "download_from_url", "Download a file from a URL into the samples directory for analysis. " + "Returns file metadata (hashes, type, size). Supports custom HTTP headers " + "and an optional thug mode for sites requiring JavaScript execution.", downloadFromUrlSchema.shape, (args) => handleDownloadFromUrl(deps, args) );