Simple Document Processing MCP Server

html_extract_resources

Extracts the URLs of images, videos, and links from an HTML file and saves them as a JSON file in a specified directory, so the resources a document references can be inventoried during document processing.

Instructions

Extract all resources (images, videos, links) from HTML

Input Schema

Name        Required   Description                                  Default
inputPath   Yes        Path to the input HTML file
outputDir   Yes        Directory where resources should be saved

Input Schema (JSON Schema)

{
  "properties": {
    "inputPath": {
      "description": "Path to the input HTML file",
      "type": "string"
    },
    "outputDir": {
      "description": "Directory where resources should be saved",
      "type": "string"
    }
  },
  "required": [
    "inputPath",
    "outputDir"
  ],
  "type": "object"
}
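
As a rough illustration of how a client might use this schema, the sketch below checks the two required string fields and wraps them in an MCP tools/call request (JSON-RPC 2.0). The helper names validateArgs and buildToolCallRequest and the example paths are hypothetical; they do not come from the server's source.

```typescript
// Argument shape taken from the input schema above.
interface HtmlExtractResourcesArgs {
  inputPath: string;
  outputDir: string;
}

// Hypothetical helper: checks the two required string fields the schema declares.
function validateArgs(args: unknown): args is HtmlExtractResourcesArgs {
  if (typeof args !== "object" || args === null) return false;
  const record = args as Record<string, unknown>;
  return (
    typeof record.inputPath === "string" &&
    typeof record.outputDir === "string"
  );
}

// Hypothetical helper: wraps the arguments in an MCP tools/call request.
function buildToolCallRequest(id: number, args: HtmlExtractResourcesArgs) {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "html_extract_resources", arguments: args },
  };
}

// Example paths are placeholders.
const exampleRequest = buildToolCallRequest(1, {
  inputPath: "./page.html",
  outputDir: "./out",
});
```

Serializing exampleRequest with JSON.stringify gives the payload a client would send over its MCP transport; the server's own validation may be stricter or looser than this check.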

Implementation Reference

  • The core handler function that parses the HTML with JSDOM, collects the src of each img element, the href of each a element, and the src of each source element nested in a video element, and writes them as a JSON object to resources_<uniqueId>.json in the output directory.
    // Assumed imports; the excerpt omits them.
    import { promises as fs } from "fs";
    import * as path from "path";
    import { JSDOM } from "jsdom";
    // generateUniqueId is a project helper that the excerpt does not show.

    export async function extractHtmlResources(
      inputPath: string,
      outputDir: string
    ) {
      try {
        console.error(`Starting resource extraction...`);
        console.error(`Input file: ${inputPath}`);
        console.error(`Output directory: ${outputDir}`);

        // Ensure the output directory exists
        try {
          await fs.access(outputDir);
          console.error(`Output directory exists: ${outputDir}`);
        } catch {
          console.error(`Creating output directory: ${outputDir}`);
          await fs.mkdir(outputDir, { recursive: true });
          console.error(`Created output directory: ${outputDir}`);
        }

        const uniqueId = generateUniqueId();
        const htmlContent = await fs.readFile(inputPath, "utf-8");
        const dom = new JSDOM(htmlContent);
        const { document } = dom.window;

        // Extract resources
        const resources = {
          images: Array.from(document.querySelectorAll("img")).map(
            (img) => (img as HTMLImageElement).src
          ),
          links: Array.from(document.querySelectorAll("a")).map(
            (a) => (a as HTMLAnchorElement).href
          ),
          videos: Array.from(document.querySelectorAll("video source")).map(
            (video) => (video as HTMLSourceElement).src
          ),
        };

        const outputPath = path.join(outputDir, `resources_${uniqueId}.json`);
        await fs.writeFile(outputPath, JSON.stringify(resources, null, 2));
        console.error(`Written resources to ${outputPath}`);

        return {
          success: true,
          data: `Successfully extracted resources: ${outputPath}`,
        };
      } catch (error) {
        console.error(`Error in extractHtmlResources:`, error);
        return {
          success: false,
          error: error instanceof Error ? error.message : "Unknown error",
        };
      }
    }
  • The MCP server request handler dispatch that invokes the extractHtmlResources function and formats the response.
    if (name === "html_extract_resources") {
      const { inputPath, outputDir } = args as {
        inputPath: string;
        outputDir: string;
      };
      const result = await extractHtmlResources(inputPath, outputDir);
      if (!result.success) {
        return {
          content: [{ type: "text", text: `Error: ${result.error}` }],
          isError: true,
        };
      }
      return {
        content: [{ type: "text", text: fileOperationResponse(result.data) }],
        isError: false,
      };
    }
  • The Tool object definition with name, description, and inputSchema for validation.
    export const HTML_EXTRACT_RESOURCES_TOOL: Tool = {
      name: "html_extract_resources",
      description: "Extract all resources (images, videos, links) from HTML",
      inputSchema: {
        type: "object",
        properties: {
          inputPath: {
            type: "string",
            description: "Path to the input HTML file",
          },
          outputDir: {
            type: "string",
            description: "Directory where resources should be saved",
          },
        },
        required: ["inputPath", "outputDir"],
      },
    };
  • Registration of the tool in the central tools array exported for listTools handler.
    export const tools = [
      DOCUMENT_READER_TOOL,
      PDF_MERGE_TOOL,
      PDF_SPLIT_TOOL,
      DOCX_TO_PDF_TOOL,
      DOCX_TO_HTML_TOOL,
      HTML_CLEAN_TOOL,
      HTML_TO_TEXT_TOOL,
      HTML_TO_MARKDOWN_TOOL,
      HTML_EXTRACT_RESOURCES_TOOL,
      HTML_FORMAT_TOOL,
      TEXT_DIFF_TOOL,
      TEXT_SPLIT_TOOL,
      TEXT_FORMAT_TOOL,
      TEXT_ENCODING_CONVERT_TOOL,
      EXCEL_READ_TOOL,
      FORMAT_CONVERTER_TOOL,
    ];
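
The way a handler result becomes an MCP response can be sketched in isolation. This is a minimal sketch under stated assumptions, not the server's code: the type names, toToolResponse, and dispatch are hypothetical; the text is passed through unchanged because fileOperationResponse is not shown in the excerpts; and a Map lookup stands in for the server's if-chain on the tool name.

```typescript
// Result shape returned by extractHtmlResources in the excerpt above.
type HandlerResult =
  | { success: true; data: string }
  | { success: false; error: string };

// Response shape used by the dispatch excerpt above.
interface ToolResponse {
  content: { type: "text"; text: string }[];
  isError: boolean;
}

// Hypothetical helper: maps a handler result onto a tool response.
function toToolResponse(result: HandlerResult): ToolResponse {
  if (!result.success) {
    return {
      content: [{ type: "text", text: `Error: ${result.error}` }],
      isError: true,
    };
  }
  // The server wraps result.data with fileOperationResponse(), which the
  // excerpts do not show; the text is passed through unchanged here.
  return { content: [{ type: "text", text: result.data }], isError: false };
}

type Handler = (args: Record<string, unknown>) => Promise<HandlerResult>;

// Hypothetical dispatcher: a Map lookup replaces the if-chain on `name`.
async function dispatch(
  handlers: Map<string, Handler>,
  name: string,
  args: Record<string, unknown>
): Promise<ToolResponse> {
  const handler = handlers.get(name);
  if (!handler) {
    return toToolResponse({ success: false, error: `Unknown tool: ${name}` });
  }
  return toToolResponse(await handler(args));
}
```

Keeping the result-to-response mapping in one function means every tool reports failures in the same "Error: ..." form with isError set, which is the behavior the dispatch excerpt shows for this tool.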

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cablate/mcp-doc-forge'
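
The same request could be issued from TypeScript roughly as follows. This is a sketch only: serverUrl, fetchServerInfo, and the FetchLike type are hypothetical names, the fetch implementation is injected by the caller, and the response is left as unknown because this page does not document its shape.

```typescript
// Minimal structural types so the sketch does not depend on DOM or Node typings.
interface JsonResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<JsonResponse>;

// Base path taken from the curl command above.
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Hypothetical helper: builds the URL for one server entry.
function serverUrl(owner: string, name: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: performs the GET and returns the parsed JSON body.
async function fetchServerInfo(
  fetchFn: FetchLike,
  owner: string,
  name: string
): Promise<unknown> {
  const response = await fetchFn(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}
```

In an environment that provides a global fetch, calling fetchServerInfo(fetch, "cablate", "mcp-doc-forge") would request the same URL as the curl command.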

If you have feedback or need assistance with the MCP directory API, please join our Discord server.