xpathwithurl
Extract specific data from XML/HTML content using XPath queries. Provide a URL and XPath expression to fetch and filter content from web pages or documents.
Instructions
Fetch content from a URL and select query it using XPath
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| mimeType | No | The MIME type (e.g. text/xml, application/xml, text/html, application/xhtml+xml) | text/html |
| query | Yes | The XPath query to execute | |
| url | Yes | The URL to fetch XML/HTML content from | |
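The table mirrors the Zod schema in `index.ts`. As a rough, dependency-free illustration of the rules it encodes, the sketch below validates raw arguments by hand. `parseXPathWithUrlArgs` is a hypothetical helper, not part of the server, and `new URL(...)` is used only as an approximation of Zod's `.url()` check.

```typescript
// Hypothetical, dependency-free approximation of XPathWithUrlArgumentsSchema.
// Not the server's code; Zod is deliberately not used here.

interface XPathWithUrlArgs {
  url: string;
  query: string;
  mimeType: string;
}

function parseXPathWithUrlArgs(raw: unknown): XPathWithUrlArgs {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Arguments must be an object");
  }
  const args = raw as Record<string, unknown>;

  // url: required string that must parse as an absolute URL (approximates z.string().url())
  if (typeof args.url !== "string") {
    throw new Error("url is required and must be a string");
  }
  try {
    new URL(args.url);
  } catch {
    throw new Error(`Invalid url: ${args.url}`);
  }

  // query: required string
  if (typeof args.query !== "string") {
    throw new Error("query is required and must be a string");
  }

  // mimeType: optional string; falls back to "text/html" when omitted
  let mimeType = "text/html";
  if (args.mimeType !== undefined) {
    if (typeof args.mimeType !== "string") {
      throw new Error("mimeType must be a string");
    }
    mimeType = args.mimeType;
  }

  return { url: args.url, query: args.query, mimeType };
}
```

Omitting `mimeType` yields `text/html`, matching the Default column; a missing `query` or a non-URL `url` is rejected.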
Implementation Reference
- `index.ts:168-213` (handler): Handler for the `xpathwithurl` tool: parses arguments, uses Puppeteer to fetch and render page content from the URL, parses it as XML/HTML, executes the XPath query using the `xpath` library, handles errors and empty results, and serializes the output using `resultToString`.

  ```typescript
  } else if (name === "xpathwithurl") {
    const { url, query, mimeType } = XPathWithUrlArgumentsSchema.parse(args);

    // Launch puppeteer browser
    const browser = await puppeteer.launch({ headless: true });
    const page = await browser.newPage();

    try {
      // Navigate to the URL and wait until network is idle
      await page.goto(url, { waitUntil: 'networkidle0' });

      // Get the rendered HTML
      const xml = await page.content();

      // Parse XML
      const parsedXml = parser.parseFromString(xml, mimeType);

      // Check for parsing errors
      const errors = xpath.select('//parsererror', parsedXml);
      if (Array.isArray(errors) && errors.length > 0) {
        return {
          content: [{ type: "text", text: "XML parsing error: " + resultToString(errors[0]) }]
        };
      }

      const result = xpath.select(query, parsedXml);

      // If result is an empty array, provide more information
      if (Array.isArray(result) && result.length === 0) {
        return {
          content: [{ type: "text", text: "No nodes matched the query." }]
        };
      }

      return {
        content: [{ type: "text", text: resultToString(result) }]
      };
    } catch (error: unknown) {
      const errorMessage = error instanceof Error ? error.message : String(error);
      return {
        content: [{ type: "text", text: `Error processing XPath query: ${errorMessage}` }]
      };
    } finally {
      // Make sure to close the browser
      await browser.close();
    }
  ```
- `index.ts:24-30` (schema): Zod schema for validating input arguments to the `xpathwithurl` tool: `url` (required string URL), `query` (required string), `mimeType` (optional string, default `'text/html'`).

  ```typescript
  const XPathWithUrlArgumentsSchema = z.object({
    url: z.string().url().describe("The URL to fetch XML/HTML content from"),
    query: z.string().describe("The XPath query to execute"),
    mimeType: z.string()
      .describe("The MIME type (e.g. text/xml, application/xml, text/html, application/xhtml+xml)")
      .default("text/html")
  });
  ```
- `index.ts:100-122` (registration): Tool registration in `ListToolsRequestHandler`: defines the name, description, and JSON `inputSchema` matching the Zod schema.

  ```typescript
  {
    name: "xpathwithurl",
    description: "Fetch content from a URL and select query it using XPath",
    inputSchema: {
      type: "object",
      properties: {
        url: {
          type: "string",
          description: "The URL to fetch XML/HTML content from",
        },
        query: {
          type: "string",
          description: "The XPath query to execute",
        },
        mimeType: {
          type: "string",
          description: "The MIME type (e.g. text/xml, application/xml, text/html, application/xhtml+xml)",
          default: "text/html"
        }
      },
      required: ["url", "query"],
    },
  }
  ```
- `index.ts:46-72` (helper): Utility function that converts XPath results (strings, numbers, booleans, DOM nodes or node arrays) to a readable string; used for output in both the `xpath` and `xpathwithurl` handlers.

  ```typescript
  function resultToString(result: string | number | boolean | Node | Node[] | null): string {
    if (result === null) {
      return "null";
    } else if (Array.isArray(result)) {
      return result.map(resultToString).join("\n");
    } else if (typeof result === 'object' && result.nodeType !== undefined) {
      // Handle DOM nodes
      if (result.nodeType === 1) {
        // Element node
        const serializer = new XMLSerializer();
        return serializer.serializeToString(result);
      } else if (result.nodeType === 2) {
        // Attribute node
        return `${result.nodeName}="${result.nodeValue}"`;
      } else if (result.nodeType === 3) {
        // Text node
        return result.nodeValue || "";
      } else {
        // Default fallback for other node types
        try {
          const serializer = new XMLSerializer();
          return serializer.serializeToString(result);
        } catch (e) {
          return String(result);
        }
      }
    } else {
      return String(result);
    }
  }
  ```
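The handler's branching can be exercised without a browser. The sketch below keeps the same outcomes (parse error, no match, success, thrown error) and the `finally` cleanup, but replaces Puppeteer, the DOM parser, and the `xpath` library with hypothetical injected interfaces (`RenderedPageSource`, `QueryEngine`). As a simplifying assumption, query results arrive already converted to strings.

```typescript
// Sketch of the xpathwithurl control flow with hypothetical injected dependencies.

type TextResult = { content: { type: "text"; text: string }[] };

// Hypothetical stand-in for Puppeteer (goto + content, and browser.close()).
interface RenderedPageSource {
  content(url: string): Promise<string>;
  close(): Promise<void>;
}

// Hypothetical stand-in for the DOM parser + xpath.select + resultToString.
interface QueryEngine {
  parseErrors(markup: string): string[];
  select(query: string, markup: string): string[];
}

const text = (t: string): TextResult => ({ content: [{ type: "text", text: t }] });

async function runXPathWithUrl(
  url: string,
  query: string,
  source: RenderedPageSource,
  engine: QueryEngine,
): Promise<TextResult> {
  try {
    const markup = await source.content(url);

    // Report the first parse error, as the original handler does
    const errors = engine.parseErrors(markup);
    if (errors.length > 0) {
      return text("XML parsing error: " + errors[0]);
    }

    // An empty node set gets an explicit message instead of empty output
    const matches = engine.select(query, markup);
    if (matches.length === 0) {
      return text("No nodes matched the query.");
    }
    return text(matches.join("\n"));
  } catch (error: unknown) {
    const message = error instanceof Error ? error.message : String(error);
    return text(`Error processing XPath query: ${message}`);
  } finally {
    // Runs on every path, including early returns and errors
    await source.close();
  }
}
```

Injecting the page source makes the cleanup guarantee testable: a counter in a fake `close()` shows it runs once per call, whichever branch returns.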
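To see how `resultToString` maps each kind of XPath result to text, here is a dependency-free sketch of the same dispatch on `nodeType`. `MiniNode`, `serializeElement`, and `miniResultToString` are hypothetical names; `serializeElement` is a simplified placeholder for `XMLSerializer` (no attributes, no escaping), and the fallback for other node types uses `String()` rather than serialization.

```typescript
// Hypothetical minimal stand-in for the DOM Node interface.
interface MiniNode {
  nodeType: number; // DOM constants: 1 = element, 2 = attribute, 3 = text
  nodeName: string;
  nodeValue: string | null;
  childNodes?: MiniNode[];
}

type XPathValue = string | number | boolean | MiniNode | MiniNode[] | null;

// Simplified placeholder for XMLSerializer: tag name plus serialized children only.
function serializeElement(node: MiniNode): string {
  const inner = (node.childNodes ?? []).map(miniResultToString).join("");
  return `<${node.nodeName}>${inner}</${node.nodeName}>`;
}

function miniResultToString(result: XPathValue): string {
  if (result === null) return "null";
  // Node sets: one line per node, as in the original helper
  if (Array.isArray(result)) return result.map(miniResultToString).join("\n");
  if (typeof result === "object") {
    switch (result.nodeType) {
      case 1: return serializeElement(result);                   // element
      case 2: return `${result.nodeName}="${result.nodeValue}"`; // attribute
      case 3: return result.nodeValue ?? "";                     // text
      default: return String(result);                            // simplified fallback
    }
  }
  // Primitive results (strings, numbers, booleans)
  return String(result);
}
```

Per the helper's signature, primitive results (for example from an expression such as `count(//a)`) take the final `String(result)` branch, while a node set such as `//a/@href` becomes one `name="value"` line per attribute.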
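Once the tool is registered, an MCP client invokes it with a JSON-RPC `tools/call` request whose `arguments` follow the registered `inputSchema`. The sketch below builds such a request; `buildXPathWithUrlCall` is a hypothetical helper, and the URL and query are placeholder values.

```typescript
// Shape of an MCP JSON-RPC "tools/call" request targeting xpathwithurl.
interface ToolsCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, string>;
  };
}

// Hypothetical helper: only url and query are required by the inputSchema.
function buildXPathWithUrlCall(
  id: number,
  url: string,
  query: string,
  mimeType?: string,
): ToolsCallRequest {
  const args: Record<string, string> = { url, query };
  // Leave mimeType out to let the server apply its "text/html" default
  if (mimeType !== undefined) {
    args.mimeType = mimeType;
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "xpathwithurl", arguments: args },
  };
}

// Example: collect every link target on a page (placeholder URL)
const request = buildXPathWithUrlCall(1, "https://example.com", "//a/@href");
const wire = JSON.stringify(request);
```

For an RSS or Atom feed, passing `"application/xml"` as `mimeType` asks the server to parse the rendered content as XML rather than HTML.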