fetch_with_license
Determine the license of any image or webpage URL by analyzing host heuristics and metadata, enabling go/no-go decisions before use.
Instructions
Given an arbitrary URL (image or webpage), determine its license via host heuristics + page metadata (`<link rel=license>`, dc.rights, og tags). Set probe: true to also download the bytes. Use when an agent already has a URL and needs a go/no-go decision before shipping.
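One way a caller might turn the tool's output into the go/no-go decision mentioned above is sketched below. This is an illustration only: the `LicenseVerdict` shape keeps just the `license` and `confidence` fields from the tool's JSON result, and the allowlist entries, the `MIN_CONFIDENCE` threshold, and the `isGo` helper are all hypothetical (the source only confirms the `"UNKNOWN"` license value).

```typescript
// Hypothetical subset of the tool's JSON result.
interface LicenseVerdict {
  license: string;
  confidence: number;
}

// Assumed allowlist: the real License identifiers (other than "UNKNOWN")
// are not shown in this document, so these strings are placeholders.
const ALLOWED_LICENSES = new Set<string>(["CC0", "CC-BY"]);

// Assumed threshold chosen by the caller; not a value mandated by the tool.
const MIN_CONFIDENCE = 0.7;

// Returns true ("go") only for an allowlisted license with enough confidence.
function isGo(verdict: LicenseVerdict): boolean {
  if (verdict.license === "UNKNOWN") return false;
  return ALLOWED_LICENSES.has(verdict.license) && verdict.confidence >= MIN_CONFIDENCE;
}
```

Treating `"UNKNOWN"` as an unconditional no-go keeps the decision conservative: a missing signal never ships by accident.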
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | | |
| probe | No | When true, also download the bytes if this URL is an image | false |
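For orientation, a client would pass these inputs as the `arguments` of an MCP `tools/call` request. The sketch below builds such a request by hand; the `buildFetchWithLicenseCall` helper and its interfaces are hypothetical names introduced here, and the `false` fallback for `probe` simply mirrors the schema default rather than replacing server-side validation.

```typescript
// Hypothetical client-side view of the tool's inputs.
interface FetchWithLicenseArgs {
  url: string;
  probe?: boolean;
}

// Minimal shape of a JSON-RPC 2.0 "tools/call" request as used by MCP.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: { url: string; probe: boolean } };
}

// Builds the request, filling in probe = false when the caller omits it.
function buildFetchWithLicenseCall(id: number, args: FetchWithLicenseArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: {
      name: "fetch_with_license",
      arguments: { url: args.url, probe: args.probe ?? false },
    },
  };
}

// Example: ask for a license verdict on an image and also download its bytes.
const exampleCall = buildFetchWithLicenseCall(1, {
  url: "https://example.com/photo.jpg",
  probe: true,
});
```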
Implementation Reference
- The core implementation of fetchWithLicense. It sends a HEAD request to check whether the URL is an image, then either returns host-heuristic license info for images (optionally downloading the bytes when probe is true) or fetches the page and parses its HTML metadata for license info.
```typescript
export async function fetchWithLicense(
  url: string,
  opts: FetchWithLicenseOptions = {},
): Promise<FetchWithLicenseResult> {
  const fetcher = opts.fetcher ?? fetch;
  const ua = opts.userAgent ?? "webfetch-mcp/0.1";
  const publicUrl = assertPublicHttpUrl(url);
  if (!publicUrl.ok) throw new Error(publicUrl.error);

  // Probe the URL shallowly to see if it's an image.
  let head: Response;
  try {
    head = await fetcher(url, {
      method: "HEAD",
      headers: { "User-Agent": ua },
      signal: opts.signal,
    });
  } catch {
    // Some servers disallow HEAD; fall through to GET.
    head = new Response(null);
  }
  const ct = (head.headers.get("content-type") ?? "").split(";")[0]!.trim();
  if (ct.startsWith("image/")) {
    const heur = heuristicLicenseFromUrl(url);
    const res: FetchWithLicenseResult = {
      license: heur.license,
      confidence: heur.confidence,
      sourcePageUrl: url,
      attributionLine: buildAttribution({ license: heur.license, sourceUrl: url }),
    };
    if (opts.probe) {
      const dl = await downloadImage(url, { fetcher, userAgent: ua, signal: opts.signal });
      res.bytes = dl.bytes;
      res.mime = dl.mime;
      res.sha256 = dl.sha256;
      res.cachedPath = dl.cachedPath;
      try {
        const meta = await readImageMetadata(dl.bytes);
        res.embeddedMetadata = meta;
        // Reconcile: embedded metadata wins on tie.
        if (meta.artist && !res.author) res.author = meta.artist;
        if (meta.copyright) res.copyright = meta.copyright;
        if (meta.license !== "UNKNOWN" && meta.confidence.license >= res.confidence) {
          res.license = meta.license;
          res.confidence = Math.max(res.confidence, meta.confidence.license);
          if (meta.licenseUrl) res.licenseUrl = meta.licenseUrl;
        }
        res.attributionLine = buildAttribution({
          license: res.license,
          author: res.author,
          sourceUrl: res.sourcePageUrl,
        });
      } catch {
        // non-fatal: keep host-heuristic result
      }
    }
    return res;
  }

  // Treat as webpage: fetch HTML and extract hints.
  const resp = await fetcher(url, {
    headers: { "User-Agent": ua, Accept: "text/html" },
    signal: opts.signal,
  });
  if (!resp.ok) {
    return { license: "UNKNOWN", confidence: 0, sourcePageUrl: url };
  }
  const html = await resp.text();
  return parseHtmlLicense(html, url);
}
```

- parseHtmlLicense helper function that extracts license metadata from HTML using regex matching for `<link rel=license>`, dc.rights, og:image:license, and article:author tags.
```typescript
export function parseHtmlLicense(html: string, url: string): FetchWithLicenseResult {
  const metaLicense =
    matchAttr(html, /<link[^>]+rel=["']license["'][^>]+href=["']([^"']+)["']/i) ??
    matchAttr(html, /<meta[^>]+name=["']dc.rights["'][^>]+content=["']([^"']+)["']/i) ??
    matchAttr(html, /<meta[^>]+property=["']og:image:license["'][^>]+content=["']([^"']+)["']/i);
  const ogAuthor = matchAttr(
    html,
    /<meta[^>]+property=["']article:author["'][^>]+content=["']([^"']+)["']/i,
  );
  const heur = heuristicLicenseFromUrl(url);
  let license: License = heur.license;
  let confidence = heur.confidence;
  if (metaLicense) {
    const coerced = coerceLicense(metaLicense);
    if (coerced !== "UNKNOWN") {
      license = coerced;
      confidence = Math.max(confidence, 0.7);
    }
  }
  return {
    license,
    confidence,
    sourcePageUrl: url,
    author: ogAuthor ?? undefined,
    attributionLine: buildAttribution({
      license,
      author: ogAuthor ?? undefined,
      sourceUrl: url,
    }),
  };
}
```

- packages/mcp/src/schema.ts:49-55 (schema) Zod input schema for the fetch_with_license MCP tool. Takes a url (string) and optional probe (boolean, default false) to also download bytes.
```typescript
export const fetchWithLicenseSchema = z.object({
  url: z.string().url(),
  probe: z
    .boolean()
    .default(false)
    .describe("When true, also download the bytes if this URL is an image"),
});
```

- packages/mcp/src/tools.ts:85-104 (registration) MCP tool registration for 'fetch_with_license'. Defines name, description (prompt surface for LLM), inputSchema, and handler that calls core fetchWithLicense and renders the result as JSON.
```typescript
{
  name: "fetch_with_license",
  description:
    "Given an arbitrary URL (image or webpage), determine its license via host heuristics + page metadata (<link rel=license>, dc.rights, og tags). Set probe: true to also download the bytes. Use when an agent already has a URL and needs a go/no-go decision before shipping.",
  inputSchema: fetchWithLicenseSchema,
  async handler(args) {
    const r = await fetchWithLicense(args.url, { probe: args.probe });
    return renderJson({
      license: r.license,
      confidence: r.confidence,
      author: r.author,
      attributionLine: r.attributionLine,
      sourcePageUrl: r.sourcePageUrl,
      mime: r.mime,
      sha256: r.sha256,
      cachedPath: r.cachedPath,
      byteSize: r.bytes?.byteLength,
    });
  },
},
```

- packages/mcp/src/index.ts:40-57 (registration) MCP server handler that receives CallToolRequest, parses input via schema, and dispatches to the tool's handler. This is the entry point that wires the tool to the MCP protocol.
```typescript
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const tool = TOOLS.find((t) => t.name === req.params.name);
  if (!tool) {
    return {
      content: [{ type: "text", text: `unknown tool: ${req.params.name}` }],
      isError: true,
    };
  }
  try {
    const parsed = tool.inputSchema.parse(req.params.arguments ?? {});
    const out = await tool.handler(parsed);
    return out as any;
  } catch (e) {
    const msg = (e as Error).message ?? "unknown error";
    return { content: [{ type: "text", text: `error: ${msg}` }], isError: true };
  }
});

const transport = new StdioServerTransport();
await server.connect(transport);
```
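To see how the metadata regexes in parseHtmlLicense behave on real markup, the sketch below exercises them in isolation. The three patterns are copied from the listing above; the body of `matchAttr` is an assumption (its source is not shown here, only that its result is combined with `??`), and `firstLicenseHint` plus the sample HTML strings are hypothetical names made up for this illustration.

```typescript
// Patterns copied from parseHtmlLicense, in the same priority order.
const LICENSE_PATTERNS: RegExp[] = [
  /<link[^>]+rel=["']license["'][^>]+href=["']([^"']+)["']/i,
  /<meta[^>]+name=["']dc.rights["'][^>]+content=["']([^"']+)["']/i,
  /<meta[^>]+property=["']og:image:license["'][^>]+content=["']([^"']+)["']/i,
];

// Assumed implementation of matchAttr: first capture group, or null if no match.
function matchAttr(html: string, re: RegExp): string | null {
  const m = html.match(re);
  return m?.[1] ?? null;
}

// Hypothetical helper: returns the first license hint found, or null.
function firstLicenseHint(html: string): string | null {
  for (const re of LICENSE_PATTERNS) {
    const hit = matchAttr(html, re);
    if (hit !== null) return hit;
  }
  return null;
}

// Sample markup (made up for this example).
const relFirst = '<link rel="license" href="https://creativecommons.org/licenses/by/4.0/">';
const hrefFirst = '<link href="https://creativecommons.org/licenses/by/4.0/" rel="license">';
const dcRights = '<meta name="dc.rights" content="CC BY 4.0">';

const relFirstHint = firstLicenseHint(relFirst); // matches: rel precedes href
const hrefFirstHint = firstLicenseHint(hrefFirst); // no match: href precedes rel
const dcRightsHint = firstLicenseHint(dcRights); // matches the dc.rights pattern
```

Because each pattern fixes the attribute order (for example `rel` before `href`), a tag written with the attributes reversed is not matched, and parseHtmlLicense would then rely on the host heuristic alone. A caller that needs broader coverage could swap in an HTML parser, at the cost of an extra dependency.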