Verify Citations
`verify_citations`: extract citations from agent output, fetch cited sources, and verify with an LLM judge whether each source supports the claim in context. Returns per-citation verdicts and an overall support ratio.
Instructions
Extract citations from agent output, fetch the cited sources, and use an LLM judge to check whether each source supports the claim in context. Returns per-citation verdicts + an overall support ratio.
Sibling tools — evaluate_with_llm_judge runs general semantic scoring (accuracy, helpfulness, correctness, faithfulness); this tool is specifically for citation grounding (does the cited source actually support the claim). evaluate_output's no_hallucination_markers heuristic detects FABRICATED-looking citations cheaply (free, no fetch); this tool resolves and verifies them (paid, opt-in fetch, SSRF-guarded). log_trace / get_traces handle trace I/O. verify_citations is the GROUNDING-CHECK path — narrowest in scope, deepest in rigor.
Behavior. Three-phase pipeline: (1) regex extraction of [N] numbered refs, (Author, Year) parentheticals, bare URLs, and DOIs (in-process, no network); (2) SSRF-guarded fetch of URL + DOI citations, with scheme allowlist, private/link-local/cloud-metadata IP blocking, optional domain allowlist (IRIS_CITATION_DOMAINS), 10s timeout, 5MB body cap, manual redirect chase (max 3, re-checked), in-process LRU cache; (3) per-citation LLM judge call asking "does this source support this claim?" with a 256-token verdict. Opt-in via allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — Iris refuses outbound HTTP by default. Cost-capped across the entire call by max_cost_usd_total (default $1.00) — the pipeline stops when the cap would be exceeded. Rate-limited to 20 req/min on HTTP MCP. Writes one eval_result row tagged with per-citation provenance.
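Phase (1) can be sketched as a plain regex pass over the output. The patterns, kind names, and helper below are illustrative assumptions for the four citation shapes named above, not the tool's actual extractor:

```typescript
// Hedged sketch of phase-1 citation extraction: pure regex, no network.
// Pattern details are assumptions; the real extractor may differ.
type CitationKind = 'numbered' | 'author_year' | 'url' | 'doi';

interface SketchCitation {
  raw: string;
  kind: CitationKind;
  offsetStart: number;
  offsetEnd: number;
}

const PATTERNS: ReadonlyArray<[CitationKind, RegExp]> = [
  ['doi', /\b10\.\d{4,9}\/[^\s\])]+/g],          // bare DOIs like 10.1000/xyz
  ['url', /https?:\/\/[^\s\])]+/g],              // bare URLs
  ['numbered', /\[\d+\]/g],                      // [N] numbered refs
  ['author_year', /\([A-Z][A-Za-z-]+(?: et al\.)?,\s*\d{4}\)/g], // (Author, Year)
];

function extractCitationsSketch(output: string): SketchCitation[] {
  const found: SketchCitation[] = [];
  for (const [kind, re] of PATTERNS) {
    for (const m of output.matchAll(re)) {
      const start = m.index ?? 0;
      found.push({ raw: m[0], kind, offsetStart: start, offsetEnd: start + m[0].length });
    }
  }
  // Report citations in document order, as the offsets suggest.
  return found.sort((a, b) => a.offsetStart - b.offsetStart);
}
```

Only the `url` and `doi` kinds are resolvable in phase (2); numbered and author-year refs surface as `unresolvable_kind`.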
Output shape. Returns JSON: { "id": "<uuid>", "overall_score": 0..1|null, "passed": boolean, "total_citations_found": number, "total_resolved": number, "total_supported": number, "total_cost_usd": number, "citations": [{ "citation": { "raw", "kind", "identifier", "offset_start", "offset_end" }, "resolve_status": "ok"|"skipped"|"error", "resolve_error"?, "source"?: { "url", "status", "content_type", "bytes_fetched", "truncated" }, "judge"?: { "supported", "confidence", "rationale", "cost_usd", "latency_ms", "input_tokens", "output_tokens" } }] }. overall_score = supported / resolved; null when nothing resolvable was found.
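The aggregate arithmetic is small enough to restate. A minimal sketch of the documented relationship, assuming the two-decimal rounding and 0.5 pass threshold the implementation uses (the function name here is made up):

```typescript
// overall_score = supported / resolved, rounded to two decimals;
// null when nothing resolved. passed is true when there is nothing
// to score, otherwise requires at least half the resolved sources
// to support their claims.
function aggregate(
  totalSupported: number,
  totalResolved: number,
): { overallScore: number | null; passed: boolean } {
  const overallScore =
    totalResolved > 0 ? Math.round((totalSupported / totalResolved) * 100) / 100 : null;
  const passed = overallScore === null ? true : overallScore >= 0.5;
  return { overallScore, passed };
}
```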
Use when the output makes factual claims backed by [1]-style references, DOIs, or URLs and you want to separate "cited correctly" from "cited and wrong" from "cited but unresolvable". Particularly useful for research/legal/medical agents where fabricated citations are the dominant failure mode.
Don't use when the agent output has no citations at all (overall_score will be null; the tool degrades gracefully but a heuristic rule is cheaper). Don't use without allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — the tool refuses outbound HTTP unless explicitly enabled. Don't use with an open allowlist + untrusted output on the public internet; you are effectively running a user-directed fetcher. For stricter safety set IRIS_CITATION_DOMAINS to a curated list.
Parameters. model is required; provider is auto-detected from the model name (override only for ambiguous IDs). allow_fetch=false by default — outbound HTTP is REFUSED unless explicitly true OR IRIS_CITATION_ALLOW_FETCH=1 env. domain_allowlist suffix-matches hostnames (e.g., "wikipedia.org" allows en.wikipedia.org); merged with IRIS_CITATION_DOMAINS env (UNION — either source permits). max_citations defaults 20, hard cap 50 (extras are skipped silently, NOT errored — check total_citations_found in the response for the exact count). max_cost_usd_total defaults $1.00 — the pipeline stops mid-citation when the next judge call would exceed the cap (returns partial verdicts). per_source_timeout_ms defaults 10000 (10s); per_source_max_bytes defaults 5MB (truncates at the boundary; the judge still runs on truncated content). trace_id is optional but recommended. Defaults: max_citations=20, max_cost_usd_total=$1.00, per_source_timeout_ms=10000, per_source_max_bytes=5242880, allow_fetch=false.
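The suffix matching described for domain_allowlist can be sketched as below. This mirrors the documented behavior ("wikipedia.org" admits en.wikipedia.org; an empty merged list means the open web, still SSRF-guarded), though the real helper may differ in detail:

```typescript
// Suffix-match a hostname against the merged allowlist. Matching
// requires an exact host or a dot boundary, so "wikipedia.org" does
// NOT admit "evilwikipedia.org". An empty list imposes no restriction.
function matchesAllowlistSketch(host: string, allowlist: readonly string[]): boolean {
  if (allowlist.length === 0) return true;
  const h = host.toLowerCase();
  return allowlist.some((a) => h === a.toLowerCase() || h.endsWith('.' + a.toLowerCase()));
}
```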
Error modes. Throws when the API key env var is missing. Throws "Unknown model" on unsupported model IDs. Per-citation errors are collected (resolve_error.kind = bad_scheme / ssrf / not_allowed_domain / timeout / too_large / bad_status / redirect_loop / not_text / fetch_disabled / malformed_judge_response / cost_cap_reached / unresolvable_kind) and returned in the response rather than thrown. An empty output or output with zero extractable citations returns overall_score=null + passed=true (nothing to fail).
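Because per-citation errors are returned rather than thrown, a consumer can triage the response without special-casing every error kind: unresolved and errored entries simply lack a judge field. A hedged sketch (types follow the documented output shape; this is not an official client helper):

```typescript
// Bucket the per-citation verdicts from a verify_citations response.
interface CitationVerdict {
  resolve_status: 'ok' | 'skipped' | 'error';
  resolve_error?: { kind: string; message: string };
  judge?: { supported: boolean };
}

function triage(
  citations: CitationVerdict[],
): { supported: number; unsupported: number; unresolved: number } {
  let supported = 0;
  let unsupported = 0;
  let unresolved = 0;
  for (const c of citations) {
    if (!c.judge) {
      // Skipped kinds, fetch errors, cost-cap hits, malformed judge output.
      unresolved++;
      continue;
    }
    if (c.judge.supported) supported++;
    else unsupported++;
  }
  return { supported, unsupported, unresolved };
}
```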
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| output | Yes | The agent output containing citations to verify | |
| model | Yes | Judge model for per-citation verification. Supported: anthropic = claude-opus-4-7 | claude-sonnet-4-6 | claude-haiku-4-5-20251001; openai = gpt-4o | gpt-4o-mini | o1-mini. | |
| provider | No | Auto-detected from model when omitted | |
| allow_fetch | No | Permit outbound HTTP to resolve URLs/DOIs. Defaults to IRIS_CITATION_ALLOW_FETCH=1; false otherwise. SSRF-guarded regardless. | |
| domain_allowlist | No | Restrict fetches to hostnames in this list (suffix match allowed). Merged with IRIS_CITATION_DOMAINS env. | |
| max_cost_usd_total | No | Cap TOTAL judge cost across all citations in this call; default $1.00 | |
| max_citations | No | Max citations to verify (extras skipped); default 20 | |
| per_source_timeout_ms | No | Per-URL fetch timeout; default 10_000 | |
| per_source_max_bytes | No | Per-URL body cap; default 5MB | |
| trace_id | No | Link verification result to a trace | |
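A minimal argument object for a call, with hypothetical values (the model is taken from the supported list above; allow_fetch must be enabled or the tool refuses outbound HTTP):

```typescript
// Illustrative verify_citations arguments. The output text, allowlist,
// and trace_id are made up for this example.
const verifyCitationsArgs = {
  output: 'Attention-based transformers were introduced in 2017 [1]: https://arxiv.org/abs/1706.03762',
  model: 'claude-haiku-4-5-20251001',
  allow_fetch: true,                 // or set IRIS_CITATION_ALLOW_FETCH=1
  domain_allowlist: ['arxiv.org'],   // suffix-matches arxiv.org hosts only
  max_cost_usd_total: 0.5,           // stop early if judge cost would exceed $0.50
  max_citations: 10,
  trace_id: 'trace-abc123',
};
```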
Implementation Reference
- src/tools/index.ts:13-30 (registration): imports and registers the verify_citations tool via registerVerifyCitationsTool(server, storage) alongside all other tools.

```ts
import { registerVerifyCitationsTool } from './verify-citations.js';

export function registerAllTools(
  server: McpServer,
  storage: IStorageAdapter,
  evalEngine: EvalEngine,
  customRuleStore: CustomRuleStore,
): void {
  registerLogTraceTool(server, storage);
  registerEvaluateOutputTool(server, storage, evalEngine);
  registerGetTracesTool(server, storage);
  registerListRulesTool(server, customRuleStore);
  registerDeployRuleTool(server, customRuleStore);
  registerDeleteRuleTool(server, customRuleStore);
  registerDeleteTraceTool(server, storage);
  registerEvaluateWithLLMJudgeTool(server, storage);
  registerVerifyCitationsTool(server, storage);
}
```

- src/tools/verify-citations.ts:10-28 (schema): input Zod schema for verify_citations: output (string), model (string), provider (optional enum), allow_fetch, domain_allowlist, max_cost_usd_total, max_citations, per_source_timeout_ms, per_source_max_bytes, trace_id.

```ts
const inputSchema = {
  output: z.string().min(1).describe('The agent output containing citations to verify'),
  model: z
    .string()
    .describe(
      'Judge model for per-citation verification. Supported: anthropic = claude-opus-4-7 | claude-sonnet-4-6 | claude-haiku-4-5-20251001; openai = gpt-4o | gpt-4o-mini | o1-mini.',
    ),
  provider: z.enum(['anthropic', 'openai']).optional().describe('Auto-detected from model when omitted'),
  allow_fetch: z
    .boolean()
    .optional()
    .describe('Permit outbound HTTP to resolve URLs/DOIs. Defaults to IRIS_CITATION_ALLOW_FETCH=1; false otherwise. SSRF-guarded regardless.'),
  domain_allowlist: z
    .array(z.string())
    .optional()
    .describe('Restrict fetches to hostnames in this list (suffix match allowed). Merged with IRIS_CITATION_DOMAINS env.'),
  max_cost_usd_total: z.number().positive().optional().describe('Cap TOTAL judge cost across all citations in this call; default $1.00'),
  max_citations: z.number().int().positive().max(50).optional().describe('Max citations to verify (extras skipped); default 20'),
  per_source_timeout_ms: z.number().int().positive().optional().describe('Per-URL fetch timeout; default 10_000'),
  per_source_max_bytes: z.number().int().positive().optional().describe('Per-URL body cap; default 5MB'),
  trace_id: z.string().optional().describe('Link verification result to a trace'),
};
```

- src/tools/verify-citations.ts:64-184 (handler): registerVerifyCitationsTool registers 'verify_citations' on the server with a full description, input schema, and an async callback that resolves provider/apiKey/allowFetch/domainAllowlist, calls verifyCitations(), persists the eval result, and returns the JSON response.
```ts
export function registerVerifyCitationsTool(server: McpServer, storage: IStorageAdapter): void {
  server.registerTool(
    'verify_citations',
    {
      title: 'Verify Citations',
      description: [
        'Extract citations from agent output, fetch the cited sources, and use an LLM judge to check whether each source supports the claim in context. Returns per-citation verdicts + an overall support ratio.',
        '',
        'Sibling tools — evaluate_with_llm_judge runs general semantic scoring (accuracy, helpfulness, correctness, faithfulness); this tool is specifically for citation grounding (does the cited source actually support the claim). evaluate_output\'s no_hallucination_markers heuristic detects FABRICATED-looking citations cheaply (free, no fetch); this tool resolves and verifies them (paid, opt-in fetch, SSRF-guarded). log_trace / get_traces handle trace I/O. verify_citations is the GROUNDING-CHECK path — narrowest in scope, deepest in rigor.',
        '',
        'Behavior. Three-phase pipeline: (1) regex extraction of [N] numbered refs, (Author, Year) parentheticals, bare URLs, and DOIs (in-process, no network); (2) SSRF-guarded fetch of URL + DOI citations, with scheme allowlist, private/link-local/cloud-metadata IP blocking, optional domain allowlist (IRIS_CITATION_DOMAINS), 10s timeout, 5MB body cap, manual redirect chase (max 3, re-checked), in-process LRU cache; (3) per-citation LLM judge call asking "does this source support this claim?" with a 256-token verdict. Opt-in via allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — Iris refuses outbound HTTP by default. Cost-capped across the entire call by max_cost_usd_total (default $1.00) — the pipeline stops when the cap would be exceeded. Rate-limited to 20 req/min on HTTP MCP. Writes one eval_result row tagged with per-citation provenance.',
        '',
        'Output shape. Returns JSON: `{ "id": "<uuid>", "overall_score": 0..1|null, "passed": boolean, "total_citations_found": number, "total_resolved": number, "total_supported": number, "total_cost_usd": number, "citations": [{ "citation": { "raw", "kind", "identifier", "offset_start", "offset_end" }, "resolve_status": "ok"|"skipped"|"error", "resolve_error"?, "source"?: { "url", "status", "content_type", "bytes_fetched", "truncated" }, "judge"?: { "supported", "confidence", "rationale", "cost_usd", "latency_ms", "input_tokens", "output_tokens" } }] }`. `overall_score = supported / resolved`; `null` when nothing resolvable was found.',
        '',
        'Use when the output makes factual claims backed by [1]-style references, DOIs, or URLs and you want to separate "cited correctly" from "cited and wrong" from "cited but unresolvable". Particularly useful for research/legal/medical agents where fabricated citations are the dominant failure mode.',
        "",
        "Don't use when the agent output has no citations at all (overall_score will be null; the tool degrades gracefully but a heuristic rule is cheaper). Don't use without allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — the tool refuses outbound HTTP unless explicitly enabled. Don't use with an open allowlist + untrusted output on the public internet; you are effectively running a user-directed fetcher. For stricter safety set IRIS_CITATION_DOMAINS to a curated list.",
        '',
        'Parameters. model is required; provider auto-detected from model name (override only for ambiguous IDs). allow_fetch=false by default — outbound HTTP is REFUSED unless explicitly true OR IRIS_CITATION_ALLOW_FETCH=1 env. domain_allowlist suffix-matches hostnames (e.g., "wikipedia.org" allows en.wikipedia.org); merged with IRIS_CITATION_DOMAINS env (UNION — either source permits). max_citations defaults 20, hard cap 50 (extras are skipped silently, NOT errored — check total_citations_found in the response if precise). max_cost_usd_total defaults $1.00 — the pipeline stops mid-citation when the next judge call would exceed the cap (returns partial verdicts). per_source_timeout_ms defaults 10000 (10s); per_source_max_bytes defaults 5MB (truncates at boundary, judges still run on truncated content). trace_id optional but recommended. Defaults: max_citations=20, max_cost_usd_total=$1.00, per_source_timeout_ms=10000, per_source_max_bytes=5242880, allow_fetch=false.',
        '',
        'Error modes. Throws when the API key env var is missing. Throws "Unknown model" on unsupported model IDs. Per-citation errors are collected (resolve_error.kind = bad_scheme / ssrf / not_allowed_domain / timeout / too_large / bad_status / redirect_loop / not_text / fetch_disabled / malformed_judge_response / cost_cap_reached / unresolvable_kind) and returned in the response rather than thrown. An empty output or output with zero extractable citations returns overall_score=null + passed=true (nothing to fail).',
      ].join('\n'),
      inputSchema,
      annotations: {
        readOnlyHint: false, // Writes eval_result + spends money
        destructiveHint: false, // Creates data; doesn't overwrite/delete
        idempotentHint: false, // External fetches + provider non-determinism
        openWorldHint: true, // Outbound HTTP to citation URLs + LLM provider API
      },
    },
    async (args) => {
      const provider = (args.provider as LLMProvider | undefined) ?? inferProvider(args.model);
      const apiKey = resolveApiKey(provider);
      const allowFetch = resolveAllowFetch(args.allow_fetch);
      const domainAllowlist = resolveDomainAllowlist(args.domain_allowlist);

      const result = await verifyCitations({
        output: args.output,
        provider,
        model: args.model,
        apiKey,
        allowFetch,
        domainAllowlist,
        maxCostUsdTotal: args.max_cost_usd_total,
        maxCitations: args.max_citations,
        perSourceTimeoutMs: args.per_source_timeout_ms,
        perSourceMaxBytes: args.per_source_max_bytes,
      });

      const evalId = generateEvalId();
      const score = result.overallScore ?? 0;

      // Persist so dashboard can surface. eval_type='custom' — same
      // rationale as evaluate_with_llm_judge (spans all 4 heuristic
      // categories). rule_results[0] carries per-citation summary.
      await storage.insertEvalResult(LOCAL_TENANT, {
        id: evalId,
        trace_id: args.trace_id,
        eval_type: 'custom',
        output_text: args.output,
        score,
        passed: result.passed,
        rule_results: [
          {
            ruleName: `semantic_citation_verify:${provider}/${args.model}`,
            passed: result.passed,
            score,
            message:
              result.overallScore === null
                ? `No resolvable citations (found ${result.totalCitationsFound}, resolved ${result.totalResolved})`
                : `${result.totalSupported}/${result.totalResolved} cited sources supported the output`,
          },
        ],
        suggestions: result.passed
          ? []
          : [`Only ${result.totalSupported}/${result.totalResolved} cited sources actually supported the claim.`],
        rules_evaluated: 1,
        rules_skipped: 0,
        insufficient_data: result.overallScore === null,
      });

      return {
        content: [
          {
            type: 'text' as const,
            text: JSON.stringify({
              id: evalId,
              overall_score: result.overallScore,
              passed: result.passed,
              total_citations_found: result.totalCitationsFound,
              total_resolved: result.totalResolved,
              total_supported: result.totalSupported,
              total_cost_usd: result.totalCostUsd,
              citations: result.citations.map((c) => ({
                citation: {
                  raw: c.citation.raw,
                  kind: c.citation.kind,
                  identifier: c.citation.identifier,
                  offset_start: c.citation.offsetStart,
                  offset_end: c.citation.offsetEnd,
                },
                resolve_status: c.resolveStatus,
                resolve_error: c.resolveError,
                source: c.source,
                judge: c.judge
                  ? {
                      supported: c.judge.supported,
                      confidence: c.judge.confidence,
                      rationale: c.judge.rationale,
                      cost_usd: c.judge.costUsd,
                      latency_ms: c.judge.latencyMs,
                      input_tokens: c.judge.inputTokens,
                      output_tokens: c.judge.outputTokens,
                    }
                  : undefined,
              })),
            }),
          },
        ],
      };
    },
  );
}
```

- Core verification logic: VerifyCitationsParams/Result types, the judge system prompt, the buildUser prompt builder, the parseJudgeResult parser, and the verifyCitations() async function that extracts citations, resolves URL/DOI sources, calls the LLM judge per citation with cost capping, and aggregates overallScore/passed.
````ts
export interface VerifyCitationsParams {
  output: string;
  provider: LLMProvider;
  model: string;
  apiKey: string;
  allowFetch: boolean;
  domainAllowlist?: readonly string[];
  maxCostUsdTotal?: number;
  perSourceTimeoutMs?: number;
  perSourceMaxBytes?: number;
  // Cap number of citations we attempt — protects against DoS-by-spam.
  maxCitations?: number;
}

export interface VerifiedCitation {
  citation: ExtractedCitation;
  resolveStatus: 'ok' | 'skipped' | 'error';
  resolveError?: { kind: string; message: string };
  source?: Pick<ResolvedSource, 'url' | 'status' | 'contentType' | 'bytesFetched' | 'truncated'>;
  // LLM judge verdict — only set when resolve succeeded.
  judge?: {
    supported: boolean;
    confidence: number;
    rationale: string;
    inputTokens: number;
    outputTokens: number;
    costUsd: number | null;
    latencyMs: number;
  };
}

export interface VerifyCitationsResult {
  // Aggregate — fraction of resolvable citations judged supported.
  // Null when there were zero citations or zero resolvable ones.
  overallScore: number | null;
  passed: boolean;
  // Per-citation detail for the dashboard.
  citations: VerifiedCitation[];
  // Accumulated cost across all LLM calls we made.
  totalCostUsd: number;
  totalCitationsFound: number;
  totalResolved: number;
  totalSupported: number;
}

const SYSTEM = `You are a citation verification evaluator. Given a claim extracted from AI-generated output and the text of a cited source, decide whether the source supports the claim.
Score 0.00 means the source contradicts the claim or does not mention it. Score 1.00 means the source clearly supports the claim. Be strict: do not rate as supported unless the source actually contains the assertion. Do NOT penalize paraphrasing; DO penalize invented specifics not in the source.
Respond with a single JSON object — no markdown, no prose:
{ "supported": <boolean>, "confidence": <number 0.00..1.00>, "rationale": "<1-2 sentences — quote 5-15 words from the source if you found support>" }`;

function buildUser(claim: string, sourceText: string): string {
  // Truncate huge sources so we stay within reasonable tokens.
  const maxSourceChars = 12_000; // ~3k tokens
  const trimmed =
    sourceText.length > maxSourceChars
      ? sourceText.slice(0, maxSourceChars) + '\n\n[…source truncated…]'
      : sourceText;
  return `CLAIM:\n${claim}\n\nSOURCE TEXT:\n${trimmed}`;
}

function parseJudgeResult(raw: string): {
  supported: boolean;
  confidence: number;
  rationale: string;
} {
  const trimmed = raw
    .trim()
    .replace(/^```json\s*/i, '')
    .replace(/^```\s*/i, '')
    .replace(/```$/i, '')
    .trim();
  const first = trimmed.indexOf('{');
  const last = trimmed.lastIndexOf('}');
  if (first < 0 || last <= first) {
    throw new LLMJudgeError(
      `Citation judge did not emit JSON: ${raw.slice(0, 200)}`,
      'malformed_response',
    );
  }
  const obj = JSON.parse(trimmed.slice(first, last + 1)) as Record<string, unknown>;
  const supported = obj.supported === true;
  const confRaw = obj.confidence;
  const confidence = typeof confRaw === 'number' ? confRaw : Number(confRaw);
  if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
    throw new LLMJudgeError(
      `Citation judge confidence out of [0..1]: ${String(confRaw)}`,
      'malformed_response',
    );
  }
  const rationale = typeof obj.rationale === 'string' ? obj.rationale : '';
  return { supported, confidence: Math.round(confidence * 100) / 100, rationale };
}

export async function verifyCitations(
  params: VerifyCitationsParams,
): Promise<VerifyCitationsResult> {
  const citations = extractCitations(params.output);
  const maxCitations = params.maxCitations ?? 20;
  const selected = citations.slice(0, maxCitations);
  const totalFound = citations.length;

  if (!findPricing(params.model)) {
    throw new Error(
      `Unknown model "${params.model}". Add pricing to src/eval/llm-judge/pricing.ts first.`,
    );
  }

  const maxCostTotal = params.maxCostUsdTotal ?? 1.0;
  const out: VerifiedCitation[] = [];
  let totalCost = 0;
  let totalResolved = 0;
  let totalSupported = 0;

  for (const citation of selected) {
    // Only URL/DOI can be resolved. Numbered citations without
    // footnote definitions + author-year refs without a bibliography
    // are flagged as unresolvable — the output claims them but we have
    // nothing to compare against.
    if (citation.kind !== 'url' && citation.kind !== 'doi') {
      out.push({
        citation,
        resolveStatus: 'skipped',
        resolveError: { kind: 'unresolvable_kind', message: `No source to fetch for ${citation.kind} citation` },
      });
      continue;
    }

    let source: ResolvedSource;
    try {
      source = await resolveSource(citation.identifier, {
        allowFetch: params.allowFetch,
        timeoutMs: params.perSourceTimeoutMs,
        maxBytes: params.perSourceMaxBytes,
        domainAllowlist: params.domainAllowlist,
      });
      totalResolved++;
    } catch (err) {
      const e = err as CitationResolveError;
      out.push({
        citation,
        resolveStatus: 'error',
        resolveError: { kind: e.kind ?? 'unknown', message: e.message },
      });
      continue;
    }

    // Before calling the judge: would this blow our total cost?
    // Use the same pessimistic estimate as the main LLM judge evaluator.
    const contextLen = citation.contextWindow.length + source.text.length;
    const pessimistic = estimateCostUsd(params.model, Math.ceil(contextLen / 4), 512) ?? 0;
    if (totalCost + pessimistic > maxCostTotal) {
      out.push({
        citation,
        resolveStatus: 'ok',
        source: {
          url: source.url,
          status: source.status,
          contentType: source.contentType,
          bytesFetched: source.bytesFetched,
          truncated: source.truncated,
        },
        resolveError: {
          kind: 'cost_cap_reached',
          message: `Total cost cap $${maxCostTotal.toFixed(2)} would be exceeded by next judge call`,
        },
      });
      break; // No point continuing — subsequent calls will also exceed.
    }

    let judgeResponse;
    try {
      judgeResponse = await callLLMJudge({
        provider: params.provider,
        model: params.model,
        systemPrompt: SYSTEM,
        userPrompt: buildUser(citation.contextWindow, source.text),
        maxOutputTokens: 256,
        temperature: 0,
        apiKey: params.apiKey,
      });
    } catch (err) {
      const e = err as Error;
      out.push({
        citation,
        resolveStatus: 'ok',
        source: {
          url: source.url,
          status: source.status,
          contentType: source.contentType,
          bytesFetched: source.bytesFetched,
          truncated: source.truncated,
        },
        resolveError: {
          kind: err instanceof LLMJudgeError ? err.kind : 'llm_judge_error',
          message: e.message,
        },
      });
      continue;
    }

    const cost = estimateCostUsd(params.model, judgeResponse.inputTokens, judgeResponse.outputTokens);
    totalCost += cost ?? 0;

    let parsed;
    try {
      parsed = parseJudgeResult(judgeResponse.content);
    } catch (err) {
      const e = err as Error;
      out.push({
        citation,
        resolveStatus: 'ok',
        source: {
          url: source.url,
          status: source.status,
          contentType: source.contentType,
          bytesFetched: source.bytesFetched,
          truncated: source.truncated,
        },
        resolveError: {
          kind: 'malformed_judge_response',
          message: e.message,
        },
      });
      continue;
    }

    if (parsed.supported) totalSupported++;
    out.push({
      citation,
      resolveStatus: 'ok',
      source: {
        url: source.url,
        status: source.status,
        contentType: source.contentType,
        bytesFetched: source.bytesFetched,
        truncated: source.truncated,
      },
      judge: {
        supported: parsed.supported,
        confidence: parsed.confidence,
        rationale: parsed.rationale,
        inputTokens: judgeResponse.inputTokens,
        outputTokens: judgeResponse.outputTokens,
        costUsd: cost,
        latencyMs: judgeResponse.latencyMs,
      },
    });
  }

  const overallScore =
    totalResolved > 0 ? Math.round((totalSupported / totalResolved) * 100) / 100 : null;

  // Fail if >= 50% of resolved sources don't support the claim. When
  // no citations or none resolved, we don't fail — there's nothing to
  // score, we just report that.
  const passed = overallScore === null ? true : overallScore >= 0.5;

  return {
    overallScore,
    passed,
    citations: out,
    totalCostUsd: Math.round(totalCost * 1_000_000) / 1_000_000,
    totalCitationsFound: totalFound,
    totalResolved,
    totalSupported,
  };
}
````

- Security-gated source resolver: resolves URLs/DOIs with SSRF protection (IP blocking, scheme allowlist, domain allowlist), fetches with redirects, streams with a body cap, and caches in an LRU. resolveSource() is the main entry point used by verifier.ts.
```ts
// Citation source resolver — fetches URLs and DOIs so the verifier can
// feed them to the LLM judge. This is the security-sensitive piece: we
// are fetching URLs that appeared in model-generated output, which is
// effectively user-controlled input.
//
// Defense layers (in order):
// 1. Scheme allowlist — http/https only; refuse file:/javascript:/etc.
// 2. SSRF host check — refuse localhost, link-local, private ranges,
//    and cloud metadata (AWS/GCP/Azure/DigitalOcean) IP literals.
// 3. Optional domain allowlist — IRIS_CITATION_DOMAINS=doi.org,arxiv.org
//    restricts to a curated set; empty/unset = open web (still SSRF-guarded).
// 4. Timeout + size cap — 10s default, 5MB cap on response body.
// 5. Redirect chase cap — follow max 3 redirects, each re-checked.
// 6. Cache — in-process LRU (100 entries) so retries don't re-fetch.
//
// This is opt-in: calls require passing {allowFetch: true} so an agent
// can't trick Iris into fetching random URLs without operator consent
// (consent granted via tool param or env IRIS_CITATION_ALLOW_FETCH=1).

export interface ResolveOptions {
  allowFetch: boolean;
  timeoutMs?: number;
  maxBytes?: number;
  domainAllowlist?: readonly string[];
  maxRedirects?: number;
}

export interface ResolvedSource {
  url: string; // final URL after redirects
  requestedUrl: string; // URL as passed in
  status: number;
  contentType: string;
  text: string; // truncated to maxBytes
  truncated: boolean;
  fetchedAt: string; // ISO
  bytesFetched: number;
  fromCache: boolean;
}

export class CitationResolveError extends Error {
  constructor(
    message: string,
    public readonly kind:
      | 'bad_scheme'
      | 'ssrf'
      | 'not_allowed_domain'
      | 'timeout'
      | 'too_large'
      | 'bad_status'
      | 'redirect_loop'
      | 'not_text'
      | 'fetch_disabled',
    public readonly details?: string,
  ) {
    super(message);
    this.name = 'CitationResolveError';
  }
}

// Private IP ranges + localhost + link-local + cloud metadata.
const BLOCKED_IPV4 = [
  // Localhost
  /^127\./,
  // Link-local
  /^169\.254\./,
  // Private RFC 1918
  /^10\./,
  /^192\.168\./,
  /^172\.(1[6-9]|2\d|3[01])\./,
  // Cloud metadata
  /^169\.254\.169\.254$/,
  // Broadcast
  /^255\.255\.255\.255$/,
  // This-network
  /^0\./,
];

const BLOCKED_IPV6 = [
  /^::1$/, // localhost
  /^fc|^fd/i, // unique local
  /^fe80/i, // link-local
  /^::ffff:127\./i, // IPv4-mapped localhost
];

const BLOCKED_HOST_SUBSTRINGS = ['localhost', 'internal', '.local', 'metadata.google', 'metadata.azure'];

function isIpv4(host: string): boolean {
  return /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
}

function isIpv6(host: string): boolean {
  return host.includes(':');
}

export function isSafeHost(host: string): boolean {
  const hostLower = host.toLowerCase();
  for (const sub of BLOCKED_HOST_SUBSTRINGS) {
    if (hostLower === sub || hostLower.endsWith(sub)) return false;
  }
  if (isIpv4(host)) {
    for (const re of BLOCKED_IPV4) {
      if (re.test(host)) return false;
    }
  }
  if (isIpv6(host)) {
    for (const re of BLOCKED_IPV6) {
      if (re.test(host)) return false;
    }
  }
  return true;
}

function matchesAllowlist(host: string, allowlist: readonly string[] | undefined): boolean {
  if (!allowlist || allowlist.length === 0) return true;
  const hostLower = host.toLowerCase();
  for (const allowed of allowlist) {
    const a = allowed.toLowerCase();
    if (hostLower === a || hostLower.endsWith('.' + a)) return true;
  }
  return false;
}

// Tiny LRU — short-circuits duplicate fetches in a single batch of
// citations (e.g. 5 citations in one output all pointing to the same
// source). Not durable; every process start is fresh.
const CACHE_MAX = 100;
const cache = new Map<string, ResolvedSource>();

function cacheGet(key: string): ResolvedSource | undefined {
  const hit = cache.get(key);
  if (!hit) return undefined;
  // Re-insert to bump LRU order
  cache.delete(key);
  cache.set(key, hit);
  return { ...hit, fromCache: true };
}

function cacheSet(key: string, value: ResolvedSource): void {
  if (cache.size >= CACHE_MAX) {
    const firstKey = cache.keys().next().value as string | undefined;
    if (firstKey !== undefined) cache.delete(firstKey);
  }
  cache.set(key, value);
}

export function __clearCitationCacheForTests(): void {
  cache.clear();
}

function normalizeDoiToUrl(doiOrUrl: string): string {
  const trimmed = doiOrUrl.trim();
  if (/^https?:\/\//i.test(trimmed)) return trimmed;
  if (/^10\.\d{4,9}\//.test(trimmed)) return `https://doi.org/${trimmed}`;
  return trimmed;
}

async function doFetch(url: string, opts: ResolveOptions, redirectsLeft: number): Promise<ResolvedSource> {
  const timeoutMs = opts.timeoutMs ?? 10_000;
  const maxBytes = opts.maxBytes ?? 5 * 1024 * 1024;
  const parsed = new URL(url);

  if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
    throw new CitationResolveError(
      `Refusing non-http(s) scheme: ${parsed.protocol}`,
      'bad_scheme',
      parsed.protocol,
    );
  }
  if (!isSafeHost(parsed.hostname)) {
    throw new CitationResolveError(`Refusing SSRF-blocked host: ${parsed.hostname}`, 'ssrf', parsed.hostname);
  }
  if (!matchesAllowlist(parsed.hostname, opts.domainAllowlist)) {
    throw new CitationResolveError(
      `Host ${parsed.hostname} not in IRIS_CITATION_DOMAINS allowlist`,
      'not_allowed_domain',
      parsed.hostname,
    );
  }

  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  let res;
  try {
    res = await fetch(url, {
      signal: controller.signal,
      redirect: 'manual',
      headers: {
        'user-agent': 'iris-mcp-citation-verifier/0.4 (+https://iris-eval.com)',
        accept: 'text/html, text/plain, application/pdf, application/xhtml+xml, */*;q=0.1',
      },
    });
  } catch (err) {
    if (err instanceof Error && err.name === 'AbortError') {
      throw new CitationResolveError(`Fetch timed out after ${timeoutMs}ms`, 'timeout');
    }
    throw err;
  } finally {
    clearTimeout(timer);
  }

  // Manual redirect chase — we re-check each target against SSRF rules.
  if (res.status >= 300 && res.status < 400) {
    const loc = res.headers.get('location');
    if (!loc) {
      throw new CitationResolveError(`Redirect with no Location header (${res.status})`, 'bad_status');
    }
    if (redirectsLeft <= 0) {
      throw new CitationResolveError(`Exceeded max redirects (${opts.maxRedirects ?? 3})`, 'redirect_loop');
    }
    const next = new URL(loc, url).toString();
    return doFetch(next, opts, redirectsLeft - 1);
  }

  if (res.status < 200 || res.status >= 300) {
    throw new CitationResolveError(`Bad status ${res.status}`, 'bad_status', String(res.status));
  }

  const contentType = (res.headers.get('content-type') ?? '').toLowerCase();
  // We only extract text. PDFs could be supported later with pdf-parse
  // but that's an opt-in + heavy dep. For now we refuse non-text.
  const textLike =
    contentType.includes('text/') ||
    contentType.includes('xml') ||
    contentType.includes('json') ||
    contentType === '';
  if (!textLike) {
    throw new CitationResolveError(
      `Refusing non-text content-type: ${contentType}`,
      'not_text',
      contentType,
    );
  }

  // Stream with a byte cap so we don't DoS ourselves on a huge body.
  if (!res.body) {
    throw new CitationResolveError('Response had no body stream', 'bad_status');
  }
  const reader = res.body.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  let truncated = false;
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      truncated = true;
      chunks.push(value.slice(0, Math.max(0, maxBytes - (total - value.byteLength))));
      await reader.cancel();
      break;
    }
    chunks.push(value);
  }

  // Concat + decode. UTF-8 is good enough for this use case — any weird
  // encoding will surface as mojibake for the LLM judge, still safe.
  const buf = new Uint8Array(total > maxBytes ? maxBytes : total);
  let off = 0;
  for (const c of chunks) {
    buf.set(c, off);
    off += c.byteLength;
  }
  const text = new TextDecoder('utf-8', { fatal: false }).decode(buf);

  return {
    url: res.url || url,
    requestedUrl: url,
    status: res.status,
    contentType,
    text,
    truncated,
    fetchedAt: new Date().toISOString(),
    bytesFetched: Math.min(total, maxBytes),
    fromCache: false,
  };
}

export async function resolveSource(
  identifier: string,
  opts: ResolveOptions,
): Promise<ResolvedSource> {
  if (!opts.allowFetch) {
    throw new CitationResolveError(
      'Citation fetch is disabled. Pass allowFetch:true or set IRIS_CITATION_ALLOW_FETCH=1.',
      'fetch_disabled',
    );
  }
  const url = normalizeDoiToUrl(identifier);
  const cacheKey = url;
  const cached = cacheGet(cacheKey);
  if (cached) return cached;
  const fetched = await doFetch(url, opts, opts.maxRedirects ?? 3);
  cacheSet(cacheKey, fetched);
  return fetched;
}
```