Verify Citations

verify_citations

Extract citations from agent output, fetch cited sources, and verify with an LLM judge whether each source supports the claim in context. Returns per-citation verdicts and overall support ratio.

Instructions

Extract citations from agent output, fetch the cited sources, and use an LLM judge to check whether each source supports the claim in context. Returns per-citation verdicts + an overall support ratio.

Sibling tools — evaluate_with_llm_judge runs general semantic scoring (accuracy, helpfulness, correctness, faithfulness); this tool is specifically for citation grounding (does the cited source actually support the claim). evaluate_output's no_hallucination_markers heuristic detects FABRICATED-looking citations cheaply (free, no fetch); this tool resolves and verifies them (paid, opt-in fetch, SSRF-guarded). log_trace / get_traces handle trace I/O. verify_citations is the GROUNDING-CHECK path — narrowest in scope, deepest in rigor.

Behavior. Three-phase pipeline: (1) regex extraction of [N] numbered refs, (Author, Year) parentheticals, bare URLs, and DOIs (in-process, no network); (2) SSRF-guarded fetch of URL + DOI citations, with scheme allowlist, private/link-local/cloud-metadata IP blocking, optional domain allowlist (IRIS_CITATION_DOMAINS), 10s timeout, 5MB body cap, manual redirect chase (max 3, re-checked), in-process LRU cache; (3) per-citation LLM judge call asking "does this source support this claim?" with a 256-token verdict. Opt-in via allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — Iris refuses outbound HTTP by default. Cost-capped across the entire call by max_cost_usd_total (default $1.00) — the pipeline stops when the cap would be exceeded. Rate-limited to 20 req/min on HTTP MCP. Writes one eval_result row tagged with per-citation provenance.
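
For concreteness, a minimal call might look like the following sketch. It is illustrative only: it assumes an already-connected MCP client (callTool from the TypeScript MCP SDK), that the server process has the judge provider's API key in its environment, and that the trace id is a placeholder.

    // Illustrative sketch: assumes a connected MCP `client` (TypeScript MCP SDK)
    // and that the server can read the Anthropic API key for the judge model.
    const callResult = await client.callTool({
      name: 'verify_citations',
      arguments: {
        output: 'The Transformer was introduced in 2017 [1]. Source: https://arxiv.org/abs/1706.03762',
        model: 'claude-sonnet-4-6',
        allow_fetch: true, // outbound HTTP is refused without this (or IRIS_CITATION_ALLOW_FETCH=1)
        domain_allowlist: ['arxiv.org', 'doi.org'],
        max_cost_usd_total: 0.25,
        trace_id: 'trace-123', // placeholder
      },
    });
    // The JSON payload described under "Output shape" below comes back as text content.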

Output shape. Returns JSON: { "id": "<uuid>", "overall_score": 0..1|null, "passed": boolean, "total_citations_found": number, "total_resolved": number, "total_supported": number, "total_cost_usd": number, "citations": [{ "citation": { "raw", "kind", "identifier", "offset_start", "offset_end" }, "resolve_status": "ok"|"skipped"|"error", "resolve_error"?, "source"?: { "url", "status", "content_type", "bytes_fetched", "truncated" }, "judge"?: { "supported", "confidence", "rationale", "cost_usd", "latency_ms", "input_tokens", "output_tokens" } }] }. overall_score = supported / resolved; null when nothing resolvable was found.
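
A hypothetical response for an output containing one resolvable URL citation and one bare numbered reference might look like the example below. All values are illustrative, and the "numbered" kind label is an assumption (the extractor module is not excerpted here); only the field names follow the shape above.

    {
      "id": "00000000-0000-0000-0000-000000000000",
      "overall_score": 1,
      "passed": true,
      "total_citations_found": 2,
      "total_resolved": 1,
      "total_supported": 1,
      "total_cost_usd": 0.0042,
      "citations": [
        {
          "citation": { "raw": "https://arxiv.org/abs/1706.03762", "kind": "url", "identifier": "https://arxiv.org/abs/1706.03762", "offset_start": 52, "offset_end": 85 },
          "resolve_status": "ok",
          "source": { "url": "https://arxiv.org/abs/1706.03762", "status": 200, "content_type": "text/html", "bytes_fetched": 48213, "truncated": false },
          "judge": { "supported": true, "confidence": 0.92, "rationale": "The abstract proposes the Transformer architecture (2017).", "cost_usd": 0.0042, "latency_ms": 1830, "input_tokens": 3120, "output_tokens": 74 }
        },
        {
          "citation": { "raw": "[1]", "kind": "numbered", "identifier": "1", "offset_start": 40, "offset_end": 43 },
          "resolve_status": "skipped",
          "resolve_error": { "kind": "unresolvable_kind", "message": "No source to fetch for numbered citation" }
        }
      ]
    }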

Use when the output makes factual claims backed by [1]-style references, DOIs, or URLs and you want to separate "cited correctly" from "cited and wrong" from "cited but unresolvable". Particularly useful for research/legal/medical agents where fabricated citations are the dominant failure mode.

Don't use when the agent output has no citations at all (overall_score will be null; the tool degrades gracefully but a heuristic rule is cheaper). Don't use without allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — the tool refuses outbound HTTP unless explicitly enabled. Don't use with an open allowlist + untrusted output on the public internet; you are effectively running a user-directed fetcher. For stricter safety set IRIS_CITATION_DOMAINS to a curated list.

Parameters. model is required; provider auto-detected from model name (override only for ambiguous IDs). allow_fetch=false by default — outbound HTTP is REFUSED unless explicitly true OR IRIS_CITATION_ALLOW_FETCH=1 env. domain_allowlist suffix-matches hostnames (e.g., "wikipedia.org" allows en.wikipedia.org); merged with IRIS_CITATION_DOMAINS env (UNION — either source permits). max_citations defaults to 20, hard cap 50 (extras are skipped silently, NOT errored — check total_citations_found in the response if you need the exact count). max_cost_usd_total defaults to $1.00 — the pipeline stops mid-citation when the next judge call would exceed the cap (returns partial verdicts). per_source_timeout_ms defaults to 10000 (10s); per_source_max_bytes defaults to 5MB (truncates at the boundary; judges still run on truncated content). trace_id optional but recommended. Defaults: max_citations=20, max_cost_usd_total=$1.00, per_source_timeout_ms=10000, per_source_max_bytes=5242880, allow_fetch=false.
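
The UNION merge of domain_allowlist with IRIS_CITATION_DOMAINS is not excerpted below, so the following is only a plausible sketch of how that resolution could work, assuming the env var is comma-separated (as in the resolver's own example, IRIS_CITATION_DOMAINS=doi.org,arxiv.org) and that an empty merged list means "open web, still SSRF-guarded".

    // Hypothetical helper; not the actual resolveDomainAllowlist implementation.
    function resolveDomainAllowlistSketch(param?: readonly string[]): string[] | undefined {
      const fromEnv = (process.env.IRIS_CITATION_DOMAINS ?? '')
        .split(',')
        .map((d) => d.trim())
        .filter((d) => d.length > 0);
      // UNION semantics: a host allowed by either the tool parameter or the env var passes.
      const merged = [...new Set([...(param ?? []), ...fromEnv])];
      return merged.length > 0 ? merged : undefined;
    }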

Error modes. Throws when the API key env var is missing. Throws "Unknown model" on unsupported model IDs. Per-citation errors are collected (resolve_error.kind = bad_scheme / ssrf / not_allowed_domain / timeout / too_large / bad_status / redirect_loop / not_text / fetch_disabled / malformed_judge_response / cost_cap_reached / unresolvable_kind) and returned in the response rather than thrown. An empty output or output with zero extractable citations returns overall_score=null + passed=true (nothing to fail).
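
For example, a citation that points at a cloud-metadata address is refused by the SSRF guard and comes back as a collected per-citation error rather than an exception (offsets and raw text are illustrative; the message format matches the resolver code below):

    {
      "citation": { "raw": "http://169.254.169.254/latest/meta-data", "kind": "url", "identifier": "http://169.254.169.254/latest/meta-data", "offset_start": 0, "offset_end": 39 },
      "resolve_status": "error",
      "resolve_error": { "kind": "ssrf", "message": "Refusing SSRF-blocked host: 169.254.169.254" }
    }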

Input Schema

output (required): The agent output containing citations to verify.
model (required): Judge model for per-citation verification. Supported: anthropic = claude-opus-4-7 | claude-sonnet-4-6 | claude-haiku-4-5-20251001; openai = gpt-4o | gpt-4o-mini | o1-mini.
provider (optional): Auto-detected from model when omitted.
allow_fetch (optional, default false): Permit outbound HTTP to resolve URLs/DOIs. Defaults to true when IRIS_CITATION_ALLOW_FETCH=1 is set; false otherwise. SSRF-guarded regardless.
domain_allowlist (optional): Restrict fetches to hostnames in this list (suffix match allowed). Merged with the IRIS_CITATION_DOMAINS env var.
max_cost_usd_total (optional, default $1.00): Cap on TOTAL judge cost across all citations in this call.
max_citations (optional, default 20): Max citations to verify (extras skipped).
per_source_timeout_ms (optional, default 10000): Per-URL fetch timeout in milliseconds.
per_source_max_bytes (optional, default 5242880): Per-URL body cap (5MB).
trace_id (optional): Link the verification result to a trace.

Implementation Reference

  • Imports and registers the verify_citations tool via registerVerifyCitationsTool(server, storage) alongside all other tools.
    import { registerVerifyCitationsTool } from './verify-citations.js';
    
    export function registerAllTools(
      server: McpServer,
      storage: IStorageAdapter,
      evalEngine: EvalEngine,
      customRuleStore: CustomRuleStore,
    ): void {
      registerLogTraceTool(server, storage);
      registerEvaluateOutputTool(server, storage, evalEngine);
      registerGetTracesTool(server, storage);
      registerListRulesTool(server, customRuleStore);
      registerDeployRuleTool(server, customRuleStore);
      registerDeleteRuleTool(server, customRuleStore);
      registerDeleteTraceTool(server, storage);
      registerEvaluateWithLLMJudgeTool(server, storage);
      registerVerifyCitationsTool(server, storage);
    }
  • Input Zod schema for verify_citations: output (string), model (string), provider (optional enum), allow_fetch, domain_allowlist, max_cost_usd_total, max_citations, per_source_timeout_ms, per_source_max_bytes, trace_id.
    const inputSchema = {
      output: z.string().min(1).describe('The agent output containing citations to verify'),
      model: z
        .string()
        .describe(
          'Judge model for per-citation verification. Supported: anthropic = claude-opus-4-7 | claude-sonnet-4-6 | claude-haiku-4-5-20251001; openai = gpt-4o | gpt-4o-mini | o1-mini.',
        ),
      provider: z.enum(['anthropic', 'openai']).optional().describe('Auto-detected from model when omitted'),
      allow_fetch: z.boolean().optional().describe('Permit outbound HTTP to resolve URLs/DOIs. Defaults to IRIS_CITATION_ALLOW_FETCH=1; false otherwise. SSRF-guarded regardless.'),
      domain_allowlist: z
        .array(z.string())
        .optional()
        .describe('Restrict fetches to hostnames in this list (suffix match allowed). Merged with IRIS_CITATION_DOMAINS env.'),
      max_cost_usd_total: z.number().positive().optional().describe('Cap TOTAL judge cost across all citations in this call; default $1.00'),
      max_citations: z.number().int().positive().max(50).optional().describe('Max citations to verify (extras skipped); default 20'),
      per_source_timeout_ms: z.number().int().positive().optional().describe('Per-URL fetch timeout; default 10_000'),
      per_source_max_bytes: z.number().int().positive().optional().describe('Per-URL body cap; default 5MB'),
      trace_id: z.string().optional().describe('Link verification result to a trace'),
    };
  • MCP tool registration handler: registerVerifyCitationsTool registers 'verify_citations' on the server with a full description, input schema, and async callback that resolves provider/apiKey/allowFetch/domainAllowlist, calls verifyCitations(), persists the eval result, and returns the JSON response.
    export function registerVerifyCitationsTool(server: McpServer, storage: IStorageAdapter): void {
      server.registerTool(
        'verify_citations',
        {
          title: 'Verify Citations',
          description: [
            'Extract citations from agent output, fetch the cited sources, and use an LLM judge to check whether each source supports the claim in context. Returns per-citation verdicts + an overall support ratio.',
            '',
            'Sibling tools — evaluate_with_llm_judge runs general semantic scoring (accuracy, helpfulness, correctness, faithfulness); this tool is specifically for citation grounding (does the cited source actually support the claim). evaluate_output\'s no_hallucination_markers heuristic detects FABRICATED-looking citations cheaply (free, no fetch); this tool resolves and verifies them (paid, opt-in fetch, SSRF-guarded). log_trace / get_traces handle trace I/O. verify_citations is the GROUNDING-CHECK path — narrowest in scope, deepest in rigor.',
            '',
            'Behavior. Three-phase pipeline: (1) regex extraction of [N] numbered refs, (Author, Year) parentheticals, bare URLs, and DOIs (in-process, no network); (2) SSRF-guarded fetch of URL + DOI citations, with scheme allowlist, private/link-local/cloud-metadata IP blocking, optional domain allowlist (IRIS_CITATION_DOMAINS), 10s timeout, 5MB body cap, manual redirect chase (max 3, re-checked), in-process LRU cache; (3) per-citation LLM judge call asking "does this source support this claim?" with a 256-token verdict. Opt-in via allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — Iris refuses outbound HTTP by default. Cost-capped across the entire call by max_cost_usd_total (default $1.00) — the pipeline stops when the cap would be exceeded. Rate-limited to 20 req/min on HTTP MCP. Writes one eval_result row tagged with per-citation provenance.',
            '',
            'Output shape. Returns JSON: `{ "id": "<uuid>", "overall_score": 0..1|null, "passed": boolean, "total_citations_found": number, "total_resolved": number, "total_supported": number, "total_cost_usd": number, "citations": [{ "citation": { "raw", "kind", "identifier", "offset_start", "offset_end" }, "resolve_status": "ok"|"skipped"|"error", "resolve_error"?, "source"?: { "url", "status", "content_type", "bytes_fetched", "truncated" }, "judge"?: { "supported", "confidence", "rationale", "cost_usd", "latency_ms", "input_tokens", "output_tokens" } }] }`. `overall_score = supported / resolved`; `null` when nothing resolvable was found.',
            '',
            'Use when the output makes factual claims backed by [1]-style references, DOIs, or URLs and you want to separate "cited correctly" from "cited and wrong" from "cited but unresolvable". Particularly useful for research/legal/medical agents where fabricated citations are the dominant failure mode.',
            "",
            "Don't use when the agent output has no citations at all (overall_score will be null; the tool degrades gracefully but a heuristic rule is cheaper). Don't use without allow_fetch=true or IRIS_CITATION_ALLOW_FETCH=1 — the tool refuses outbound HTTP unless explicitly enabled. Don't use with an open allowlist + untrusted output on the public internet; you are effectively running a user-directed fetcher. For stricter safety set IRIS_CITATION_DOMAINS to a curated list.",
            '',
            'Parameters. model is required; provider auto-detected from model name (override only for ambiguous IDs). allow_fetch=false by default — outbound HTTP is REFUSED unless explicitly true OR IRIS_CITATION_ALLOW_FETCH=1 env. domain_allowlist suffix-matches hostnames (e.g., "wikipedia.org" allows en.wikipedia.org); merged with IRIS_CITATION_DOMAINS env (UNION — either source permits). max_citations defaults 20, hard cap 50 (extras are skipped silently, NOT errored — check total_citations_found in the response if precise). max_cost_usd_total defaults $1.00 — the pipeline stops mid-citation when the next judge call would exceed the cap (returns partial verdicts). per_source_timeout_ms defaults 10000 (10s); per_source_max_bytes defaults 5MB (truncates at boundary, judges still run on truncated content). trace_id optional but recommended. Defaults: max_citations=20, max_cost_usd_total=$1.00, per_source_timeout_ms=10000, per_source_max_bytes=5242880, allow_fetch=false.',
            '',
            'Error modes. Throws when the API key env var is missing. Throws "Unknown model" on unsupported model IDs. Per-citation errors are collected (resolve_error.kind = bad_scheme / ssrf / not_allowed_domain / timeout / too_large / bad_status / redirect_loop / not_text / fetch_disabled / malformed_judge_response / cost_cap_reached / unresolvable_kind) and returned in the response rather than thrown. An empty output or output with zero extractable citations returns overall_score=null + passed=true (nothing to fail).',
          ].join('\n'),
          inputSchema,
          annotations: {
            readOnlyHint: false,      // Writes eval_result + spends money
            destructiveHint: false,   // Creates data; doesn't overwrite/delete
            idempotentHint: false,    // External fetches + provider non-determinism
            openWorldHint: true,      // Outbound HTTP to citation URLs + LLM provider API
          },
        },
        async (args) => {
          const provider = (args.provider as LLMProvider | undefined) ?? inferProvider(args.model);
          const apiKey = resolveApiKey(provider);
          const allowFetch = resolveAllowFetch(args.allow_fetch);
          const domainAllowlist = resolveDomainAllowlist(args.domain_allowlist);
    
          const result = await verifyCitations({
            output: args.output,
            provider,
            model: args.model,
            apiKey,
            allowFetch,
            domainAllowlist,
            maxCostUsdTotal: args.max_cost_usd_total,
            maxCitations: args.max_citations,
            perSourceTimeoutMs: args.per_source_timeout_ms,
            perSourceMaxBytes: args.per_source_max_bytes,
          });
    
          const evalId = generateEvalId();
          const score = result.overallScore ?? 0;
    
          // Persist so dashboard can surface. eval_type='custom' — same
          // rationale as evaluate_with_llm_judge (spans all 4 heuristic
          // categories). rule_results[0] carries per-citation summary.
          await storage.insertEvalResult(LOCAL_TENANT, {
            id: evalId,
            trace_id: args.trace_id,
            eval_type: 'custom',
            output_text: args.output,
            score,
            passed: result.passed,
            rule_results: [
              {
                ruleName: `semantic_citation_verify:${provider}/${args.model}`,
                passed: result.passed,
                score,
                message:
                  result.overallScore === null
                    ? `No resolvable citations (found ${result.totalCitationsFound}, resolved ${result.totalResolved})`
                    : `${result.totalSupported}/${result.totalResolved} cited sources supported the output`,
              },
            ],
            suggestions: result.passed ? [] : [`Only ${result.totalSupported}/${result.totalResolved} cited sources actually supported the claim.`],
            rules_evaluated: 1,
            rules_skipped: 0,
            insufficient_data: result.overallScore === null,
          });
    
          return {
            content: [
              {
                type: 'text' as const,
                text: JSON.stringify({
                  id: evalId,
                  overall_score: result.overallScore,
                  passed: result.passed,
                  total_citations_found: result.totalCitationsFound,
                  total_resolved: result.totalResolved,
                  total_supported: result.totalSupported,
                  total_cost_usd: result.totalCostUsd,
                  citations: result.citations.map((c) => ({
                    citation: {
                      raw: c.citation.raw,
                      kind: c.citation.kind,
                      identifier: c.citation.identifier,
                      offset_start: c.citation.offsetStart,
                      offset_end: c.citation.offsetEnd,
                    },
                    resolve_status: c.resolveStatus,
                    resolve_error: c.resolveError,
                    source: c.source,
                    judge: c.judge
                      ? {
                          supported: c.judge.supported,
                          confidence: c.judge.confidence,
                          rationale: c.judge.rationale,
                          cost_usd: c.judge.costUsd,
                          latency_ms: c.judge.latencyMs,
                          input_tokens: c.judge.inputTokens,
                          output_tokens: c.judge.outputTokens,
                        }
                      : undefined,
                  })),
                }),
              },
            ],
          };
        },
      );
    }
  • Core verification logic: VerifyCitationsParams/Result types, system prompt for LLM judge, buildUser prompt builder, parseJudgeResult parser, and the verifyCitations() async function that extracts citations, resolves URL/DOI sources, calls LLM judge per citation with cost capping, and aggregates overallScore/passed. (A direct-usage sketch of verifyCitations() follows the last code excerpt in this list.)
    export interface VerifyCitationsParams {
      output: string;
      provider: LLMProvider;
      model: string;
      apiKey: string;
      allowFetch: boolean;
      domainAllowlist?: readonly string[];
      maxCostUsdTotal?: number;
      perSourceTimeoutMs?: number;
      perSourceMaxBytes?: number;
      // Cap number of citations we attempt — protects against DoS-by-spam.
      maxCitations?: number;
    }
    
    export interface VerifiedCitation {
      citation: ExtractedCitation;
      resolveStatus: 'ok' | 'skipped' | 'error';
      resolveError?: { kind: string; message: string };
      source?: Pick<ResolvedSource, 'url' | 'status' | 'contentType' | 'bytesFetched' | 'truncated'>;
      // LLM judge verdict — only set when resolve succeeded.
      judge?: {
        supported: boolean;
        confidence: number;
        rationale: string;
        inputTokens: number;
        outputTokens: number;
        costUsd: number | null;
        latencyMs: number;
      };
    }
    
    export interface VerifyCitationsResult {
      // Aggregate — fraction of resolvable citations judged supported.
      // Null when there were zero citations or zero resolvable ones.
      overallScore: number | null;
      passed: boolean;
      // Per-citation detail for the dashboard.
      citations: VerifiedCitation[];
      // Accumulated cost across all LLM calls we made.
      totalCostUsd: number;
      totalCitationsFound: number;
      totalResolved: number;
      totalSupported: number;
    }
    
    const SYSTEM = `You are a citation verification evaluator. Given a claim extracted from AI-generated output and the text of a cited source, decide whether the source supports the claim.
    
    Score 0.00 means the source contradicts the claim or does not mention it.
    Score 1.00 means the source clearly supports the claim.
    Be strict: do not rate as supported unless the source actually contains the assertion. Do NOT penalize paraphrasing; DO penalize invented specifics not in the source.
    
    Respond with a single JSON object — no markdown, no prose:
    {
      "supported": <boolean>,
      "confidence": <number 0.00..1.00>,
      "rationale": "<1-2 sentences — quote 5-15 words from the source if you found support>"
    }`;
    
    function buildUser(claim: string, sourceText: string): string {
      // Truncate huge sources so we stay within reasonable tokens.
      const maxSourceChars = 12_000; // ~3k tokens
      const trimmed =
        sourceText.length > maxSourceChars
          ? sourceText.slice(0, maxSourceChars) + '\n\n[…source truncated…]'
          : sourceText;
      return `CLAIM:\n${claim}\n\nSOURCE TEXT:\n${trimmed}`;
    }
    
    function parseJudgeResult(raw: string): {
      supported: boolean;
      confidence: number;
      rationale: string;
    } {
      const trimmed = raw
        .trim()
        .replace(/^```json\s*/i, '')
        .replace(/^```\s*/i, '')
        .replace(/```$/i, '')
        .trim();
      const first = trimmed.indexOf('{');
      const last = trimmed.lastIndexOf('}');
      if (first < 0 || last <= first) {
        throw new LLMJudgeError(
          `Citation judge did not emit JSON: ${raw.slice(0, 200)}`,
          'malformed_response',
        );
      }
      const obj = JSON.parse(trimmed.slice(first, last + 1)) as Record<string, unknown>;
      const supported = obj.supported === true;
      const confRaw = obj.confidence;
      const confidence = typeof confRaw === 'number' ? confRaw : Number(confRaw);
      if (!Number.isFinite(confidence) || confidence < 0 || confidence > 1) {
        throw new LLMJudgeError(
          `Citation judge confidence out of [0..1]: ${String(confRaw)}`,
          'malformed_response',
        );
      }
      const rationale = typeof obj.rationale === 'string' ? obj.rationale : '';
      return { supported, confidence: Math.round(confidence * 100) / 100, rationale };
    }
    
    export async function verifyCitations(
      params: VerifyCitationsParams,
    ): Promise<VerifyCitationsResult> {
      const citations = extractCitations(params.output);
      const maxCitations = params.maxCitations ?? 20;
      const selected = citations.slice(0, maxCitations);
      const totalFound = citations.length;
    
      if (!findPricing(params.model)) {
        throw new Error(
          `Unknown model "${params.model}". Add pricing to src/eval/llm-judge/pricing.ts first.`,
        );
      }
    
      const maxCostTotal = params.maxCostUsdTotal ?? 1.0;
    
      const out: VerifiedCitation[] = [];
      let totalCost = 0;
      let totalResolved = 0;
      let totalSupported = 0;
    
      for (const citation of selected) {
        // Only URL/DOI can be resolved. Numbered citations without
        // footnote definitions + author-year refs without a bibliography
        // are flagged as unresolvable — the output claims them but we have
        // nothing to compare against.
        if (citation.kind !== 'url' && citation.kind !== 'doi') {
          out.push({ citation, resolveStatus: 'skipped', resolveError: { kind: 'unresolvable_kind', message: `No source to fetch for ${citation.kind} citation` } });
          continue;
        }
    
        let source: ResolvedSource;
        try {
          source = await resolveSource(citation.identifier, {
            allowFetch: params.allowFetch,
            timeoutMs: params.perSourceTimeoutMs,
            maxBytes: params.perSourceMaxBytes,
            domainAllowlist: params.domainAllowlist,
          });
          totalResolved++;
        } catch (err) {
          const e = err as CitationResolveError;
          out.push({
            citation,
            resolveStatus: 'error',
            resolveError: { kind: e.kind ?? 'unknown', message: e.message },
          });
          continue;
        }
    
        // Before calling the judge: would this blow our total cost?
        // Use the same pessimistic estimate as the main LLM judge evaluator.
        const contextLen = citation.contextWindow.length + source.text.length;
        const pessimistic = estimateCostUsd(params.model, Math.ceil(contextLen / 4), 512) ?? 0;
        if (totalCost + pessimistic > maxCostTotal) {
          out.push({
            citation,
            resolveStatus: 'ok',
            source: {
              url: source.url,
              status: source.status,
              contentType: source.contentType,
              bytesFetched: source.bytesFetched,
              truncated: source.truncated,
            },
            resolveError: {
              kind: 'cost_cap_reached',
              message: `Total cost cap $${maxCostTotal.toFixed(2)} would be exceeded by next judge call`,
            },
          });
          break; // No point continuing — subsequent calls will also exceed.
        }
    
        let judgeResponse;
        try {
          judgeResponse = await callLLMJudge({
            provider: params.provider,
            model: params.model,
            systemPrompt: SYSTEM,
            userPrompt: buildUser(citation.contextWindow, source.text),
            maxOutputTokens: 256,
            temperature: 0,
            apiKey: params.apiKey,
          });
        } catch (err) {
          const e = err as Error;
          out.push({
            citation,
            resolveStatus: 'ok',
            source: {
              url: source.url,
              status: source.status,
              contentType: source.contentType,
              bytesFetched: source.bytesFetched,
              truncated: source.truncated,
            },
            resolveError: {
              kind: err instanceof LLMJudgeError ? err.kind : 'llm_judge_error',
              message: e.message,
            },
          });
          continue;
        }
    
        const cost = estimateCostUsd(params.model, judgeResponse.inputTokens, judgeResponse.outputTokens);
        totalCost += cost ?? 0;
    
        let parsed;
        try {
          parsed = parseJudgeResult(judgeResponse.content);
        } catch (err) {
          const e = err as Error;
          out.push({
            citation,
            resolveStatus: 'ok',
            source: {
              url: source.url,
              status: source.status,
              contentType: source.contentType,
              bytesFetched: source.bytesFetched,
              truncated: source.truncated,
            },
            resolveError: {
              kind: 'malformed_judge_response',
              message: e.message,
            },
          });
          continue;
        }
    
        if (parsed.supported) totalSupported++;
    
        out.push({
          citation,
          resolveStatus: 'ok',
          source: {
            url: source.url,
            status: source.status,
            contentType: source.contentType,
            bytesFetched: source.bytesFetched,
            truncated: source.truncated,
          },
          judge: {
            supported: parsed.supported,
            confidence: parsed.confidence,
            rationale: parsed.rationale,
            inputTokens: judgeResponse.inputTokens,
            outputTokens: judgeResponse.outputTokens,
            costUsd: cost,
            latencyMs: judgeResponse.latencyMs,
          },
        });
      }
    
      const overallScore = totalResolved > 0 ? Math.round((totalSupported / totalResolved) * 100) / 100 : null;
      // Fail when fewer than half of the resolved sources support the claim
      // (overallScore < 0.5). When no citations or none resolved, we don't
      // fail — there's nothing to score, we just report that.
      const passed = overallScore === null ? true : overallScore >= 0.5;
    
      return {
        overallScore,
        passed,
        citations: out,
        totalCostUsd: Math.round(totalCost * 1_000_000) / 1_000_000,
        totalCitationsFound: totalFound,
        totalResolved,
        totalSupported,
      };
    }
  • Security-gated source resolver: resolves URLs/DOIs with SSRF protection (IP blocking, scheme allowlist, domain allowlist), fetch with redirects, streaming body cap, and LRU cache. resolveSource() is the main entry point used by verifier.ts.
    // Citation source resolver — fetches URLs and DOIs so the verifier can
    // feed them to the LLM judge. This is the security-sensitive piece: we
    // are fetching URLs that appeared in model-generated output, which is
    // effectively user-controlled input.
    //
    // Defense layers (in order):
    //   1. Scheme allowlist — http/https only; refuse file:/javascript:/etc.
    //   2. SSRF host check — refuse localhost, link-local, private ranges,
    //      and cloud metadata (AWS/GCP/Azure/DigitalOcean) IP literals.
    //   3. Optional domain allowlist — IRIS_CITATION_DOMAINS=doi.org,arxiv.org
    //      restricts to a curated set; empty/unset = open web (still SSRF-guarded).
    //   4. Timeout + size cap — 10s default, 5MB cap on response body.
    //   5. Redirect chase cap — follow max 3 redirects, each re-checked.
    //   6. Cache — in-process LRU (100 entries) so retries don't re-fetch.
    //
    // This is opt-in: calls require passing {allowFetch: true} so an agent
    // can't trick Iris into fetching random URLs without operator consent
    // (consent granted via tool param or env IRIS_CITATION_ALLOW_FETCH=1).
    
    export interface ResolveOptions {
      allowFetch: boolean;
      timeoutMs?: number;
      maxBytes?: number;
      domainAllowlist?: readonly string[];
      maxRedirects?: number;
    }
    
    export interface ResolvedSource {
      url: string;             // final URL after redirects
      requestedUrl: string;    // URL as passed in
      status: number;
      contentType: string;
      text: string;            // truncated to maxBytes
      truncated: boolean;
      fetchedAt: string;       // ISO
      bytesFetched: number;
      fromCache: boolean;
    }
    
    export class CitationResolveError extends Error {
      constructor(
        message: string,
        public readonly kind:
          | 'bad_scheme'
          | 'ssrf'
          | 'not_allowed_domain'
          | 'timeout'
          | 'too_large'
          | 'bad_status'
          | 'redirect_loop'
          | 'not_text'
          | 'fetch_disabled',
        public readonly details?: string,
      ) {
        super(message);
        this.name = 'CitationResolveError';
      }
    }
    
    // Private IP ranges + localhost + link-local + cloud metadata.
    const BLOCKED_IPV4 = [
      // Localhost
      /^127\./,
      // Link-local
      /^169\.254\./,
      // Private RFC 1918
      /^10\./,
      /^192\.168\./,
      /^172\.(1[6-9]|2\d|3[01])\./,
      // Cloud metadata
      /^169\.254\.169\.254$/,
      // Broadcast
      /^255\.255\.255\.255$/,
      // This-network
      /^0\./,
    ];
    
    const BLOCKED_IPV6 = [
      /^::1$/, // localhost
      /^fc|^fd/i, // unique local
      /^fe80/i, // link-local
      /^::ffff:127\./i, // IPv4-mapped localhost
    ];
    
    const BLOCKED_HOST_SUBSTRINGS = ['localhost', 'internal', '.local', 'metadata.google', 'metadata.azure'];
    
    function isIpv4(host: string): boolean {
      return /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
    }
    
    function isIpv6(host: string): boolean {
      return host.includes(':');
    }
    
    export function isSafeHost(host: string): boolean {
      const hostLower = host.toLowerCase();
      for (const sub of BLOCKED_HOST_SUBSTRINGS) {
        if (hostLower === sub || hostLower.endsWith(sub)) return false;
      }
      if (isIpv4(host)) {
        for (const re of BLOCKED_IPV4) {
          if (re.test(host)) return false;
        }
      }
      if (isIpv6(host)) {
        for (const re of BLOCKED_IPV6) {
          if (re.test(host)) return false;
        }
      }
      return true;
    }
    
    function matchesAllowlist(host: string, allowlist: readonly string[] | undefined): boolean {
      if (!allowlist || allowlist.length === 0) return true;
      const hostLower = host.toLowerCase();
      for (const allowed of allowlist) {
        const a = allowed.toLowerCase();
        if (hostLower === a || hostLower.endsWith('.' + a)) return true;
      }
      return false;
    }
    
    // Tiny LRU — short-circuits duplicate fetches in a single batch of
    // citations (e.g. 5 citations in one output all pointing to the same
    // source). Not durable; every process start is fresh.
    const CACHE_MAX = 100;
    const cache = new Map<string, ResolvedSource>();
    
    function cacheGet(key: string): ResolvedSource | undefined {
      const hit = cache.get(key);
      if (!hit) return undefined;
      // Re-insert to bump LRU order
      cache.delete(key);
      cache.set(key, hit);
      return { ...hit, fromCache: true };
    }
    
    function cacheSet(key: string, value: ResolvedSource): void {
      if (cache.size >= CACHE_MAX) {
        const firstKey = cache.keys().next().value as string | undefined;
        if (firstKey !== undefined) cache.delete(firstKey);
      }
      cache.set(key, value);
    }
    
    export function __clearCitationCacheForTests(): void {
      cache.clear();
    }
    
    function normalizeDoiToUrl(doiOrUrl: string): string {
      const trimmed = doiOrUrl.trim();
      if (/^https?:\/\//i.test(trimmed)) return trimmed;
      if (/^10\.\d{4,9}\//.test(trimmed)) return `https://doi.org/${trimmed}`;
      return trimmed;
    }
    
    async function doFetch(url: string, opts: ResolveOptions, redirectsLeft: number): Promise<ResolvedSource> {
      const timeoutMs = opts.timeoutMs ?? 10_000;
      const maxBytes = opts.maxBytes ?? 5 * 1024 * 1024;
    
      const parsed = new URL(url);
      if (parsed.protocol !== 'https:' && parsed.protocol !== 'http:') {
        throw new CitationResolveError(
          `Refusing non-http(s) scheme: ${parsed.protocol}`,
          'bad_scheme',
          parsed.protocol,
        );
      }
      if (!isSafeHost(parsed.hostname)) {
        throw new CitationResolveError(`Refusing SSRF-blocked host: ${parsed.hostname}`, 'ssrf', parsed.hostname);
      }
      if (!matchesAllowlist(parsed.hostname, opts.domainAllowlist)) {
        throw new CitationResolveError(
          `Host ${parsed.hostname} not in IRIS_CITATION_DOMAINS allowlist`,
          'not_allowed_domain',
          parsed.hostname,
        );
      }
    
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), timeoutMs);
      let res;
      try {
        res = await fetch(url, {
          signal: controller.signal,
          redirect: 'manual',
          headers: {
            'user-agent': 'iris-mcp-citation-verifier/0.4 (+https://iris-eval.com)',
            accept: 'text/html, text/plain, application/pdf, application/xhtml+xml, */*;q=0.1',
          },
        });
      } catch (err) {
        if (err instanceof Error && err.name === 'AbortError') {
          throw new CitationResolveError(`Fetch timed out after ${timeoutMs}ms`, 'timeout');
        }
        throw err;
      } finally {
        clearTimeout(timer);
      }
    
      // Manual redirect chase — we re-check each target against SSRF rules.
      if (res.status >= 300 && res.status < 400) {
        const loc = res.headers.get('location');
        if (!loc) {
          throw new CitationResolveError(`Redirect with no Location header (${res.status})`, 'bad_status');
        }
        if (redirectsLeft <= 0) {
          throw new CitationResolveError(`Exceeded max redirects (${opts.maxRedirects ?? 3})`, 'redirect_loop');
        }
        const next = new URL(loc, url).toString();
        return doFetch(next, opts, redirectsLeft - 1);
      }
    
      if (res.status < 200 || res.status >= 300) {
        throw new CitationResolveError(`Bad status ${res.status}`, 'bad_status', String(res.status));
      }
    
      const contentType = (res.headers.get('content-type') ?? '').toLowerCase();
      // We only extract text. PDFs could be supported later with pdf-parse
      // but that's an opt-in + heavy dep. For now we refuse non-text.
      const textLike =
        contentType.includes('text/') ||
        contentType.includes('xml') ||
        contentType.includes('json') ||
        contentType === '';
      if (!textLike) {
        throw new CitationResolveError(
          `Refusing non-text content-type: ${contentType}`,
          'not_text',
          contentType,
        );
      }
    
      // Stream with a byte cap so we don't DoS ourselves on a huge body.
      if (!res.body) {
        throw new CitationResolveError('Response had no body stream', 'bad_status');
      }
      const reader = res.body.getReader();
      const chunks: Uint8Array[] = [];
      let total = 0;
      let truncated = false;
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        total += value.byteLength;
        if (total > maxBytes) {
          truncated = true;
          chunks.push(value.slice(0, Math.max(0, maxBytes - (total - value.byteLength))));
          await reader.cancel();
          break;
        }
        chunks.push(value);
      }
    
      // Concat + decode. UTF-8 is good enough for this use case — any weird
      // encoding will surface as mojibake for the LLM judge, still safe.
      const buf = new Uint8Array(total > maxBytes ? maxBytes : total);
      let off = 0;
      for (const c of chunks) {
        buf.set(c, off);
        off += c.byteLength;
      }
      const text = new TextDecoder('utf-8', { fatal: false }).decode(buf);
    
      return {
        url: res.url || url,
        requestedUrl: url,
        status: res.status,
        contentType,
        text,
        truncated,
        fetchedAt: new Date().toISOString(),
        bytesFetched: Math.min(total, maxBytes),
        fromCache: false,
      };
    }
    
    export async function resolveSource(
      identifier: string,
      opts: ResolveOptions,
    ): Promise<ResolvedSource> {
      if (!opts.allowFetch) {
        throw new CitationResolveError(
          'Citation fetch is disabled. Pass allowFetch:true or set IRIS_CITATION_ALLOW_FETCH=1.',
          'fetch_disabled',
        );
      }
    
      const url = normalizeDoiToUrl(identifier);
      const cacheKey = url;
      const cached = cacheGet(cacheKey);
      if (cached) return cached;
    
      const fetched = await doFetch(url, opts, opts.maxRedirects ?? 3);
      cacheSet(cacheKey, fetched);
      return fetched;
    }
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (readOnlyHint=false, etc.), but the description thoroughly discloses the three-phase pipeline, SSRF guarding, opt-in fetch, cost capping, rate limiting, error modes, and that per-citation errors are collected in response rather than thrown. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with sections and front-loaded summary. Each paragraph adds specific value (behavior, output shape, usage, parameters, errors). Minor redundancy in parameter defaults but overall efficient for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 10 parameters, no output schema, and complex behavior, the description covers all aspects: input/output structure, error modes, safety considerations, and context for use. Extremely complete for a high-complexity tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. The description adds significant value by explaining defaults (e.g., allow_fetch=false, max_citations=20), auto-detection of provider, merging of domain_allowlist with env, and behavior of max_citations (extras silently skipped). This goes beyond what the schema's textual descriptions provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it extracts citations, fetches sources, and uses an LLM judge to verify grounding. It distinguishes itself from siblings like evaluate_with_llm_judge (general semantic scoring) and evaluate_output (hallucination heuristic) by specifying it is the 'grounding-check path — narrowest in scope, deepest in rigor.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use: 'when the output makes factual claims backed by [1]-style references, DOIs, or URLs' and for research/legal/medical agents. Also states when not to use: when output has no citations, without allow_fetch=true, or with untrusted output. Provides alternative tool suggestions (evaluate_output's heuristic).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
