ARSR MCP Server

Adaptive Retrieval-Augmented Self-Refinement — a closed-loop MCP server that lets LLMs iteratively verify and correct their own claims using uncertainty-guided retrieval.

What it does

Unlike one-shot RAG (retrieve → generate), ARSR runs a refinement loop:

Generate draft → Decompose claims → Score uncertainty
       ↑                                    ↓
   Decide stop ← Revise with evidence ← Retrieve for low-confidence claims

The key insight: retrieval is guided by uncertainty. Only claims the model is unsure about trigger evidence fetching, and the queries are adversarial — designed to disprove the claim, not just confirm it.
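As a concrete sketch of that insight, the hypothetical helper below selects only uncertain claims and phrases their search queries to look for disconfirming evidence. The field names (`text`, `confidence`) and the 0.85 threshold follow the usage example later in this README; the query phrasing itself is an illustration, not the server's actual logic:

```javascript
// Hypothetical sketch of uncertainty-guided, adversarial query selection.
function adversarialQueries(scoredClaims, threshold = 0.85) {
  return scoredClaims
    .filter((c) => c.confidence < threshold) // only uncertain claims retrieve
    .map((c) => ({
      claim: c.text,
      // phrased to surface disconfirming evidence, not confirmation
      query: `evidence that "${c.text}" is false or disputed`,
    }));
}

adversarialQueries([
  { text: "Tesla was founded in 2003", confidence: 0.92 },
  { text: "Tesla's first car shipped in 2006", confidence: 0.55 },
]);
// → only the low-confidence 2006 claim produces a search query
```

Confident claims never hit the web, which is what keeps the loop's cost proportional to how much the model doubts itself.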

Architecture

The server exposes 6 MCP tools. The outer LLM (Claude, GPT, etc.) orchestrates the loop by calling them in sequence:

| # | Tool | Purpose |
|---|------|---------|
| 1 | `arsr_draft_response` | Generate initial candidate answer (returns `is_refusal` flag) |
| 2 | `arsr_decompose_claims` | Split into atomic, verifiable claims |
| 3 | `arsr_score_uncertainty` | Estimate confidence via semantic entropy |
| 4 | `arsr_retrieve_evidence` | Web search for low-confidence claims |
| 5 | `arsr_revise_response` | Rewrite draft with evidence |
| 6 | `arsr_should_continue` | Decide: iterate or finalize |
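Tool 3's semantic-entropy idea can be sketched as follows. This is illustrative only — the server's real sampling and equivalence checks live in the inner LLM; the function below just shows the arithmetic: sample several answers, cluster meaning-equivalent ones, and take the entropy of the cluster distribution.

```javascript
// Illustrative semantic-entropy scorer over sampled answers.
function semanticEntropy(samples, areEquivalent) {
  const clusters = [];
  for (const s of samples) {
    const match = clusters.find((cluster) => areEquivalent(cluster[0], s));
    if (match) match.push(s);
    else clusters.push([s]);
  }
  let entropy = 0;
  for (const cluster of clusters) {
    const p = cluster.length / samples.length;
    entropy -= p * Math.log2(p);
  }
  return entropy; // 0 means all samples agree → high confidence
}

// All three rephrasings agree → entropy 0 → confident claim
semanticEntropy(["2003", "2003", "2003"], (a, b) => a === b); // → 0
```

Low entropy (every rephrasing lands in one cluster) maps to high confidence, so the claim skips retrieval; high entropy flags it for evidence fetching.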

Inner LLM: Tools 1-5 use Claude Haiku internally for the intelligent sub-tasks (query generation, claim extraction, evidence evaluation). This keeps costs low while the outer model handles orchestration.

Refusal detection: arsr_draft_response returns a structured is_refusal flag (classified by the inner LLM) indicating whether the draft is a non-answer. When is_refusal is true, downstream tools (decompose, revise) pivot to extracting claims from the original query and building an answer from retrieved evidence instead of trying to refine a refusal.
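The pivot can be pictured with a small hypothetical helper; the `source` field here is illustrative and not part of the actual tool schema:

```javascript
// Hypothetical illustration of the refusal pivot described above: when the
// draft is a refusal, claim extraction targets the original query instead
// of the (empty) draft, so the loop builds an answer from evidence.
function decompositionTarget(draft, originalQuery) {
  return draft.is_refusal
    ? { source: "query", text: originalQuery } // build an answer from evidence
    : { source: "draft", text: draft.draft };  // refine the existing draft
}
```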

Web Search: arsr_retrieve_evidence uses the Anthropic API's built-in web search tool — no external search API keys needed.

Setup

Prerequisites

  • Node.js 18+

  • An Anthropic API key

Install & Build

```bash
cd arsr-mcp-server
npm install
npm run build
```

Environment

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

Run

stdio mode (for Claude Desktop, Cursor, etc.):

```bash
npm start
```

HTTP mode (for remote access):

```bash
TRANSPORT=http PORT=3001 npm start
```

Claude Desktop Configuration

Add to your claude_desktop_config.json:

Npm:

```json
{
  "mcpServers": {
    "arsr": {
      "command": "npx",
      "args": ["@jayarrowz/mcp-arsr"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}
```

Local build:

```json
{
  "mcpServers": {
    "arsr": {
      "command": "node",
      "args": ["/path/to/arsr-mcp-server/dist/src/index.js"],
      "env": {
        "ANTHROPIC_API_KEY": "sk-ant-...",
        "ARSR_MAX_ITERATIONS": "3",
        "ARSR_ENTROPY_SAMPLES": "3",
        "ARSR_RETRIEVAL_STRATEGY": "adversarial",
        "ARSR_INNER_MODEL": "claude-haiku-4-5-20251001"
      }
    }
  }
}
```

How the outer LLM uses it

The orchestrating LLM calls the tools in sequence:

```javascript
// 1. Draft an answer (draft.is_refusal flags a non-answer)
draft = arsr_draft_response({ query: "When was Tesla founded?" })

// 2. Decompose into atomic claims
claims = arsr_decompose_claims({ draft: draft.draft, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })

// 3. Score each claim's confidence
scored = arsr_score_uncertainty({ claims: claims.claims })

// 4. Keep only low-confidence claims
low = scored.scored.filter(c => c.confidence < 0.85)

// 5. Retrieve evidence for them
evidence = arsr_retrieve_evidence({ claims_to_check: low })

// 6. Rewrite the draft with the evidence
revised = arsr_revise_response({ draft: draft.draft, evidence: evidence.evidence, scored: scored.scored, original_query: "When was Tesla founded?", is_refusal: draft.is_refusal })

// 7. Decide (revised_scores comes from re-running steps 2-3 on the revised text)
decision = arsr_should_continue({ iteration: 1, scored: revised_scores })
// "continue" → go back to step 2 with the revised text
// "stop"     → return revised.revised to the user
```
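The sequence above can be wired into a single driver. The sketch below is runnable against a synchronous stub for the tool caller; the response shapes follow the example above, while `decision.action` and the re-drafting step are assumptions about the server's schemas:

```javascript
// Runnable sketch of the orchestration loop with a pluggable tool caller.
function refine(callTool, query, { maxIterations = 3, threshold = 0.85 } = {}) {
  let draft = callTool("arsr_draft_response", { query });
  for (let iteration = 1; iteration <= maxIterations; iteration++) {
    const claims = callTool("arsr_decompose_claims", {
      draft: draft.draft, original_query: query, is_refusal: draft.is_refusal,
    });
    const scored = callTool("arsr_score_uncertainty", { claims: claims.claims });
    const low = scored.scored.filter((c) => c.confidence < threshold);
    const evidence = callTool("arsr_retrieve_evidence", { claims_to_check: low });
    const revised = callTool("arsr_revise_response", {
      draft: draft.draft, evidence: evidence.evidence, scored: scored.scored,
      original_query: query, is_refusal: draft.is_refusal,
    });
    // A production loop would re-score the revised claims before deciding.
    const decision = callTool("arsr_should_continue", { iteration, scored: scored.scored });
    if (decision.action === "stop" || iteration === maxIterations) {
      return revised.revised;
    }
    draft = { draft: revised.revised, is_refusal: false }; // iterate on revised text
  }
}
```

In practice `callTool` would be an MCP client invocation (and therefore async); the stub form just makes the control flow easy to see and test.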

Configuration

All settings can be overridden via environment variables, falling back to defaults if unset:

| Setting | Env var | Default | Description |
|---------|---------|---------|-------------|
| `max_iterations` | `ARSR_MAX_ITERATIONS` | `3` | Budget limit for refinement loops |
| `confidence_threshold` | `ARSR_CONFIDENCE_THRESHOLD` | `0.85` | Claims above this skip retrieval |
| `entropy_samples` | `ARSR_ENTROPY_SAMPLES` | `3` | Rephrasings sampled for semantic entropy |
| `retrieval_strategy` | `ARSR_RETRIEVAL_STRATEGY` | `adversarial` | `adversarial`, `confirmatory`, or `balanced` |
| `inner_model` | `ARSR_INNER_MODEL` | `claude-haiku-4-5-20251001` | Model for internal intelligence |
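A loader for these overrides might look like the sketch below (variable names and defaults are taken from the table; the server's actual parsing and validation may differ):

```javascript
// Sketch: resolve ARSR settings from the environment, falling back to the
// documented defaults when a variable is unset.
function loadConfig(env = process.env) {
  return {
    max_iterations: parseInt(env.ARSR_MAX_ITERATIONS ?? "3", 10),
    confidence_threshold: parseFloat(env.ARSR_CONFIDENCE_THRESHOLD ?? "0.85"),
    entropy_samples: parseInt(env.ARSR_ENTROPY_SAMPLES ?? "3", 10),
    retrieval_strategy: env.ARSR_RETRIEVAL_STRATEGY ?? "adversarial",
    inner_model: env.ARSR_INNER_MODEL ?? "claude-haiku-4-5-20251001",
  };
}
```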

Cost estimate

Per refinement loop iteration (assuming ~5 claims, 3 low-confidence):

  • Inner LLM calls: ~6-10 Haiku calls ≈ $0.002-0.005

  • Web searches: 6-9 queries, billed through the Anthropic API (no separate search provider)

  • Typical total for 2 iterations: < $0.02

Images

Before/after comparison screenshots (not reproduced in this text version).

License

MIT

