

consensus-mcp

by entropyvortex


TypeScript

Remote

ai-consensus-mcp


Grok-centered multi-model consensus inside Cursor, Claude Code, and Windsurf. One config file. 14 tools. Measurably better decisions on code review, architecture, security, and hard calls.

Turn any set of models into a disciplined roundtable that catches what single-model prompting misses.

30-second install

npx -y ai-consensus-mcp config          # walks you through providers + participants
npx -y ai-consensus-mcp install --config ~/.consensus.config.json

Restart your host. The consensus tool and 13 expert panels appear in autocomplete.

Benchmarks are the point; the data is in the next section.


Proven in Benchmarks – Not Just Marketing

On real software-engineering tasks scored against a held-out rubric, consensus beats single-model prompting.

In the most rigorous evaluation to date (architecture_v2 panel, 4 held-out cases, 3 deterministic runs each, seed=42, external judge model never involved in the debate):

Metric (held-out rubric, blind external judge) | Consensus | Single-model baseline | Δ
Average rubric score (0–100) | 83.3 | 48.0 | +35.3
Wins out of 12 runs | 12 | 0 | 12–0 (100% win rate)

The panel won every run against the same model used as a strong single-shot baseline. The gap was largest (+51 points) on the case that single-model prompting handled worst.

Self-reported confidence told the opposite story: consensus runs averaged lower confidence (60.0) than the baseline (75.4). The external judge still preferred the consensus output every time.

Why this matters

+35 rubric points on a 5-criterion architecture rubric (quantification, single recommendation, reversibility weighing, tripwire specificity, failure-mode realism).
The baseline repeatedly missed reversibility analysis and concrete tripwires. The panel surfaced them consistently.
This is not "more opinions = better." This is a specific protocol (blind round 1 → full-visibility debate → confidence-weighted scoring + structured judge synthesis) beating a strong frontier model at the same task.
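The visibility rule in that protocol (blind round 1, then full visibility) can be sketched in a few lines. This is a minimal illustration under assumptions: `Participant`, `Answer`, and `runDebate` are hypothetical names, and confidence-weighted scoring and the judge step are left out because the real formula lives in ai-consensus-core.

```typescript
// Hypothetical shapes for illustration; not the ai-consensus-core API.
type Answer = { participantId: string; round: number; text: string };

interface Participant {
  id: string;
  // `visible` holds the answers this participant may see in the current round.
  answer(prompt: string, visible: readonly Answer[]): Promise<string>;
}

// Round 1 is blind: nobody sees anyone else's answer.
// Every later round sees all answers from the previous round.
async function runDebate(
  prompt: string,
  participants: readonly Participant[],
  maxRounds: number,
): Promise<Answer[][]> {
  const rounds: Answer[][] = [];
  for (let round = 1; round <= maxRounds; round++) {
    const visible: readonly Answer[] = round === 1 ? [] : rounds[round - 2];
    // Participants answer in parallel within a round.
    const answers = await Promise.all(
      participants.map(async (p): Promise<Answer> => ({
        participantId: p.id,
        round,
        text: await p.answer(prompt, visible),
      })),
    );
    rounds.push(answers);
  }
  return rounds;
}
```

Keeping round 1 blind is what prevents the first speaker from anchoring everyone else; the debate only becomes visible once each participant has committed to an independent answer.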

The full benchmark suite, raw JSON outputs, rubric definitions, and human eval protocol are in the repo, and reproduction takes one command:

npx ai-consensus-mcp bench -p architecture_v2 --runs 3 --seed 42 \
  --evaluator-model claude-opus-4-5 --evaluator-provider anthropic

The same harness exists for code review, security red-teaming, decision-making, incident postmortems, ML research, and product strategy panels. Early runs show the same structural pattern: the multi-model debate surfaces edge cases and trade-off rigor that single-model answers elide.

Reproduce the exact numbers above by running that command against the current repo. It requires Grok and Anthropic API keys (the evaluator runs on Anthropic). Raw data lives in the repo as bench-*.json artifacts.
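Seeded reproducibility depends on every local random choice being derived from the seed. A minimal sketch using mulberry32, a well-known 32-bit seeded PRNG, to shuffle a speaking order reproducibly. Whether ai-consensus-mcp uses this generator, or randomizes speaker order at all, is an assumption made only for illustration.

```typescript
// mulberry32: small seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle driven by the seeded generator:
// the same seed always yields the same order.
function seededShuffle<T>(items: readonly T[], seed: number): T[] {
  const rand = mulberry32(seed);
  const out = [...items];
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}
```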

Honest caveats (we ship these in the output):

N=12 runs (4 cases × 3 runs each) is small, but the direction was 12/12 with a 35-point gap, and the runs are reproducible with the seed.
Real cost: ~40× tokens and 20× wall time vs one baseline call. For high-stakes architecture or security decisions this is cheap insurance. For routine refactors, single-model is usually fine.
Self-reported confidence is a poor quality signal. The panel often surfaces more uncertainty while producing better answers.

The benchmark CLI and held-out rubric evaluator ship with the package. This is not marketing copy — it's an executable claim you can run yourself.

What's new in v0.12

8 versioned v2 expert panels with machine-readable expectedOutputShape, structured rationale, and tighter prompts (architecture, code review, security red-team, ML research 2026, product strategy, decision-making, incident postmortem, research synthesis).
bench CLI subcommand — deterministic uplift measurement against single-model baseline, with optional held-out LLM-as-judge rubric scoring.
Persistent project memory (opt-in) — every consensus result stored under a project key. Three new recall tools (consensus_recall, consensus_project_memory, consensus_what_we_decided) with atomic writes and fragment-based search.
panel argument on the generic consensus tool so hosts that don't enumerate per-panel tools can still target a curated panel.
All v1 presets continue to work unchanged.

What it gives you

One config, 14+ tools. Generic consensus plus 13 task-tuned expert panels. Invoke a panel name; get the right personas, rounds, temperature, and judge prompt without tuning knobs.
Any OpenAI-compatible provider. Grok-4, Claude (via Anthropic compat), OpenAI, Groq, Together, Fireworks, local gateways. Per-participant routing.
The calling agent can sit at the table (experimental). Set a participant's kind to "host-sample" and the MCP host answers via sampling/createMessage. This currently works in Claude Desktop; support is tracked for Claude Code, Cursor, and Windsurf.
Live progress. Every engine round, confidence shift, and disagreement surfaces as MCP progress notifications.
Optional durable memory. Project-scoped, queryable history of prior decisions with the exact context that produced them.
Built-in benchmarking. npx ai-consensus-mcp bench measures whether the panel actually helps on your task class — with the same held-out rubric method shown above.

See docs/expert-panels.md for the full catalogue and per-panel output shapes.

Install & Configure

Full instructions: docs/install.md

The interactive config wizard handles providers, participants (including host-sample), judge, and defaults, then writes an atomic, schema-validated ~/.consensus.config.json.

Manual example (minimal):

{
  "providers": {
    "xai": { "baseUrl": "https://api.x.ai/v1", "apiKeyEnv": "GROK_API_KEY" },
    "anthropic": { "baseUrl": "https://api.anthropic.com/v1", "apiKeyEnv": "CONSENSUS_ANTHROPIC_API_KEY" }
  },
  "participants": [
    { "id": "grok", "provider": "xai", "modelId": "grok-4", "personaId": "pessimist" },
    { "id": "domain", "provider": "anthropic", "modelId": "claude-sonnet-4-6", "personaId": "domain-expert" }
  ],
  "judge": { "provider": "xai", "modelId": "grok-4" }
}
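The example config above can be described with TypeScript types and checked before a run. This sketch is hypothetical: the field names mirror the JSON example, but the package's real schema is whatever the config wizard validates against. It flags duplicate participant ids, participants or a judge pointing at an undeclared provider, and provider key variables that are not set.

```typescript
// Hypothetical types mirroring the example config; not the package's schema.
interface ProviderConfig {
  baseUrl: string;
  apiKeyEnv: string; // name of the env var holding the key, never the key itself
}
interface HttpParticipant {
  kind?: undefined;
  id: string;
  provider: string;
  modelId: string;
  personaId: string;
}
interface HostSampleParticipant {
  kind: "host-sample";
  id: string;
  personaId: string;
  modelHint?: string;
}
type ParticipantConfig = HttpParticipant | HostSampleParticipant;
interface ConsensusConfig {
  providers: Record<string, ProviderConfig>;
  participants: ParticipantConfig[];
  judge: { provider: string; modelId: string };
}

// Returns a list of problems; an empty list means the config looks usable.
function validateConfig(
  cfg: ConsensusConfig,
  env: Record<string, string | undefined>,
): string[] {
  const errors: string[] = [];
  const hasProvider = (name: string): boolean =>
    Object.prototype.hasOwnProperty.call(cfg.providers, name);
  const ids = new Set<string>();
  for (const p of cfg.participants) {
    if (ids.has(p.id)) errors.push(`duplicate participant id "${p.id}"`);
    ids.add(p.id);
    // host-sample participants are answered by the MCP host, not a provider.
    if (p.kind !== "host-sample" && !hasProvider(p.provider)) {
      errors.push(`participant "${p.id}" uses unknown provider "${p.provider}"`);
    }
  }
  if (!hasProvider(cfg.judge.provider)) {
    errors.push(`judge uses unknown provider "${cfg.judge.provider}"`);
  }
  for (const [name, prov] of Object.entries(cfg.providers)) {
    if (!env[prov.apiKeyEnv]) errors.push(`provider "${name}": ${prov.apiKeyEnv} is not set`);
  }
  return errors;
}
```

In practice this would be called as `validateConfig(config, process.env)`.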

Host-sample participants (the calling agent joins the debate)

{
  "kind": "host-sample",
  "id": "self",
  "personaId": "domain-expert",
  "modelHint": "claude-sonnet"
}

When this participant's turn arrives, the MCP host is asked to answer in character. Human approval is required in Claude Desktop today.
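On the wire, a host-sample turn is an MCP `sampling/createMessage` request sent from the server to the host. A sketch of assembling one, assuming the persona text is sent as `systemPrompt` and `modelHint` is forwarded as a model-preference hint; both mappings and the `buildHostSampleRequest` name are hypothetical, while the field names inside `params` follow the MCP sampling specification.

```typescript
interface SamplingRequest {
  jsonrpc: "2.0";
  id: number;
  method: "sampling/createMessage";
  params: {
    messages: { role: "user" | "assistant"; content: { type: "text"; text: string } }[];
    systemPrompt?: string;
    modelPreferences?: { hints?: { name: string }[] };
    maxTokens: number;
  };
}

// Builds the request the host must answer (in Claude Desktop, after human approval).
function buildHostSampleRequest(
  id: number,
  personaPrompt: string,
  userText: string,
  modelHint?: string,
  maxTokens = 1024,
): SamplingRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "sampling/createMessage",
    params: {
      messages: [{ role: "user", content: { type: "text", text: userText } }],
      systemPrompt: personaPrompt,
      // Hints are advisory: the host may pick a different model.
      ...(modelHint ? { modelPreferences: { hints: [{ name: modelHint }] } } : {}),
      maxTokens,
    },
  };
}
```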

The `consensus` tool (and every preset)

Input (generic tool — presets own the panel):

{
  "prompt": "Should we adopt event sourcing for the new billing ledger?",
  "panel": "architecture_v2",
  "maxRounds": 4,
  "judge": true,
  "randomSeed": 42
}

panel is optional but recommended for real work; randomSeed enables deterministic replay.

Output on every successful call:

Human-readable markdown summary (final score, per-round table, participant responses, judge synthesis).
structuredContent: the full typed ConsensusResult for programmatic use.

Every engine event (roundComplete, disagreementDetected, synthesisComplete, etc.) is forwarded as an MCP progress notification.
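Forwarding an engine event amounts to sending a `notifications/progress` JSON-RPC notification that carries the progress token the client supplied with its request. A sketch, assuming a hypothetical `EngineEvent` shape; only the notification's method and `params` field names come from the MCP specification, which requires `progress` to increase with each notification.

```typescript
type ProgressToken = string | number;

// Hypothetical engine event shape, for illustration only.
interface EngineEvent {
  type: string; // e.g. "roundComplete", "disagreementDetected"
  round: number;
  detail?: string;
}

interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: ProgressToken; progress: number; total?: number; message?: string };
}

// Returns a callback that turns each engine event into a progress notification.
function makeProgressForwarder(
  token: ProgressToken,
  send: (n: ProgressNotification) => void,
): (event: EngineEvent) => void {
  let progress = 0;
  return (event) => {
    progress += 1; // monotonically increasing, as the spec requires
    const suffix = event.detail ? ` (${event.detail})` : "";
    send({
      jsonrpc: "2.0",
      method: "notifications/progress",
      params: { progressToken: token, progress, message: `round ${event.round}: ${event.type}${suffix}` },
    });
  };
}
```

`total` is omitted because the number of events in a debate is not known up front.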

Presets (e.g. consensus_architecture_v2, consensus_security_redteam) are registered as first-class tools. They accept the same knobs except participantIds (the panel owns the voices).
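"Same knobs except participantIds" maps naturally onto TypeScript's `Omit`. A sketch under assumptions: the field names come from the input example and this paragraph, while the `string[]` type for `participantIds` and the `toPresetInput` guard are hypothetical.

```typescript
interface ConsensusInput {
  prompt: string;
  panel?: string;
  participantIds?: string[]; // assumed type
  maxRounds?: number;
  judge?: boolean;
  randomSeed?: number;
}

// Presets own their voices, so they drop participantIds.
type PresetInput = Omit<ConsensusInput, "participantIds">;

// Runtime guard for a preset tool call; only a few fields are checked here.
function toPresetInput(raw: Record<string, unknown>): PresetInput {
  if ("participantIds" in raw) {
    throw new Error("presets own their panel; participantIds is not accepted");
  }
  if (typeof raw.prompt !== "string" || raw.prompt.length === 0) {
    throw new Error("prompt is required");
  }
  if (raw.maxRounds !== undefined && (!Number.isInteger(raw.maxRounds) || (raw.maxRounds as number) < 1)) {
    throw new Error("maxRounds must be a positive integer");
  }
  return raw as unknown as PresetInput;
}
```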

Persistent Memory (opt-in)

Set "memory": { "enabled": true } in your config.

Three new tools become available:

consensus_recall — keyword search with matched fragments
consensus_project_memory — full project history
consensus_what_we_decided — distilled prior conclusions on a topic
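The "keyword search with matched fragments" behind consensus_recall can be sketched over in-memory records: score each stored result in the project by how many query terms it contains, and return a short window of text around the first match. The record shape, window size, and `recall` signature are hypothetical; the real index format is in docs/memory-layer.md.

```typescript
// Hypothetical record shape for illustration.
interface MemoryRecord { id: string; project: string; text: string }
interface RecallHit { id: string; score: number; fragment: string }

function recall(
  records: readonly MemoryRecord[],
  project: string,
  query: string,
  window = 40, // characters kept on each side of the match
): RecallHit[] {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  const hits: RecallHit[] = [];
  for (const r of records) {
    if (r.project !== project) continue; // memory is scoped per project key
    // Assumes lowercasing preserves string length (true for ASCII text).
    const lower = r.text.toLowerCase();
    const matched = terms.filter((t) => lower.includes(t));
    if (matched.length === 0) continue;
    // Fragment around the first occurrence of the first matched query term.
    const at = lower.indexOf(matched[0]);
    const start = Math.max(0, at - window);
    const end = Math.min(r.text.length, at + matched[0].length + window);
    hits.push({ id: r.id, score: matched.length, fragment: r.text.slice(start, end) });
  }
  return hits.sort((a, b) => b.score - a.score); // most matched terms first
}
```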

Atomic writes, sentinel-locked index, retention policy. Data lives on local disk only. See docs/memory-layer.md for threat model and format.
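Atomic writes on local disk usually follow the write-to-temp-then-rename pattern: a crash leaves either the old file or the new one, never a half-written file. A sketch of that general technique with Node's fs module, not the package's exact code; the file names are hypothetical, and the sentinel lock and retention policy are not shown.

```typescript
import { closeSync, fsyncSync, mkdtempSync, openSync, readFileSync, readdirSync, renameSync, unlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Readers see either the previous file or the new one, never a partial write.
function writeJsonAtomic(path: string, data: unknown): void {
  // Temp file next to the target so the rename stays on one filesystem.
  const tmp = `${path}.${process.pid}.${Date.now()}.tmp`;
  const fd = openSync(tmp, "wx"); // "wx" fails if the temp name already exists
  try {
    writeFileSync(fd, JSON.stringify(data));
    fsyncSync(fd); // flush to disk before the rename makes the file visible
  } catch (err) {
    closeSync(fd);
    unlinkSync(tmp);
    throw err;
  }
  closeSync(fd);
  renameSync(tmp, path); // atomic replace on POSIX filesystems
}

// Usage: write an index into a fresh temporary directory, then overwrite it.
const dir = mkdtempSync(join(tmpdir(), "consensus-memory-"));
const target = join(dir, "index.json");
writeJsonAtomic(target, { decisions: ["keep billing on Postgres"] });
writeJsonAtomic(target, { decisions: [] }); // second write replaces the first
console.log(JSON.parse(readFileSync(target, "utf8")), readdirSync(dir));
```

A stricter durability guarantee would also fsync the containing directory after the rename; that step is omitted here.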

Protocol (what actually happens)

See the ai-consensus-core protocol diagram for the round structure, scoring formula, and CONFIDENCE: N contract.
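The CONFIDENCE: N contract means each answer ends with a machine-readable confidence line. A minimal parser sketch, assuming N is an integer from 0 to 100 on its own line and that the last such line wins; the exact contract is defined in ai-consensus-core. The mean helper mirrors the averaged self-reported confidence quoted in the benchmark section.

```typescript
// Returns the last valid CONFIDENCE value in the text, or undefined.
// Matches lines like "CONFIDENCE: 72" (case-insensitive, own line).
function parseConfidence(text: string): number | undefined {
  const matches = [...text.matchAll(/^[ \t]*CONFIDENCE:[ \t]*(\d{1,3})[ \t]*$/gim)];
  if (matches.length === 0) return undefined;
  const n = Number(matches[matches.length - 1][1]);
  return n >= 0 && n <= 100 ? n : undefined;
}

// Mean self-reported confidence over responses that include a valid value.
function meanConfidence(responses: readonly string[]): number | undefined {
  const values = responses
    .map((r) => parseConfidence(r))
    .filter((n): n is number => n !== undefined);
  if (values.length === 0) return undefined;
  return values.reduce((a, b) => a + b, 0) / values.length;
}
```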

This server is a thin, faithful wrapper: it loads your config, builds the right ModelCaller (HTTP or host sampling), wires progress, applies preset panels when requested, and surfaces results + optional memory.
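A sketch of how a per-participant ModelCaller could be chosen: HTTP participants call the standard OpenAI-compatible `POST {baseUrl}/chat/completions` endpoint, and host-sample participants use an injected host-sampling function. The `ModelCaller` signature and helper names here are hypothetical; the real interface lives in ai-consensus-core.

```typescript
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ModelCaller = (messages: ChatMessage[]) => Promise<string>; // hypothetical signature
interface Provider { baseUrl: string; apiKeyEnv: string }

// HTTP caller for any OpenAI-compatible provider.
function httpCaller(
  provider: Provider,
  modelId: string,
  fetchImpl: typeof fetch = fetch,
  env: Record<string, string | undefined> = process.env,
): ModelCaller {
  return async (messages) => {
    const apiKey = env[provider.apiKeyEnv];
    if (!apiKey) throw new Error(`environment variable ${provider.apiKeyEnv} is not set`);
    const url = `${provider.baseUrl.replace(/\/+$/, "")}/chat/completions`;
    const res = await fetchImpl(url, {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify({ model: modelId, messages }),
    });
    if (!res.ok) throw new Error(`${url} returned HTTP ${res.status}`);
    const data = (await res.json()) as { choices: { message: { content: string } }[] };
    return data.choices[0].message.content;
  };
}

type Participant =
  | { kind?: undefined; id: string; provider: string; modelId: string }
  | { kind: "host-sample"; id: string };

// Picks HTTP or host sampling per participant.
function buildCaller(p: Participant, providers: Record<string, Provider>, sampleFromHost: ModelCaller): ModelCaller {
  if (p.kind === "host-sample") return sampleFromHost;
  const provider = providers[p.provider];
  if (!provider) throw new Error(`unknown provider "${p.provider}" for participant "${p.id}"`);
  return httpCaller(provider, p.modelId);
}
```

Injecting `fetchImpl` and `env` keeps the caller testable without network access or real keys.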

Limits & Non-Goals

The MCP server itself supports the stdio transport only. For HTTP/SSE, use the core library directly.
No token-budget enforcement inside the tool — put alerts on your provider keys.
Memory is plaintext on disk. Do not enable it for prompts containing secrets you do not want persisted locally.
Host sampling is currently reliable only in Claude Desktop.

If you need something this server deliberately does not do, the right place is almost always ai-consensus-core or a thin custom wrapper around it.

Development

git clone https://github.com/entropyvortex/ai-consensus-mcp.git
cd ai-consensus-mcp
npm install
npm run build
npm test
npm start -- --config ./consensus.config.json

The test suite covers config loading, preset resolution, input schema generation, memory store invariants, and MCP handshake behavior.

Philosophy

Most "multi-agent" frameworks are toys or vendor lock-in.

This one is the opposite: a minimal, observable, deterministic debate protocol (core) + the thinnest possible product surface that makes it usable inside real coding agents (this package).

We optimize for ground-truth quality on hard engineering questions, not for marketing slogans or lowest token count. The benchmark harness ships with the product because claims without executable reproduction are worthless.

License

MIT

Part of the entropyvortex stack — practical, no-bullshit AI open source.

Made with ❤️ in Brazil.

See also: ai-consensus-core — the protocol engine and TypeScript library.


