Conclave MCP
An MCP (Model Context Protocol) server that provides access to a "conclave" of LLMs, enabling any MCP-compatible client to consult multiple frontier models for diverse opinions, peer-ranked evaluations, and synthesized answers.
Why This Exists
When working with an AI assistant, you're getting one model's perspective. Sometimes that's exactly what you need. But for important decisions—technical architecture, business strategy, creative direction, complex analysis, or any situation where blind spots matter—a plurality of opinions surfaces alternatives you might miss.
Conclave brings democratic AI consensus to any workflow.
Instead of manually querying multiple AI services, you can consult the conclave through Claude Desktop, Claude Code, or any MCP client. Get ranked opinions from multiple frontier models (GPT, Claude, Gemini, Grok, DeepSeek) and receive a synthesized answer representing collective AI wisdom.
Use cases include:
Technical: Architecture decisions, code review, debugging, API design
Business: Strategy analysis, proposal review, market research synthesis
Creative: Writing feedback, brainstorming, editorial perspectives
Research: Literature review, fact-checking, multi-perspective analysis
Decision-making: Pros/cons analysis, risk assessment, option evaluation
Inspired by Andrej Karpathy's llm-council concept. This project reimplements the core ideas as an MCP server for seamless integration with AI-assisted workflows.
How It Works
The conclave operates in up to 3 stages:
Stage 1 - Opinions: every conclave member answers the question in parallel.
Stage 2 - Rankings: each member reviews the other responses and ranks its peers.
Stage 3 - Synthesis: the chairman model combines the responses into a single answer, with consensus detection and a tiebreaker on splits.
The three query tools map onto these stages: conclave_quick stops after Stage 1, conclave_ranked after Stage 2, and conclave_full runs all three.
Features
Tiered queries: Choose cost/depth tradeoff (quick | ranked | full)
Three council tiers: Premium (frontier), Standard (balanced), Budget (fast/cheap)
Consensus protocol: Detects agreement level, triggers tiebreaker on splits
Odd conclave size: Ensures tiebreaker votes can break deadlocks
Rotating chairmanship: Weekly rotation prevents single-model bias
Chairman presets: Context-aware chairman selection (code, creative, reasoning)
Cost estimation: Know what you'll spend before querying
Eval-light: Standalone benchmark runner for tracking performance over time
Installation
Prerequisites
Get an OpenRouter API key from https://openrouter.ai/keys
Add credits to your OpenRouter account (pay-as-you-go)
Setup
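The original setup commands aren't reproduced here; a typical local install might look like the following sketch (the repository URL placeholder and the requirements.txt file are assumptions):

```bash
# Hypothetical setup - clone the repo and install Python dependencies
git clone <repo-url> conclave-mcp
cd conclave-mcp
pip install -r requirements.txt           # assumes a requirements.txt is provided
export OPENROUTER_API_KEY=sk-or-...       # key from https://openrouter.ai/keys
```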
Configure Claude Desktop
Option 1: Desktop Extensions (Recommended)
Open Claude Desktop
Go to Settings > Extensions > Advanced settings > Install Extension...
Navigate to the conclave-mcp directory
Follow prompts to configure your OPENROUTER_API_KEY
Restart Claude Desktop
Option 2: Manual Config
Open Claude Desktop, go to Settings > Developer > Edit Config, and add the following to claude_desktop_config.json:
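The exact snippet isn't reproduced here; a minimal sketch, assuming the server is launched as a Python script named server.py (the entry point name is an assumption):

```json
{
  "mcpServers": {
    "conclave": {
      "command": "python",
      "args": ["/path/to/conclave-mcp/server.py"],
      "env": {
        "OPENROUTER_API_KEY": "sk-or-..."
      }
    }
  }
}
```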
Replace /path/to/conclave-mcp with your actual path, save, and restart Claude Desktop.
Configure Claude Code
Add the server using the CLI:
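A sketch using claude mcp add (again assuming a server.py entry point):

```bash
# Register the server with Claude Code; entry point may differ in the repo
claude mcp add conclave \
  -e OPENROUTER_API_KEY=sk-or-... \
  -- python /path/to/conclave-mcp/server.py
```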
Or copy .mcp.json.example to .mcp.json and update paths:
Verify with /mcp in Claude Code or claude mcp list in terminal.
Available Tools
conclave_quick
Fast parallel opinions (Stage 1 only). Queries all conclave models and returns individual responses.
Cost: ~$0.01-0.03 per query
Use for: Quick brainstorming, getting diverse perspectives fast
conclave_ranked
Opinions with peer rankings (Stage 1 + 2). Shows which model performed best on this specific question.
Cost: ~$0.05-0.10 per query
Use for: Code review, comparing approaches, seeing which model "won"
conclave_full
Complete conclave with synthesis (all 3 stages). Includes consensus detection and chairman tiebreaker.
Cost: ~$0.10-0.20 per query
Options:
tier: Model tier - "premium", "standard" (default), "budget"
chairman: Override the chairman model (e.g., "anthropic/claude-sonnet-4")
chairman_preset: Use a preset ("code", "creative", "reasoning", "concise", "balanced")
Use for: Important decisions, architecture choices, complex debugging
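For example, a call might combine these options as follows (the question argument name and question text are illustrative assumptions):

```python
# Illustrative conclave_full invocation
conclave_full(
    question="Should we shard the users table or move sessions to Redis?",
    tier="premium",            # "premium" | "standard" (default) | "budget"
    chairman_preset="code",    # or chairman="anthropic/claude-sonnet-4"
)
```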
conclave_config
View current configuration: conclave members, chairman rotation status, consensus thresholds.
conclave_estimate
Estimate costs before running a query.
conclave_models
List all available models with selection numbers. Shows models grouped by tier with stable numbering:
Premium tier: 1-10
Standard tier: 11-20
Budget tier: 21-30
Chairman pool: 31-40
conclave_select
Create a custom conclave from model numbers. The first model becomes the chairman.
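For example:

```python
conclave_select(models="31,1,11,21")
```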
Creates:
Chairman: #31 (deepseek-r1)
Members: #1 (claude-opus-4.5), #11 (claude-sonnet-4.5), #21 (gemini-2.5-flash)
Custom selection persists until server restart or conclave_reset.
conclave_reset
Clear custom conclave selection and return to tier-based configuration.
Custom Model Selection
For full control over which models participate in the conclave:
List available models: Use conclave_models to see all models with their numbers
Select your lineup: Use conclave_select(models="31,1,11,21") - the first number is the chairman
Query: Use conclave_quick, conclave_ranked, or conclave_full as normal
Reset: Use conclave_reset to return to tier-based config
Example workflow:
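A sketch of the four steps above (the question text is illustrative):

```python
conclave_models()                     # 1. list models with their numbers
conclave_select(models="31,1,11,21")  # 2. chairman = #31, members = #1, #11, #21
conclave_ranked(question="Best caching strategy for a read-heavy API?")  # 3. query
conclave_reset()                      # 4. back to tier-based config
```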
Configuration
Edit config.py to customize:
Conclave Tiers
Each tier has unique models (no overlap) for proper price/performance differentiation:
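The actual tier listing isn't reproduced here; a hypothetical sketch of the shape this takes in config.py (the variable name and model lineups are assumptions, not the project's real defaults):

```python
# Hypothetical shape only - see config.py for the real names and lineups
CONCLAVE_TIERS = {
    "premium":  ["anthropic/claude-opus-4.5", "openai/gpt-5.1", "x-ai/grok-4"],
    "standard": ["anthropic/claude-sonnet-4.5", "openai/gpt-4.1", "google/gemini-2.5-pro"],
    "budget":   ["google/gemini-2.5-flash", "openai/o4-mini", "deepseek/deepseek-chat"],
}
```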
Chairman Rotation
The chairman pool uses reasoning models only (not chat models) for high-quality synthesis:
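The pool listing isn't shown here; a minimal sketch of how a weekly rotation could work (function and variable names are assumptions):

```python
from datetime import date

# Hypothetical: pick this week's chairman from a reasoning-model pool
CHAIRMAN_POOL = ["deepseek/deepseek-r1", "openai/o4-mini"]  # illustrative

def current_chairman() -> str:
    week = date.today().isocalendar()[1]   # ISO week number
    return CHAIRMAN_POOL[week % len(CHAIRMAN_POOL)]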
Consensus Thresholds
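The actual threshold values aren't reproduced here; a hypothetical illustration of the idea (names and values are invented):

```python
# Hypothetical - share of members that must agree before skipping the tiebreaker
CONSENSUS_THRESHOLDS = {
    "unanimous": 1.0,   # all members agree
    "majority":  0.6,   # clear majority, no tiebreaker needed
    # below "majority", the chairman casts a tiebreaker vote
}
```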
Eval-Light
A standalone benchmark runner for testing and comparing conclave performance across tiers and over time.
Test Suite Overview
The eval suite includes 16 tasks across 9 categories, designed to test different model capabilities:
| Category | Tasks | Difficulty | What It Tests |
|---|---|---|---|
| math | 2 | Easy-Medium | Arithmetic, word problems, step-by-step reasoning |
| code | 2 | Easy-Medium | Bug detection, concept explanation, code examples |
| reasoning | 2 | Medium-Hard | Syllogisms, multi-step logic puzzles |
| analysis | 2 | Medium | Logical fallacies, tradeoff analysis |
| summarization | 2 | Medium | Technical docs, business reports |
| writing_business | 2 | Easy-Medium | Professional emails, proposals |
| writing_creative | 2 | Easy-Medium | Story openings, original metaphors |
| creative | 1 | Easy | Analogies with explanations |
| factual | 1 | Easy | Science explanations for general audience |
Running Evaluations
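The exact invocation isn't shown here; a hypothetical run might look like this (the script name and flags are assumptions - check the repo for the real ones):

```bash
# Hypothetical invocation of the eval-light runner
python eval_light.py --tier standard --mode full
```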
Output Format
Results are saved to evals/eval_<tier>_<mode>_<timestamp>.json with:
metadata: Timestamp, tier, mode, chairman model
summary: Success rate, total time, average time per task
results: Per-task details including:
Individual model responses
Peer rankings (for ranked/full modes)
Chairman synthesis (for full mode)
Consensus level
Example Output
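A sketch of the JSON shape based on the field list above (exact key names and all values are illustrative):

```json
{
  "metadata": {
    "timestamp": "2025-01-01T12:00:00Z",
    "tier": "standard",
    "mode": "full",
    "chairman": "deepseek/deepseek-r1"
  },
  "summary": {
    "success_rate": 1.0,
    "total_time_s": 412.6,
    "avg_time_per_task_s": 25.8
  },
  "results": [
    {
      "task": "math_01",
      "responses": {"model_a": "..."},
      "rankings": {"model_a": 1},
      "synthesis": "chairman's combined answer",
      "consensus": "majority"
    }
  ]
}
```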
Comparing Tiers
Run the same eval across all tiers to compare model quality vs cost:
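For instance, with the hypothetical runner sketched above:

```bash
# Hypothetical - run the same suite once per tier
for tier in premium standard budget; do
  python eval_light.py --tier "$tier" --mode full
done
```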
Then compare the JSON outputs to see how different model tiers perform on the same tasks.
Use Cases
| Scenario | Recommended Tool | Why |
|---|---|---|
| "Review this function" | conclave_ranked | See which model catches the most issues |
| "Redis vs PostgreSQL for sessions?" | conclave_full | Important decision, need synthesis |
| "Ideas for this feature" | conclave_quick | Fast diverse brainstorming |
| "Debug this error" | conclave_quick | Quick parallel diagnosis |
| "Rewrite this paragraph" | conclave_full | Creative synthesis |
| "Is this architecture sound?" | conclave_full | Technical synthesis |
Example Tool Output
Project Structure
Adding Models
OpenRouter supports 200+ models. Find model IDs at https://openrouter.ai/models
Important: Keep each tier's models unique (no overlap) for proper differentiation.
How OpenRouter Works
OpenRouter is a unified API gateway—you don't need separate accounts with OpenAI, Google, Anthropic, etc. One API key, one credit balance, access to all models.
Sign up: https://openrouter.ai
Add credits (prepaid, or enable auto-top-up)
Use your single API key for all models
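Under the hood, every model is reached through the same OpenAI-compatible endpoint; a minimal sketch of a direct call (the model ID is illustrative):

```python
import os
import requests

# One key, one endpoint - swap the "model" field to hit a different provider
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-r1",
        "messages": [{"role": "user", "content": "Hello, conclave!"}],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```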
License
MIT
Attribution
Inspired by Andrej Karpathy's llm-council. The original is a web application for interactively exploring LLM comparisons. This project reimplements the council concept as an MCP server for integration with AI-assisted editors, adding consensus protocol and tiebreaker mechanics.