# Debate Agent MCP

> **EXPERIMENTAL**: This project is in active development. APIs and features may change without notice. Use at your own risk in production environments.

A multi-agent debate framework for **code review** and **debate planning** with P0/P1/P2 severity scoring.

## Architecture Overview

```
┌─────────────────────────────────────────────────────────────────────────────┐
│                              DEBATE AGENT MCP                               │
│                                                                             │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                           MCP SERVER LAYER                            │  │
│  │                       (Model Context Protocol)                        │  │
│  │                                                                       │  │
│  │  Exposes tools via stdio to Claude Code / AI assistants:              │  │
│  │  • list_agents    • read_diff    • run_agent                          │  │
│  │  • debate_review  • debate_plan                                       │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                      │                                      │
│                                      ▼                                      │
│  ┌───────────────────────────────────────────────────────────────────────┐  │
│  │                          ORCHESTRATOR LAYER                           │  │
│  │                         (@debate-agent/core)                          │  │
│  │                                                                       │  │
│  │  Pipeline:                                                            │  │
│  │  1. Read git diff ──► 2. Run agents in parallel (Promise.all)         │  │
│  │  3. Critique round ──► 4. Deterministic scoring ──► 5. Merge          │  │
│  │                                                                       │  │
│  └───────────────────────────────────────────────────────────────────────┘  │
│                                      │                                      │
│                      ┌───────────────┴───────────────┐                      │
│                      ▼                               ▼                      │
│        ┌──────────────────────────┐    ┌──────────────────────────┐         │
│        │        Claude CLI        │    │        Codex CLI         │         │
│        │  /opt/homebrew/bin/claude│    │  /opt/homebrew/bin/codex │         │
│        │                          │    │                          │         │
│        │  spawn() as subprocess   │    │  spawn() as subprocess   │         │
│        │  Uses YOUR credentials   │    │  Uses YOUR credentials   │         │
│        └──────────────────────────┘    └──────────────────────────┘         │
│                      │                               │                      │
│                      ▼                               ▼                      │
│                Anthropic API                    OpenAI API                  │
│            (auth via local CLI)            (auth via local CLI)             │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘
```

## How It Works

### No Authentication Required

The MCP itself requires **no API keys or authentication**.
It orchestrates your locally installed CLI tools:

```
┌─────────────────────────────────────────────────────────────────┐
│                          YOUR MACHINE                           │
│                                                                 │
│  ~/.claude/credentials ──► claude CLI ──► Anthropic API         │
│  ~/.codex/credentials  ──► codex CLI  ──► OpenAI API            │
│                                                                 │
│  The MCP just runs: spawn("claude", ["--print", prompt])        │
│  Same as typing in your terminal!                               │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```

### Execution Flow

```
Step 1: Build Prompt
├── Combine review question + git diff + platform rules
├── Add P0/P1/P2 severity definitions
└── Request JSON output format

Step 2: Parallel Execution
├── spawn("/opt/homebrew/bin/claude", ["--print", prompt])
├── spawn("/opt/homebrew/bin/codex", ["exec", prompt])
└── Both run simultaneously via Promise.all()

Step 3: Capture Output
├── Read stdout from each CLI process
└── Parse JSON responses

Step 4: Deterministic Scoring (No AI)
├── Count P0/P1/P2 findings
├── Check file accuracy against diff
├── Penalize false positives
└── Score clarity and fix quality

Step 5: Merge & Report
├── Pick winner by highest score
├── Combine unique findings from all agents
└── Generate final recommendation
```

## 360 Debate Feature (v2.0)

The 360 Debate feature provides **multi-turn cross-review** with confidence scoring. It supports two modes:

| Mode | Description | Output |
|------|-------------|--------|
| **review** | P0/P1/P2 code review findings | `.debate/review-TIMESTAMP.md` |
| **plan** | Implementation planning with consensus | `.debate/plan-TIMESTAMP.md` |

### 360 Debate Pipeline

```
Round 1: Initial Review (Parallel)
    ┌─────────┐     ┌─────────┐
    │ Claude  │     │  Codex  │
    │ Review  │     │ Review  │
    └────┬────┘     └────┬────┘
         │               │
         ▼               ▼
Round 2: 360 Cross-Review (Each agent reviews ALL others)
    ┌─────────────────────────────────────────┐
    │ Claude reviews Codex's findings         │
    │ Codex reviews Claude's findings         │
    │ "Is P0 about null pointer valid?"       │
    │ "Did Codex miss the SQL injection?"     │
    └─────────────────────────────────────────┘
                        │
                        ▼
            Confidence Check: >= 80%?
                        │
                        ├── No ──► Repeat (max 3 rounds)
                        │
                        ▼ Yes
Winner Composition
    ┌─────────────────────────────────────────┐
    │ Highest scoring agent composes final    │
    │ Merge valid findings, eliminate dupes   │
    │ Document elimination reasons            │
    └─────────────────────────────────────────┘
                        │
                        ▼
Validation Phase
    ┌─────────────────────────────────────────┐
    │ Other agents vote: approve/reject       │
    │ Majority approval required              │
    │ Winner breaks ties                      │
    └─────────────────────────────────────────┘
                        │
                        ▼
Final Report: .debate/review-*.md or .debate/plan-*.md
```

### Review Mode Example

```typescript
// Run 360 code review
const result = await runDebate360({
  question: 'Review this code for security issues',
  mode: 'review', // P0/P1/P2 severity scoring
  agents: ['claude', 'codex'],
  platform: 'backend',
  maxRounds: 3,
  confidenceThreshold: 80,
});

// Output: .debate/review-2025-01-20T10-30-00.md
console.log(result.finalFindings); // Validated P0/P1/P2 findings
```

### Plan Mode Example

```typescript
// Run 360 implementation planning
const result = await runDebate360({
  question: 'Plan how to add user authentication',
  mode: 'plan', // Consensus-based, no severity
  agents: ['claude', 'codex'],
  maxRounds: 3,
  confidenceThreshold: 80,
});

// Output: .debate/plan-2025-01-20T10-30-00.md
console.log(result.finalPlan); // Validated implementation steps
```

### Mode Comparison

| Aspect | Review Mode | Plan Mode |
|--------|-------------|-----------|
| **Purpose** | Find bugs, security issues | Plan implementation approach |
| **Scoring** | P0/P1/P2 severity (max 134 pts) | Clarity + Consensus (0-100) |
| **Output** | Findings with fix suggestions | Implementation steps with phases |
| **Winner** | Highest severity score | Highest consensus + clarity |
| **Final Result** | Merged P0/P1/P2 findings | Merged implementation plan |
| **MD File** | `review-TIMESTAMP.md` | `plan-TIMESTAMP.md` |

**Benefits of 360 Debate**:

- Eliminate hallucinated findings (validated by multiple agents)
- Catch missed issues (one agent finds what another missed)
- Build confidence scores (80% threshold ensures agreement)
- Reduce false positives (adversarial review catches incorrect assessments)
- Comprehensive report in `.debate/` directory

---

## Packages

| Package | Description | Install |
|---------|-------------|---------|
| [`@debate-agent/core`](./packages/core) | Core logic (framework-agnostic) | `npm i @debate-agent/core` |
| [`@debate-agent/mcp-server`](./packages/mcp-server) | MCP server for CLI users | `npm i -g @debate-agent/mcp-server` |
| [`debate-agent-mcp`](./packages/vscode-extension) | VS Code extension | Install from marketplace |

## Quick Start

### Prerequisites

You must have the agent CLIs installed and authenticated:

```bash
# Check Claude CLI
claude --version
claude auth status  # Should show logged in

# Check Codex CLI
codex --version     # Should be authenticated via OpenAI

# The MCP will spawn these - no additional auth needed
```

### For CLI Users

```bash
# Install globally
npm install -g @debate-agent/mcp-server

# Start MCP server
debate-agent

# Or run directly
npx @debate-agent/mcp-server
```

### For Claude Code

```bash
# Add MCP to Claude Code
claude mcp add debate-reviewer -- node /path/to/packages/mcp-server/dist/index.js

# Verify connection
claude mcp list
# Should show: debate-reviewer: ✓ Connected
```

### For SDK Users

```bash
npm install @debate-agent/core
```

```typescript
import { runDebate, createDebatePlan } from '@debate-agent/core';

// Run code review debate
const result = await runDebate({
  question: 'Review this code for security issues',
  agents: ['codex', 'claude'],
  platform: 'backend',
});

// Create debate plan
const plan = createDebatePlan('Best caching strategy', ['codex', 'claude'], 'collaborative', 2);
```

## MCP Tools

| Tool | Description |
|------|-------------|
| `list_agents` | List all configured agents |
| `read_diff` | Read uncommitted git diff |
| `run_agent` | Run a single agent with prompt |
| `debate_review` | Multi-agent P0/P1/P2 code review (single round) |
| `debate_plan` | Create structured debate plan |
| `debate_360` | **360 multi-round debate with modes: `review` (P0/P1/P2) or `plan` (consensus)** |

## Configuration

Create `debate-agent.config.json` in your project root:

```json
{
  "agents": {
    "codex": {
      "name": "codex",
      "path": "/opt/homebrew/bin/codex",
      "args": ["exec", "--skip-git-repo-check"],
      "timeout_seconds": 180
    },
    "claude": {
      "name": "claude",
      "path": "/opt/homebrew/bin/claude",
      "args": ["--print", "--dangerously-skip-permissions"],
      "timeout_seconds": 180
    },
    "gemini": {
      "name": "gemini",
      "path": "/opt/homebrew/bin/gemini",
      "args": ["--prompt"],
      "timeout_seconds": 180
    }
  },
  "debate": {
    "default_agents": ["codex", "claude"],
    "include_critique_round": true,
    "default_mode": "adversarial"
  }
}
```

## Severity Levels

| Level | Criteria |
|-------|----------|
| **P0** | Breaking defects, crashes, data loss, security/privacy problems, build blockers |
| **P1** | Likely bugs/regressions, incorrect logic, missing error-handling, missing tests |
| **P2** | Minor correctness issues, small logic gaps, non-blocking test gaps |

Defined in: `packages/core/src/prompts/review-template.ts`

## Platform-Specific Rules

| Platform | Focus Areas |
|----------|-------------|
| **flutter** | Async misuse, setState, dispose(), BuildContext in async, Riverpod leaks |
| **android** | Manifest, permissions, ProGuard, lifecycle violations, context leaks |
| **ios** | plist, ATS, keychain, signing, main thread UI, retain cycles |
| **backend** | DTO mismatch, HTTP codes, SQL injection, auth flaws, rate limiting |
| **general** | Null pointers, resource leaks, race conditions, XSS, input validation |

Defined in: `packages/core/src/prompts/platform-rules.ts`

## Scoring System

The scoring is **deterministic** (no AI) - pure rule-based evaluation:

| Criteria | Points | Max |
|----------|--------|-----|
| P0 Finding | +15 | 45 |
| P1 Finding | +8 | 32 |
| P2 Finding | +3 | 12 |
| False Positive | -10 | -30 |
| Concrete Fix | +5 | 25 |
| File Accuracy | +2 | 10 |
| Clarity | 0-10 | 10 |

**Maximum possible score**: 134

**Minimum possible score**: -30

Defined in: `packages/core/src/engine/judge.ts`

## Debate Modes

| Mode | Description |
|------|-------------|
| **adversarial** | Agents challenge each other's positions |
| **consensus** | Agents work to find common ground |
| **collaborative** | Agents build on each other's ideas |

## Project Structure

```
debate-agent-mcp/
├── packages/
│   ├── core/                           # @debate-agent/core
│   │   ├── src/
│   │   │   ├── engine/
│   │   │   │   ├── debate.ts           # Orchestration (parallel execution)
│   │   │   │   ├── judge.ts            # Deterministic scoring rules
│   │   │   │   ├── merger.ts           # Combine findings from agents
│   │   │   │   └── planner.ts          # Debate plan generation
│   │   │   ├── prompts/
│   │   │   │   ├── review-template.ts  # P0/P1/P2 definitions
│   │   │   │   └── platform-rules.ts   # Platform-specific scrutiny
│   │   │   ├── tools/
│   │   │   │   ├── read-diff.ts        # Git diff reader
│   │   │   │   └── run-agent.ts        # CLI spawner (spawn())
│   │   │   ├── config.ts               # Config loader
│   │   │   ├── types.ts                # TypeScript types
│   │   │   └── index.ts                # Public exports
│   │   └── package.json
│   │
│   ├── mcp-server/                     # @debate-agent/mcp-server
│   │   ├── src/
│   │   │   ├── index.ts                # MCP server (stdio transport)
│   │   │   └── bin/cli.ts              # CLI entry point
│   │   └── package.json
│   │
│   └── vscode-extension/               # debate-agent-mcp (VS Code)
│       ├── src/
│       │   └── extension.ts
│       └── package.json
│
├── debate-agent.config.json            # Example config
├── package.json                        # Monorepo root
├── pnpm-workspace.yaml
└── README.md
```

## Integration

### Claude Desktop

```json
{
  "mcpServers": {
    "debate-agent": {
      "command": "node",
      "args": ["/path/to/packages/mcp-server/dist/index.js"]
    }
  }
}
```

### Claude CLI

```bash
claude mcp add debate-agent -- node /path/to/packages/mcp-server/dist/index.js
```

### VS Code / Cursor

Install the VS Code extension - it auto-configures MCP.
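
## Scoring Sketch

The rules in the Scoring System table can be read as a small pure function. The sketch below is illustrative only: the `Finding` shape, the `scoreAgent` name, and the choice to award points only to findings that are not false positives are assumptions made for this example, not the actual contents of `packages/core/src/engine/judge.ts`. The point values and caps are copied from the table.

```typescript
// Hypothetical types for illustration; the real types in
// packages/core/src/types.ts may look different.
type Severity = "P0" | "P1" | "P2";

interface Finding {
  severity: Severity;
  hasConcreteFix: boolean; // the finding proposes a specific fix
  fileInDiff: boolean;     // the referenced file appears in the diff
  falsePositive: boolean;  // the finding was judged incorrect
}

// Points per finding and per-category caps, taken from the Scoring System table.
const POINTS: Record<Severity, number> = { P0: 15, P1: 8, P2: 3 };
const CAPS: Record<Severity, number> = { P0: 45, P1: 32, P2: 12 };
const LEVELS: Severity[] = ["P0", "P1", "P2"];

function scoreAgent(findings: Finding[], clarity: number): number {
  // Assumption: false positives earn no points and only incur the penalty.
  const valid = findings.filter((f) => !f.falsePositive);
  const falsePositives = findings.length - valid.length;

  let severityPoints = 0;
  for (const level of LEVELS) {
    const count = valid.filter((f) => f.severity === level).length;
    severityPoints += Math.min(count * POINTS[level], CAPS[level]);
  }

  const fixPoints = Math.min(valid.filter((f) => f.hasConcreteFix).length * 5, 25);
  const accuracyPoints = Math.min(valid.filter((f) => f.fileInDiff).length * 2, 10);
  const penalty = Math.min(falsePositives * 10, 30);
  const clarityPoints = Math.min(Math.max(clarity, 0), 10);

  return severityPoints + fixPoints + accuracyPoints + clarityPoints - penalty;
}

// Usage: one valid P0 finding with a fix, in a file from the diff, clarity 8.
const example: Finding[] = [
  { severity: "P0", hasConcreteFix: true, fileInDiff: true, falsePositive: false },
];
console.log(scoreAgent(example, 8)); // 15 + 5 + 2 + 8 = 30
```

Because every term is capped and no model call is involved, the same findings always produce the same score, which is what makes the winner selection reproducible.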

## Development

```bash
# Clone repo
git clone https://github.com/ferdiangunawan/debate-agent-mcp
cd debate-agent-mcp

# Install dependencies
npm install

# Build all packages
npm run build

# Build specific package
npm run build:core
npm run build:server
npm run build:extension
```

## Known Limitations

- **Experimental**: APIs may change without notice
- **Local CLIs required**: You must have `claude` and `codex` CLIs installed and authenticated
- **Timeout risks**: Long diffs may cause agent timeouts (default 180s)
- **No streaming**: Currently waits for full response before processing
- **Minimum 2 agents**: 360 debate requires at least 2 agents for cross-review

## Contributing

Contributions welcome! Please open an issue first to discuss proposed changes.

## License

MIT
