MCP as a Judge ⚖️

mcp-name: io.github.OtherVibes/mcp-as-a-judge

MCP as a Judge acts as a validation layer between AI coding assistants and LLMs, helping ensure safer and higher-quality code.

License: MIT Python 3.13+ MCP Compatible

MCP as a Judge is a behavioral MCP that strengthens AI coding assistants by requiring explicit LLM evaluations for:

  • Research, system design, and planning

  • Code changes, testing, and task-completion verification

It enforces evidence-based research, reuse over reinvention, and human-in-the-loop decisions.

If your IDE has rules/agents (Copilot, Cursor, Claude Code), keep using them—this Judge adds enforceable approval gates on plan, code diffs, and tests.

Key problems with AI coding assistants and LLMs

  • Treat LLM output as ground truth; skip research and use outdated information

  • Reinvent the wheel instead of reusing libraries and existing code

  • Cut corners: code below engineering standards and weak tests

  • Make unilateral decisions when requirements are ambiguous or plans change

  • Security blind spots: missing input validation, injection risks/attack vectors, least‑privilege violations, and weak defensive programming

Vibe coding doesn’t have to be frustrating

What it enforces

  • Evidence‑based research and reuse (best practices, libraries, existing code)

  • Plan‑first delivery aligned to user requirements

  • Human‑in‑the‑loop decisions for ambiguity and blockers

  • Quality gates on code and tests (security, performance, maintainability)

Key capabilities

  • Intelligent code evaluation via MCP sampling; enforces software‑engineering standards and flags security/performance/maintainability risks

  • Comprehensive plan/design review: validates architecture, research depth, requirements fit, and implementation approach

  • User‑driven decisions via MCP elicitation: clarifies requirements, resolves obstacles, and keeps choices transparent

  • Security validation in system design and code changes

Tools and how they help

| Tool | What it solves |
| --- | --- |
| set_coding_task | Creates/updates task metadata; classifies task_size; returns next-step workflow guidance |
| get_current_coding_task | Recovers the latest task_id and metadata to resume work safely |
| judge_coding_plan | Validates plan/design; requires library selection and internal reuse maps; flags risks |
| judge_code_change | Reviews unified Git diffs for correctness, reuse, security, and code quality |
| judge_testing_implementation | Validates tests using real runner output and optional coverage |
| judge_coding_task_completion | Final gate ensuring plan, code, and tests approvals before completion |
| raise_missing_requirements | Elicits missing details and decisions to unblock progress |
| raise_obstacle | Engages the user on trade‑offs, constraints, and enforced changes |

🚀 Quick Start

Requirements & Recommendations

MCP Client Prerequisites

MCP as a Judge depends heavily on the MCP Sampling and MCP Elicitation features for its core functionality.

System Prerequisites

  • Docker Desktop or Python 3.13+: required for running the MCP server

Supported AI Assistants

| AI Assistant | Platform | MCP Support | Status | Notes |
| --- | --- | --- | --- | --- |
| GitHub Copilot | Visual Studio Code | ✅ Full | Recommended | Complete MCP integration with sampling and elicitation |
| Claude Code | - | ⚠️ Partial | Requires LLM API key | Sampling Support feature request; Elicitation Support feature request |
| Cursor | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| Augment | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |
| Qodo | - | ⚠️ Partial | Requires LLM API key | MCP support available, but sampling/elicitation limited |

✅ Recommended setup: GitHub Copilot + VS Code — full MCP sampling; no API key needed.

⚠️ Critical: For assistants without full MCP sampling (Cursor, Claude Code, Augment, Qodo), you MUST set LLM_API_KEY. Without it, the server cannot evaluate plans or code. See LLM API Configuration.

💡 Tip: Prefer large-context models (≥ 1M tokens) for better analysis and judgments.

If the MCP server isn’t auto‑used

For troubleshooting, see the FAQ section.

🔧 MCP Configuration

Configure MCP as a Judge in your MCP-enabled client:

One‑click install for VS Code (MCP)

Notes:

  • VS Code controls the sampling model; select it via “MCP: List Servers → mcp-as-a-judge → Configure Model Access”.

Method 1: Using Docker

  1. Configure MCP Settings:

    Add this to your MCP client configuration file:

    {
      "command": "docker",
      "args": ["run", "--rm", "-i", "--pull=always", "-e", "LLM_API_KEY", "-e", "LLM_MODEL_NAME", "ghcr.io/othervibes/mcp-as-a-judge:latest"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key-here",
        "LLM_MODEL_NAME": "gpt-4o-mini"
      }
    }

    📝 Configuration Options (All Optional):

    • LLM_API_KEY: Optional for GitHub Copilot + VS Code (has built-in MCP sampling)

    • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)

    • The --pull=always flag pulls the latest image each time the server starts

    To update manually instead:

    # Pull the latest version
    docker pull ghcr.io/othervibes/mcp-as-a-judge:latest

Method 2: Using uv

  1. Install the package:

    uv tool install mcp-as-a-judge

  2. Configure MCP Settings:

    The MCP server may be automatically detected by your MCP‑enabled client.

    📝 Notes:

    • No additional configuration needed for GitHub Copilot + VS Code (has built-in MCP sampling)

    • LLM_API_KEY is optional and can be set via environment variable if needed

  3. To update to the latest version:

    # Update MCP as a Judge to the latest version
    uv tool upgrade mcp-as-a-judge

Select a sampling model in VS Code

  • Open Command Palette (Cmd/Ctrl+Shift+P) → “MCP: List Servers”

  • Select the configured server “mcp-as-a-judge”

  • Choose “Configure Model Access”

  • Check your preferred model(s) to enable sampling

🔑 LLM API Configuration (Optional)

For AI assistants without full MCP sampling support, you can configure an LLM API key as a fallback, so that MCP as a Judge still works when the client doesn't support MCP sampling.

  • Set LLM_API_KEY (unified key). Vendor is auto-detected; optionally set LLM_MODEL_NAME to override the default.

Supported LLM Providers

| Rank | Provider | API Key Format | Default Model | Notes |
| --- | --- | --- | --- | --- |
| 1 | OpenAI | sk-... | gpt-4.1 | Fast and reliable model optimized for speed |
| 2 | Anthropic | sk-ant-... | claude-sonnet-4-20250514 | High-performance with exceptional reasoning |
| 3 | Google | AIza... | gemini-2.5-pro | Most advanced model with built-in thinking |
| 4 | Azure OpenAI | [a-f0-9]{32} | gpt-4.1 | Same as OpenAI but via Azure |
| 5 | AWS Bedrock | AWS credentials | anthropic.claude-sonnet-4-20250514-v1:0 | Aligned with Anthropic |
| 6 | Vertex AI | Service Account JSON | gemini-2.5-pro | Enterprise Gemini via Google Cloud |
| 7 | Groq | gsk_... | deepseek-r1 | Best reasoning model with speed advantage |
| 8 | OpenRouter | sk-or-... | deepseek/deepseek-r1 | Best reasoning model available |
| 9 | xAI | xai-... | grok-code-fast-1 | Latest coding-focused model (Aug 2025) |
| 10 | Mistral | [a-f0-9]{64} | pixtral-large | Most advanced model (124B params) |

Client-Specific Setup

Cursor

  1. Open Cursor Settings:

    • Go to File → Preferences → Cursor Settings

    • Navigate to the MCP tab

    • Click + Add to add a new MCP server

  2. Add MCP Server Configuration:

    {
      "command": "uv",
      "args": ["tool", "run", "mcp-as-a-judge"],
      "env": {
        "LLM_API_KEY": "your-openai-api-key-here",
        "LLM_MODEL_NAME": "gpt-4.1"
      }
    }

    📝 Configuration Options:

    • LLM_API_KEY: Required for Cursor (limited MCP sampling)

    • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)

Claude Code

  1. Add MCP Server via CLI:

    # Set environment variables first (LLM_MODEL_NAME is an optional override)
    export LLM_API_KEY="your_api_key_here"
    export LLM_MODEL_NAME="claude-3-5-haiku"  # Optional: faster/cheaper model
    
    # Add MCP server
    claude mcp add mcp-as-a-judge -- uv tool run mcp-as-a-judge

  2. Alternative: Manual Configuration:

    • Create or edit ~/.config/claude-code/mcp_servers.json

    {
      "command": "uv",
      "args": ["tool", "run", "mcp-as-a-judge"],
      "env": {
        "LLM_API_KEY": "your-anthropic-api-key-here",
        "LLM_MODEL_NAME": "claude-3-5-haiku"
      }
    }

    📝 Configuration Options:

    • LLM_API_KEY: Required for Claude Code (limited MCP sampling)

    • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)

Other MCP Clients

For other MCP-compatible clients, use the standard MCP server configuration:

{
  "command": "uv",
  "args": ["tool", "run", "mcp-as-a-judge"],
  "env": {
    "LLM_API_KEY": "your-openai-api-key-here",
    "LLM_MODEL_NAME": "gpt-5"
  }
}

📝 Configuration Options:

  • LLM_API_KEY: Required for most MCP clients (except GitHub Copilot + VS Code)

  • LLM_MODEL_NAME: Optional custom model (see Supported LLM Providers for defaults)

🔒 Privacy & Flexible AI Integration

🔑 MCP Sampling (Preferred) + LLM API Key Fallback

Primary Mode: MCP Sampling

  • All judgments are performed using MCP Sampling capability

  • No need to configure or pay for external LLM API services

  • Works directly with your MCP-compatible client's existing AI model

  • Currently supported by: GitHub Copilot + VS Code

Fallback Mode: LLM API Key

  • When MCP sampling is not available, the server can use LLM API keys

  • Supports multiple providers via LiteLLM: OpenAI, Anthropic, Google, Azure, Groq, Mistral, xAI

  • Automatic vendor detection from API key patterns

  • Default model selection per vendor when no model is specified
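
The relationship between the two modes can be summarized in a short sketch. It assumes only what this README states: sampling is preferred, LLM_API_KEY is the fallback, and without either the server cannot evaluate plans or code. The `choose_judgment_mode` function and its return values are hypothetical and do not mirror the server's internals.

```python
import os
from collections.abc import Mapping


def choose_judgment_mode(
    client_supports_sampling: bool,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the evaluation path: MCP sampling first, API key fallback second."""
    if client_supports_sampling:
        return "mcp_sampling"  # no external API call needed
    if env.get("LLM_API_KEY"):
        return "llm_api_fallback"  # judgments go to the configured provider
    raise RuntimeError(
        "No MCP sampling and no LLM_API_KEY: the server cannot evaluate plans or code."
    )
```

For example, `choose_judgment_mode(False, {"LLM_API_KEY": "sk-..."})` would select the fallback path, while a client with full sampling support never needs the key.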

🛡️ Your Privacy Matters

  • The server runs locally on your machine

  • No data collection - your code and conversations stay private

  • No external API calls when using MCP Sampling. If you set LLM_API_KEY for fallback, the server will call your chosen LLM provider only to perform judgments (plan/code/test) with the evaluation content you provide.

  • Complete control over your development workflow and sensitive information

🤝 Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/OtherVibes/mcp-as-a-judge.git
cd mcp-as-a-judge

# Install dependencies with uv
uv sync --all-extras --dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest

# Run all checks
uv run pytest && uv run ruff check && uv run ruff format --check && uv run mypy src

© Concepts and Methodology

© 2025 OtherVibes and Zvi Fried. The "MCP as a Judge" concept, the "behavioral MCP" approach, the staged workflow (plan → code → test → completion), tool taxonomy/descriptions, and prompt templates are original work developed in this repository.

Prior Art and Attribution

While “LLM‑as‑a‑judge” is a broadly known idea, this repository defines the original “MCP as a Judge” behavioral MCP pattern by OtherVibes and Zvi Fried. It combines task‑centric workflow enforcement (plan → code → test → completion), explicit LLM‑based validations, and human‑in‑the‑loop elicitation, along with the prompt templates and tool taxonomy provided here. Please attribute as: “OtherVibes – MCP as a Judge (Zvi Fried)”.

❓ FAQ

How is “MCP as a Judge” different from rules/subagents in IDE assistants (GitHub Copilot, Cursor, Claude Code)?

Features compared across IDE Rules, Subagents, and MCP as a Judge:

  • Static behavior guidance

  • Custom system prompts

  • Project context integration

  • Specialized task handling

  • Active quality gates

  • Evidence-based validation

  • Approve/reject with feedback

  • Workflow enforcement

  • Cross-assistant compatibility

How does the Judge workflow relate to the tasklist? Why do we need both?

  • Tasklist = planning/organization: tracks tasks, priorities, and status. It doesn’t guarantee engineering quality or readiness.

  • Judge workflow = quality gates: enforces approvals for plan/design, code diffs, tests, and final completion. It demands real evidence (e.g., unified Git diffs and raw test output) and returns structured approvals and required improvements.

  • Together: Use the tasklist to organize work; use the Judge to decide when each stage is actually ready to proceed. The server also emits next_tool guidance to keep progress moving through the gates.
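
As an illustration of the kind of evidence the code gate expects, the sketch below produces a unified diff with Python's standard `difflib` module. In a real workflow the diff would normally come from `git diff`; the `make_unified_diff` helper and the `calc.py` file name are hypothetical and only show the shape of the input.

```python
import difflib


def make_unified_diff(old: str, new: str, path: str) -> str:
    """Return a unified diff using the a/<path> and b/<path> labels that Git prints."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
        )
    )


before = "def add(a, b):\n    return a - b\n"
after = "def add(a, b):\n    return a + b\n"
diff_text = make_unified_diff(before, after, "calc.py")

if __name__ == "__main__":
    print(diff_text)
```

The resulting text contains the file headers, a hunk header, and the removed and added lines, which is the concrete evidence a reviewer (human or LLM) can check instead of a prose summary of the change.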

If the Judge isn’t used automatically, how do I force it?

  • In your prompt: "use mcp-as-a-judge" or "Evaluate plan/code/test using the MCP server mcp-as-a-judge".

  • VS Code: Command Palette → "MCP: List Servers" → ensure "mcp-as-a-judge" is listed and enabled.

  • Ensure the MCP server is running and, in your client, the judge tools are enabled/approved.

How do I select models for sampling in VS Code?

  • Open Command Palette (Cmd/Ctrl+Shift+P) → "MCP: List Servers"

  • Select "mcp-as-a-judge" → "Configure Model Access"

  • Check your preferred model(s) to enable sampling

📄 License

This project is licensed under the MIT License (see LICENSE).

🙏 Acknowledgments

