agent-eval-mcp
Provides CI-friendly quality gates for agent evaluation, enabling integration with GitHub Actions workflows for automated testing and deployment.
Enterprise AI Agent Evaluation & Deployment Platform
A dependency-light evaluation platform for RAG/wiki-quality AI agents. It scores agent outputs for faithfulness, retrieval relevance, hallucination risk, latency, and cost; produces CI-friendly quality gates; emits regression/canary reports; and exposes the workflow through a lightweight MCP-style stdio tool server.
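The platform's exact scoring formulas are not described here. As a rough illustration of deterministic, dependency-free scoring for a single case, the sketch below uses token and keyword overlap. It is an assumption, not the platform's implementation: `score_case`, the overlap heuristics, and the default latency and cost budgets are hypothetical. Only the field names (`agent_answer`, `retrieved_docs`, `reference_answer`, `expected_keywords`, `latency_ms`, `cost_usd`) come from the platform's JSONL case format.

```python
from __future__ import annotations

import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens; a deliberately crude normalizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_case(case: dict, max_latency_ms: float = 1000.0, max_cost_usd: float = 0.01) -> dict:
    """Hypothetical overlap-based scores for one case (not the platform's formulas)."""
    answer_tokens = _tokens(case["agent_answer"])
    context = " ".join(doc["text"] for doc in case["retrieved_docs"])
    context_tokens = _tokens(context)
    reference_tokens = _tokens(case.get("reference_answer", ""))
    keywords = case.get("expected_keywords", [])

    # Retrieval relevance proxy: share of expected keywords present in retrieved text.
    if keywords:
        relevance = sum(kw.lower() in context.lower() for kw in keywords) / len(keywords)
    else:
        relevance = 1.0

    if answer_tokens:
        # Faithfulness proxy: share of answer tokens backed by context or reference answer.
        faithfulness = len(answer_tokens & (context_tokens | reference_tokens)) / len(answer_tokens)
        # Hallucination-risk proxy: share of answer tokens never seen in retrieved context.
        hallucination_risk = len(answer_tokens - context_tokens) / len(answer_tokens)
    else:
        faithfulness, hallucination_risk = 1.0, 0.0

    return {
        "faithfulness": faithfulness,
        "retrieval_relevance": relevance,
        "hallucination_risk": hallucination_risk,
        "within_budget": case["latency_ms"] <= max_latency_ms and case["cost_usd"] <= max_cost_usd,
    }
```

With the answer "Ada Lovelace is known for Analytical Engine notes." and a retrieved text "Ada Lovelace wrote notes on the Analytical Engine.", three of the eight answer tokens ("is", "known", "for") are missing from the context, so the hallucination-risk proxy comes out at 0.375. That also shows the weakness of raw token overlap: stopwords count as unsupported content, so a fuller version would probably filter them.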
What Is Included
JSONL evaluation case format for RAG/wiki workflows.
Deterministic checks for:
faithfulness to retrieved context and reference answer,
retrieval relevance against question and expected keywords,
hallucination risk from unsupported answer content,
latency and cost thresholds.
Synthetic case generator for 100+ evaluation cases.
CI/CD-style suite-level and case-level quality gates.
Markdown and JSON evaluation reports.
Regression report comparing baseline and candidate runs.
Canary promotion policy with traffic ramp decisions.
OpenTelemetry-compatible JSONL traces/metrics.
MCP-style stdio server exposing evaluation tools.
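The regression report compares baseline and candidate runs. A minimal sketch of such a comparison, assuming each run has already been reduced to a dictionary of suite-level averages, could flag a metric as regressed when it moves in the bad direction by more than a relative tolerance. The metric names, the `HIGHER_IS_BETTER` table, `compare_runs`, and the 5% tolerance are all hypothetical.

```python
from __future__ import annotations

# Hypothetical metric table: True if a higher value is better.
HIGHER_IS_BETTER = {
    "faithfulness": True,
    "retrieval_relevance": True,
    "hallucination_risk": False,
    "latency_ms": False,
    "cost_usd": False,
}


def compare_runs(baseline: dict, candidate: dict, rel_tolerance: float = 0.05) -> dict:
    """Label each shared metric as regressed, improved, or within_tolerance."""
    report = {}
    for metric, higher_is_better in HIGHER_IS_BETTER.items():
        if metric not in baseline or metric not in candidate:
            continue  # only compare metrics present in both runs
        base, cand = baseline[metric], candidate[metric]
        # Positive "worsening" means the candidate moved in the bad direction.
        worsening = (base - cand) if higher_is_better else (cand - base)
        scale = abs(base) or 1.0  # avoid division by zero for a zero baseline
        if worsening / scale > rel_tolerance:
            status = "regressed"
        elif worsening < 0:
            status = "improved"
        else:
            status = "within_tolerance"
        report[metric] = {"baseline": base, "candidate": cand, "status": status}
    return report
```

A relative tolerance is used here because the metrics live on different scales (scores in [0, 1] versus latency in milliseconds), so a single absolute threshold would not fit all of them.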
Quick Start
git clone https://github.com/ad-github1/ENTERPRISE-AI-AGENT-EVALUATION-PLATFORM.git
cd ENTERPRISE-AI-AGENT-EVALUATION-PLATFORM
PYTHONPATH=src python3 -m agent_eval_platform generate-cases --count 120 --out examples/wiki_eval_cases.jsonl
PYTHONPATH=src python3 -m agent_eval_platform evaluate \
--cases examples/wiki_eval_cases.jsonl \
--gate examples/quality_gate.json \
--variant candidate \
--json-out reports/eval_result.json \
--markdown-out reports/eval_report.md \
--traces-out reports/traces.jsonl
PYTHONPATH=src python3 -m unittest discover -s tests
Alternatively, install the package in editable mode and use the console commands:
pip install -e .
agent-eval evaluate --cases examples/wiki_eval_cases.jsonl --gate examples/quality_gate.json
agent-eval-mcp
Case Format
Each JSONL row contains one evaluated agent run:
{
"case_id": "case-0001",
"question": "What contribution is Ada Lovelace known for in mathematics?",
"reference_answer": "Ada Lovelace is known for Analytical Engine notes.",
"expected_keywords": ["Ada Lovelace", "Analytical Engine"],
"retrieved_docs": [
{"doc_id": "wiki-1", "title": "Ada Lovelace", "text": "...", "score": 0.94}
],
"agent_answer": "Ada Lovelace is known for Analytical Engine notes.",
"latency_ms": 240.5,
"cost_usd": 0.0031,
"tags": ["wiki", "rag"]
}
CI Quality Gate
With --fail-on-gate, the evaluator exits with a non-zero status when any gate threshold fails:
PYTHONPATH=src python3 -m agent_eval_platform evaluate \
--cases examples/wiki_eval_cases.jsonl \
--gate examples/quality_gate.json \
--fail-on-gate
See .github/workflows/agent-eval.yml for a GitHub Actions example.
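The gate logic itself is not shown here. The sketch below illustrates the general idea of a min/max threshold gate that turns violations into a process exit code; the `EXAMPLE_GATE` dictionary, its metric names and thresholds, and the `gate_failures` and `enforce` helpers are illustrative assumptions, not the schema of examples/quality_gate.json.

```python
from __future__ import annotations

import sys

# Illustrative gate: metric -> ("min" | "max", threshold).
# Not the actual schema of examples/quality_gate.json.
EXAMPLE_GATE = {
    "faithfulness": ("min", 0.8),
    "retrieval_relevance": ("min", 0.7),
    "hallucination_risk": ("max", 0.2),
    "p95_latency_ms": ("max", 1500.0),
}


def gate_failures(metrics: dict, gate: dict) -> list[str]:
    """Return one message per violated (or missing) threshold."""
    failures = []
    for name, (kind, threshold) in gate.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif kind == "min" and value < threshold:
            failures.append(f"{name}: {value} < {threshold}")
        elif kind == "max" and value > threshold:
            failures.append(f"{name}: {value} > {threshold}")
    return failures


def enforce(metrics: dict, gate: dict, fail_on_gate: bool = True) -> int:
    """Report failures on stderr and return the process exit code."""
    failures = gate_failures(metrics, gate)
    for failure in failures:
        print(f"GATE FAIL {failure}", file=sys.stderr)
    return 1 if failures and fail_on_gate else 0
```

A CI entry point would end with `sys.exit(enforce(suite_metrics, EXAMPLE_GATE))`; GitHub Actions marks a step as failed whenever its command exits non-zero, which is what makes a gate like this block a merge or deploy.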
MCP-Style Tool Server
Run:
PYTHONPATH=src python3 -m agent_eval_platform.mcp_server
Supported JSON-RPC methods:
initialize
tools/list
tools/call with:
run_evaluation_suite
compare_regression
decide_canary
The server is intentionally stdio-only and dependency-free: it follows the MCP tool shape closely enough for local agent-integration demos without requiring the MCP Python SDK.
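To drive the server from a script, a client writes JSON-RPC 2.0 requests to its stdin and reads responses from stdout. The sketch below assumes newline-delimited JSON framing (the framing used by the MCP stdio transport) and that the server answers each line and exits at end of input; neither is confirmed for this server. The `initialize` parameters follow the MCP shape, while the `tools/call` argument names (`cases`, `gate`) and the helper names (`rpc`, `build_session`, `encode`, `run_session`) are hypothetical.

```python
from __future__ import annotations

import json
import os
import subprocess
import sys


def rpc(request_id: int, method: str, params: dict | None = None) -> dict:
    """Build one JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def build_session(cases_path: str, gate_path: str) -> list[dict]:
    """initialize -> tools/list -> tools/call(run_evaluation_suite)."""
    return [
        rpc(1, "initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            "clientInfo": {"name": "demo-client", "version": "0.1"},
        }),
        rpc(2, "tools/list"),
        # Hypothetical argument names; check the schema returned by tools/list.
        rpc(3, "tools/call", {
            "name": "run_evaluation_suite",
            "arguments": {"cases": cases_path, "gate": gate_path},
        }),
    ]


def encode(messages: list[dict]) -> str:
    """Assumed framing: one JSON object per line."""
    return "".join(json.dumps(m) + "\n" for m in messages)


def run_session(messages: list[dict]) -> list[dict]:
    """Pipe the session through the server; run from the repository root so PYTHONPATH=src resolves."""
    proc = subprocess.run(
        [sys.executable, "-m", "agent_eval_platform.mcp_server"],
        input=encode(messages),
        capture_output=True,
        text=True,
        env={**os.environ, "PYTHONPATH": "src"},
        check=False,
    )
    return [json.loads(line) for line in proc.stdout.splitlines() if line.strip()]
```

Usage: `run_session(build_session("examples/wiki_eval_cases.jsonl", "examples/quality_gate.json"))` returns the parsed responses. A strict MCP client would also send a `notifications/initialized` notification after `initialize`; whether this server expects one is not stated.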
Canary Workflow
PYTHONPATH=src python3 -m agent_eval_platform canary \
--result reports/eval_result.json \
--config examples/canary_config.json \
--json-out reports/canary_decision.json
The decision is hold, increase_traffic, or promote, based on suite quality and minimum case coverage.
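The decision rule is described only in outline (suite quality plus minimum case coverage). One plausible reading is sketched below: hold until enough cases have been evaluated, promote when the pass rate clears a high bar, ramp traffic when it clears a lower bar, and hold otherwise. `CanaryPolicy`, `decide_canary_step`, and the thresholds are hypothetical and are not taken from examples/canary_config.json.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CanaryPolicy:
    """Hypothetical policy fields; examples/canary_config.json may use different ones."""
    min_cases: int = 100
    promote_pass_rate: float = 0.95
    ramp_pass_rate: float = 0.85


def decide_canary_step(pass_rate: float, case_count: int, policy: CanaryPolicy) -> str:
    """Return "hold", "increase_traffic", or "promote"."""
    if case_count < policy.min_cases:
        return "hold"  # not enough evidence yet, regardless of quality
    if pass_rate >= policy.promote_pass_rate:
        return "promote"
    if pass_rate >= policy.ramp_pass_rate:
        return "increase_traffic"
    return "hold"
```

Checking coverage before quality means a small but perfect sample never triggers promotion on its own, which matches the stated intent of requiring minimum case coverage.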