multivon-mcp (Official)
Server Configuration
Describes the environment variables used to configure the server. All three are optional; each one supplies the API key for one provider's models in evaluations.
| Name | Required | Description | Default |
|---|---|---|---|
| GOOGLE_API_KEY | No | Google API key for using Gemini models in evaluations | |
| OPENAI_API_KEY | No | OpenAI API key for using OpenAI models in evaluations | |
| ANTHROPIC_API_KEY | No | Anthropic API key for using Claude models in evaluations | |
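As a quick pre-flight check, the sketch below reports which of the three optional keys from the table above are set in the current environment. The helper name `configured_providers` and the provider labels are illustrative assumptions, not part of multivon-mcp.

```python
import os

# Environment variable names come from the table above. The provider labels
# on the left are illustrative, not identifiers used by multivon-mcp.
PROVIDER_KEYS = {
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}


def configured_providers(env=None):
    """Return the provider labels whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Providers with a key set:", ", ".join(found) or "none")
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise without touching the real environment.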
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | { "listChanged": false } |
| prompts | { "listChanged": false } |
| resources | { "subscribe": false, "listChanged": false } |
| experimental | {} |
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| pdfhell_run | Run the pdfhell adversarial-PDF benchmark against a vision model. Args: model: Provider:model spec, e.g. … Returns: A dict with overall … Provider API keys come from environment variables. |
| pdfhell_make | Generate one adversarial PDF + its answer key. Useful for an agent to inspect what a specific trap looks like before deciding to evaluate against it. Args: trap: Trap family. One of: … Returns: A dict with the case JSON (id, trap_family, question, expected_answer, forbidden_answers, metadata) and optionally the base64-encoded PDF bytes under … |
| eval_faithfulness | Evaluate whether an LLM output is grounded in the retrieved context. Uses multivon-eval's QAG-graded Faithfulness evaluator. Extracts factual claims from the output and verifies each one against the context. Score is the fraction of claims supported. Use this when a RAG pipeline returned an answer and you want to check the LLM didn't invent facts not present in retrieved documents. Args: input: The user's question. context: The retrieved context the LLM was given. output: The LLM's answer being evaluated. judge_model: Provider:model for the QAG judge. Default … Returns: … |
| eval_hallucination | Detect fabricated information not present in the context. Score 1.0 = no hallucination. Score 0.0 = significant hallucination. Args: output: The LLM output to check. context: The ground-truth context the output should be grounded in. judge_model: Provider:model for the QAG judge. Returns: … |
| eval_relevance | Check whether an LLM output actually addresses the user's question. QAG-graded: generates yes/no questions about whether the output answers the input, stays on topic, and contains relevant content. Args: input: The user's question. output: The LLM's response. judge_model: Provider:model for the QAG judge. Returns: … |
| eval_tool_call_accuracy | Evaluate whether an agent called the right tool with the right arguments. Purely deterministic: no LLM judge needed. Compares the actual tool name + arguments against the expected ones. Args: expected_tool: Tool name the agent should have called. actual_tool: Tool name the agent actually called. expected_arguments: Dict of expected argument values (optional). actual_arguments: Dict of argument values the agent passed (optional). Returns: … |
| eval_answer_accuracy | Evaluate whether an answer is semantically equivalent to the ground truth. QAG-graded: generates yes/no questions about whether the actual answer matches the meaning of the expected answer. Useful when string match is too strict (e.g. paraphrased correct answers). Args: expected_answer: Ground-truth answer. actual_answer: The LLM's answer. judge_model: Provider:model for the QAG judge. Returns: … |
| eval_audit_pack | Build a hash-chained audit ZIP from a pdfhell run. Combines the run JSON, the case PDFs + answer keys, JUnit XML, and a SHA-256 manifest into one downloadable ZIP. Suitable for attaching to a procurement diligence appendix. Args: run_json_path: Path to a pdfhell run JSON (from …). Returns: … |
| eval_discover | Return the full machine-readable capability catalog. Useful as a first call at session start, so an agent can plan its evaluation strategy against the evaluators that are actually available rather than guessing or hallucinating tool names. Returns: A dict with three top-level keys: … |
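To make two of the scoring rules in the table above concrete, here is a minimal sketch: a faithfulness score computed as the fraction of claims supported, and a deterministic tool-call comparison. Both functions are hypothetical illustrations, not multivon-eval's implementation. The claim verdicts are supplied by hand rather than by a QAG judge, and the edge-case rules (an empty claim list scores 1.0, extra actual arguments are ignored) are assumptions.

```python
from typing import Any, Mapping, Optional, Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Fraction of extracted claims that were judged supported by the context.

    Assumption: an output with no extracted claims scores 1.0.
    """
    if not claim_supported:
        return 1.0
    return sum(claim_supported) / len(claim_supported)


def tool_call_matches(
    expected_tool: str,
    actual_tool: str,
    expected_arguments: Optional[Mapping[str, Any]] = None,
    actual_arguments: Optional[Mapping[str, Any]] = None,
) -> bool:
    """Deterministic check of tool name and arguments, with no LLM judge.

    Assumption: every expected argument must be present with an equal value,
    and arguments the agent passed beyond the expected ones are ignored.
    """
    if expected_tool != actual_tool:
        return False
    if expected_arguments is None:
        return True
    actual = actual_arguments or {}
    return all(
        key in actual and actual[key] == value
        for key, value in expected_arguments.items()
    )
```

For example, three supported claims out of four give `faithfulness_score([True, True, False, True])` a value of 0.75, and `tool_call_matches("search", "fetch")` is `False` because the tool names differ.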
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/multivon-ai/multivon-mcp'
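The same request can be sketched in Python using only the standard library. The URL is the one from the curl command above; the function names, the `namespace` and `slug` parameter names, the `Accept` header, and the assumption that the endpoint returns a JSON object are illustrative and not confirmed by the directory's documentation.

```python
import json
import urllib.request

# Base path taken from the curl command above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(namespace: str, slug: str) -> str:
    """Build the directory URL for one server (parameter names are hypothetical)."""
    return f"{API_BASE}/{namespace}/{slug}"


def fetch_server(namespace: str, slug: str, timeout: float = 10.0) -> dict:
    """GET the server record, assuming the response body is JSON."""
    request = urllib.request.Request(
        server_url(namespace, slug),
        headers={"Accept": "application/json"},
        method="GET",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    record = fetch_server("multivon-ai", "multivon-mcp")
    print(json.dumps(record, indent=2))
```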
If you have feedback or need assistance with the MCP directory API, please join our Discord server.