multivon-mcp

Official
by multivon-ai

eval_document_grounding

Verify whether an LLM-generated answer about a multi-page document is factually grounded. Uses a vision judge to check, across the document pages, that claims are supported, that nothing is invented, and that exceptions are handled.

Instructions

Check whether an answer about a multi-page document is grounded.

Document-page-grounded faithfulness for multi-page document agents (contracts, invoices, scientific PDFs, medical records). The vision judge answers three yes/no questions per document: whether every claim is supported, whether nothing is invented, and whether exceptions are handled.

Provide one image per page. Use exactly one of:

  • images: list of paths, http(s) URLs, or data URIs.

  • images_base64: list of raw base64 strings; pair with mime_type.
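The "exactly one of" rule above can be checked on the client side before the tool is called. The following is a minimal sketch, not part of multivon-mcp: the helper names `build_grounding_args` and `encode_page` are hypothetical, and rejecting an empty page list is an assumption made here rather than documented behavior.

```python
import base64
from typing import Optional, Sequence


def encode_page(raw: bytes) -> str:
    """Encode one page image's bytes as a raw base64 string (no data-URI prefix)."""
    return base64.b64encode(raw).decode("ascii")


def build_grounding_args(
    question: str,
    answer: str,
    images: Optional[Sequence[str]] = None,
    images_base64: Optional[Sequence[str]] = None,
    mime_type: str = "image/png",
) -> dict:
    """Hypothetical helper: assemble tool arguments, enforcing 'exactly one of'."""
    # Exactly one of the two image parameters must be supplied.
    if (images is None) == (images_base64 is None):
        raise ValueError("Provide exactly one of images or images_base64")
    pages = images if images is not None else images_base64
    if not pages:
        # Assumption: an empty page list is treated as a caller error.
        raise ValueError("Provide at least one page image")

    args = {"input": question, "output": answer}
    if images is not None:
        args["images"] = list(images)
    else:
        args["images_base64"] = list(images_base64)
        # mime_type is only meaningful alongside raw base64 strings.
        args["mime_type"] = mime_type
    return args
```

Passing paths or URLs goes through `images`; passing in-memory bytes goes through `encode_page` and `images_base64`, with `mime_type` describing the encoded images.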

Args:
  input: The question or prompt the LLM was answering about the document.
  output: The LLM-generated answer to verify against the pages.
  images: List of page image sources (paths/URLs/data URIs).
  images_base64: Alternative to images: list of raw base64 strings.
  mime_type: MIME type when using images_base64. Default "image/png".
  judge_model: Provider:model for the vision judge. Must be vision-capable. Default "google:gemini-2.5-flash".
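As an illustration of how these arguments might be sent, the sketch below wraps them in an MCP `tools/call` JSON-RPC request. The question, answer, and page paths are made-up values, `make_tool_call` is a hypothetical helper, and the transport used to deliver the request (stdio, HTTP) is left to whichever MCP client is in use.

```python
import json


def make_tool_call(arguments: dict, request_id: int = 1) -> str:
    """Hypothetical helper: serialize an MCP tools/call request for this tool."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "eval_document_grounding",
            "arguments": arguments,
        },
    }
    return json.dumps(request)


# Illustrative values only; the paths and text are assumptions.
example_arguments = {
    "input": "What is the payment term in this contract?",
    "output": "Payment is due within 30 days of invoice.",
    "images": ["contract_page_1.png", "contract_page_2.png"],
    "judge_model": "google:gemini-2.5-flash",
}

payload = make_tool_call(example_arguments)
```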

Returns: {"score": 0.0-1.0, "passed": bool, "reason": str, "threshold": float, "evaluator": "document_grounding"}.
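A caller might validate and summarize the returned dictionary along these lines. This is a sketch under stated assumptions: `summarize_result` is a hypothetical helper, the sample values are invented for illustration, and no relationship between `score`, `threshold`, and `passed` is assumed beyond what the Returns line states.

```python
REQUIRED_FIELDS = ("score", "passed", "reason", "threshold", "evaluator")


def summarize_result(result: dict) -> str:
    """Hypothetical helper: check the documented fields and format a one-line summary."""
    missing = [name for name in REQUIRED_FIELDS if name not in result]
    if missing:
        raise KeyError(f"Missing fields: {', '.join(missing)}")

    score = float(result["score"])
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if result["evaluator"] != "document_grounding":
        raise ValueError(f"unexpected evaluator: {result['evaluator']}")

    # Trust the tool's own verdict instead of recomputing it from the threshold.
    status = "PASS" if result["passed"] else "FAIL"
    threshold = float(result["threshold"])
    return f"{status} score={score:.2f} threshold={threshold:.2f}: {result['reason']}"


# Invented sample for illustration, shaped like the documented return value.
sample = {
    "score": 0.67,
    "passed": False,
    "reason": "One claim is unsupported.",
    "threshold": 0.7,
    "evaluator": "document_grounding",
}
summary = summarize_result(sample)
```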

Input Schema

Name           Required  Description  Default
input          Yes
output         Yes
images         No
images_base64  No
mime_type      No                     image/png
judge_model    No                     google:gemini-2.5-flash

Output Schema


No arguments

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full transparency burden. It explains that the vision judge evaluates three criteria, that the tool returns a structured response, and that one image is required per page. It does not disclose performance implications or auth requirements, but for an evaluation tool this is acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with purpose, then uses Args/Returns structure to detail parameters and output. It is efficient with no wasted words, though slightly more structured formatting (e.g., line breaks) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, 0% schema coverage, and an output schema, the description covers the essentials: required inputs, optional parameters, defaults, and return format. It misses edge cases, such as what happens when both image options are supplied, but is sufficient overall for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description fully documents all 6 parameters: input, output, images (paths/URLs/data URIs), images_base64 (alternative), mime_type (default "image/png"), and judge_model (default and vision-capability requirement). It also explains constraints, such as using exactly one of images or images_base64, adding meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks document grounding for multi-page documents, specifying it answers three yes/no questions via a vision judge. This distinguishes it from sibling evaluation tools like eval_faithfulness or eval_vqa_faithfulness by being explicitly document-page-grounded.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidelines: specify input and output, provide one image per page using exactly one of images or images_base64, and optionally set mime_type and judge_model. However, it does not explicitly contrast with alternative tools or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/multivon-ai/multivon-mcp'
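The same GET request can be sketched in Python with the standard library. The URL pattern is taken from the curl command above; the helper names `server_url` and `fetch_server`, the timeout, and the assumption that the endpoint returns a JSON body are illustrative and not confirmed by this page.

```python
import json
import urllib.request
from urllib.parse import quote

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(namespace: str, slug: str) -> str:
    """Build the directory API URL for one server, percent-encoding each path part."""
    return f"{API_BASE}/{quote(namespace, safe='')}/{quote(slug, safe='')}"


def fetch_server(namespace: str, slug: str, timeout: float = 10.0) -> dict:
    """Hypothetical helper: GET the server record, assuming a JSON response body."""
    request = urllib.request.Request(server_url(namespace, slug), method="GET")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


url = server_url("multivon-ai", "multivon-mcp")
```

Calling `fetch_server("multivon-ai", "multivon-mcp")` performs the network request; building the URL separately keeps that step easy to check offline.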

If you have feedback or need assistance with the MCP directory API, please join our Discord server.