Gemina FileTag

by co.gemina

Ownership verified

Server Details

Tag, rename, and enrich PDFs and images. Free tier: 1,500 tags/month, no credit card.

Status: Healthy
Last Tested: 2026-07-25 05:35
Transport: Streamable HTTP
URL
Repository: tommyil/gemina-mcp
GitHub Stars: 1
Server Listing: Gemina FileTag

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (4.2/5.0)

Tool Descriptions: A

Average 4.4/5 across 12 of 12 tools scored. Lowest: 3.5/5.

Server Coherence: A

Disambiguation: 5/5

Each tool has a clearly distinct purpose. The two tag tools differ by input (local file vs URL), extraction and tagging are complementary, and query vs aggregation serve different analytical needs. No ambiguity between tools.

Naming Consistency: 5/5

Tool names follow a consistent verb_noun pattern, with get_/list_ for retrieval and verb_noun for actions (extract_document, tag_file, etc.). Conventions are not mixed, so names are easy to predict.

Tool Count: 5/5

12 tools is well within the optimal range for a focused server. Each tool covers a distinct step in the file processing pipeline without bloat.

Completeness: 4/5

The tool surface covers the full lifecycle from upload to tagging, extraction, search, aggregation, and feedback. Deletion and update tools are missing, but these are not essential for the core FileTag pipeline.

Available Tools

13 tools

aggregate_documents (grade A)

Compute sums/averages/min/max/counts over the tenant's indexed documents, optionally grouped (vendor_name, currency, document_type, expense_type, payment_method, end_user_id, month, year) and filtered (same filters as query_documents). Example: total spent per vendor in Q3 = metrics [{'op':'sum','field':'total_amount'}], group_by ['vendor_name'], filters {issueDateFrom, issueDateTo}. NOTE: money metrics are always split per currency unless a currency filter is given (mixed-currency totals would be meaningless); the response meta flags when that grouping was added automatically.

Parameters

Name	Required	Description
`limit`	No
`filters`	No	Structured filters (camelCase keys).
`metrics`	Yes	List of {'op': sum|avg|min|max|count, 'field': total_amount|net_amount|vat_amount (omit for count)}.
`group_by`	No	Grouping fields (see tool description).
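
As a rough sketch of the Q3 per-vendor example in the description, the arguments could be assembled and checked client-side as shown here. The helper name `build_aggregate_args`, the year 2025, and the YYYY-MM-DD date format for the filters are assumptions made for illustration; only the key names (`metrics`, `group_by`, `filters`, `limit`, `issueDateFrom`, `issueDateTo`) and the allowed ops and fields come from the listing.

```python
import json

# Allowed values as listed in the parameter table above.
OPS = {"sum", "avg", "min", "max", "count"}
MONEY_FIELDS = {"total_amount", "net_amount", "vat_amount"}


def build_aggregate_args(metrics, group_by=None, filters=None, limit=None):
    """Hypothetical helper: validate and assemble aggregate_documents arguments."""
    for metric in metrics:
        if metric["op"] not in OPS:
            raise ValueError(f"unsupported op: {metric['op']}")
        # 'field' may be omitted only for count.
        if metric["op"] != "count" and metric.get("field") not in MONEY_FIELDS:
            raise ValueError(f"op {metric['op']!r} needs a money field")
    args = {"metrics": list(metrics)}
    if group_by:
        args["group_by"] = list(group_by)
    if filters:
        args["filters"] = dict(filters)
    if limit is not None:
        args["limit"] = limit
    return args


# Total spent per vendor in Q3 (the year is an assumed placeholder).
q3_args = build_aggregate_args(
    metrics=[{"op": "sum", "field": "total_amount"}],
    group_by=["vendor_name"],
    filters={"issueDateFrom": "2025-07-01", "issueDateTo": "2025-09-30"},
)
print(json.dumps(q3_args, indent=2))
```

Because no currency filter is set here, the description implies the server would add a per-currency split on its own and flag that in the response meta.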

Output Schema

Parameters

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses key behavioral traits: automatic currency splitting (noted as 'always split per currency unless a currency filter is given') and that the response meta flags this. This goes well beyond the schema by clarifying edge cases and implicit behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two paragraphs) with a clear structure: purpose, optional parameters, example, and a critical note. Every sentence adds value without redundancy. It is front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and presence of an output schema, the description covers all necessary input details (metrics, group_by, filters), includes an example, and explains an automatic behavior. It does not need to describe output since an output schema exists.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (75%), but the description adds significant value by explaining the metrics format (list of objects with op and field) and group_by options with concrete examples. The schema's group_by description is vague ('see tool description'), so the description fills that gap. limit is not elaborated, but defaults are in schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes aggregate metrics (sums, averages, etc.) over indexed documents, distinct from siblings like query_documents which retrieves raw data. The verb 'Compute' and resource 'documents' are specific, and the operations are enumerated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides context for when to use the tool (needing aggregates with optional grouping and filtering) and references query_documents for the same filter structure, implying a complementary role. However, it lacks explicit exclusion of alternatives or a direct comparison to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

extract_document (grade A)

Run Core-OCR extraction on a previously uploaded file slot. The file_id comes from a prior files_create_upload call (upload the bytes to the returned signed URL first; each slot is single-use). Choose one or more extraction_types: 'ocr' (full text), 'invoice_headers', 'invoice_line_items', 'document_details_hebrew', 'document_line_items_hebrew', or 'custom_template' (requires a READY template_id). Extraction is asynchronous: the call returns within seconds with either the completed result (fast documents) or an IN_PROCESS status carrying meta.correlationId — poll get_extraction_result with that id until complete. Duplicate protection is opt-in: pass an external_id of your own to enable it (re-submitting the same external_id within the dedup window idempotently returns the prior result, or errors if the file or extraction types differ; allow_duplicate=true overrides). Without an external_id every extraction is billed as new. Optional advanced knobs mirror the REST API: model_type selects the extraction model, and thinking / evaluation / correction / include_coordinates toggle accuracy passes and coordinate output.

Parameters

Name	Required	Description
`file_id`	Yes	file_id from files_create_upload (slot must be uploaded and unconsumed).
`thinking`	No	Enable extended reasoning for higher accuracy on complex documents (slower).
`correction`	No	Run an automatic correction pass over the extracted fields.
`evaluation`	No	Run a self-evaluation pass over the extracted fields.
`model_type`	No	Extraction model to use: 'invictus', 'praetorian', or 'velox'. Omit to use the account default.
`end_user_id`	No	Optional end-user attribution for reporting.
`external_id`	No	Your idempotency key; enables duplicate detection.
`template_id`	No	UUID of a READY template; required with (and only valid with) 'custom_template'.
`allow_duplicate`	No	Force re-extraction when a duplicate external_id is detected.
`extraction_types`	Yes	One or more of: 'ocr', 'invoice_headers', 'invoice_line_items', 'document_details_hebrew', 'document_line_items_hebrew', 'custom_template'.
`include_coordinates`	No	Include bounding-box coordinates for each extracted field in the result.
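
The parameter interactions described above (at least one extraction type, and `template_id` required with and only valid with 'custom_template') can be checked before the call is made. This is a minimal sketch under stated assumptions: `build_extract_args` is a hypothetical client-side helper, and the file and external ids are placeholders, not values from the service.

```python
# Extraction types as enumerated in the tool description.
EXTRACTION_TYPES = {
    "ocr",
    "invoice_headers",
    "invoice_line_items",
    "document_details_hebrew",
    "document_line_items_hebrew",
    "custom_template",
}


def build_extract_args(file_id, extraction_types, template_id=None,
                       external_id=None, **options):
    """Hypothetical helper: validate and assemble extract_document arguments."""
    types = list(extraction_types)
    if not types:
        raise ValueError("at least one extraction type is required")
    unknown = set(types) - EXTRACTION_TYPES
    if unknown:
        raise ValueError(f"unknown extraction types: {sorted(unknown)}")
    # template_id is required with, and only valid with, 'custom_template'.
    if ("custom_template" in types) != (template_id is not None):
        raise ValueError("template_id must accompany 'custom_template' and nothing else")
    args = {"file_id": file_id, "extraction_types": types}
    if template_id is not None:
        args["template_id"] = template_id
    if external_id is not None:
        # Passing your own external_id opts in to duplicate protection.
        args["external_id"] = external_id
    args.update(options)  # e.g. thinking=True, include_coordinates=True
    return args


# Placeholder ids for illustration only.
extract_args = build_extract_args(
    "file-123",
    ["ocr", "invoice_headers"],
    external_id="inv-2025-001",
    thinking=True,
)
print(extract_args)
```

Supplying a stable `external_id` per source document is what lets a retried submission return the prior result instead of being billed as a new extraction.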

Output Schema

Parameters

Name	Required	Description
`data`	Yes
`meta`	Yes
`errors`	No
`status`	Yes
`servedAt`	No
`createdAt`	No
`servedAtTimestamp`	No
`createdAtTimestamp`	No

Tool Definition Quality

A (4.9/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description fully bears the burden of disclosing behavior. It explains that extraction is asynchronous, that a correlationId is returned for polling, that slots are single-use, and that billing depends on external_id. It also lists optional advanced knobs (model_type, thinking, etc.) and their effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with front-loaded purpose and then details. It includes necessary information without being overly verbose, though some parts (e.g., the long list of extraction types) could be streamlined. Still, it earns a 4 for efficiency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (11 parameters, asynchronous behavior, multiple extraction types, duplicate protection), the description is remarkably complete. It covers the workflow, parameter semantics, and behavior. The output schema exists but is not shown, so the description correctly avoids explaining return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the input schema describes all 11 parameters (100% coverage), the description adds significant meaning: it explains the file_id lifecycle, enumerates extraction_types options, clarifies the external_id/allow_duplicate interaction, and details the advanced parameters. This goes well beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('Run Core-OCR extraction on a previously uploaded file slot') and identifies the resource (a file slot). It distinguishes from sibling tools like 'files_create_upload' and 'get_extraction_result' by referencing the prior upload step and the polling workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool: after uploading a file to a slot, and after calling 'files_create_upload'. It covers usage flow, including polling with 'get_extraction_result', and explains the duplicate protection mechanism (external_id) and when to override it (allow_duplicate=true). It also mentions alternatives like 'get_extraction_result' for status checks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_create_extraction_upload (grade A)

Reserve a pre-signed PUT slot for Core-OCR extraction. After PUTing the bytes to the returned URL, follow the next_tool_call recipe and provide one or more extraction_types to extract_document. This is distinct from files_create_upload, which is the FileTag upload flow. Allowed types: PDF, PNG, JPEG, GIF, WebP. Max size 50 MB.

Parameters

Name	Required	Description
`sha256`	No	Optional lowercase-hex SHA-256 of the file.
`filename`	Yes	Original filename including extension (e.g. `invoice.pdf`).
`mime_type`	Yes	MIME type: application/pdf, image/png, image/jpeg, image/gif, or image/webp.
`size_bytes`	Yes	Exact byte size of the file.
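
The four parameters above can all be derived from the file itself. The sketch below shows one way to do that; `describe_upload` is a hypothetical helper, the extension-to-MIME map is an assumption built from the allowed types listed here, and "50 MB" is read as 50 × 1024 × 1024 bytes, which the listing does not confirm.

```python
import hashlib
from pathlib import PurePath

# Assumed mapping from file extension to the allowed MIME types.
ALLOWED_MIME = {
    ".pdf": "application/pdf",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}
MAX_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB


def describe_upload(filename: str, data: bytes) -> dict:
    """Hypothetical helper: derive the upload-slot arguments from file bytes."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in ALLOWED_MIME:
        raise ValueError(f"unsupported file type: {suffix or '(none)'}")
    if len(data) > MAX_BYTES:
        raise ValueError("file exceeds the 50 MB limit")
    return {
        "filename": filename,
        "mime_type": ALLOWED_MIME[suffix],
        "size_bytes": len(data),  # exact byte size, as the schema requires
        "sha256": hashlib.sha256(data).hexdigest(),  # lowercase hex
    }


sample = b"%PDF-1.7 demo bytes"
slot_args = describe_upload("invoice.pdf", sample)
print(slot_args)
```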

Output Schema

Parameters

Name	Required	Description
`fileId`	Yes
`upload`	Yes
`nextToolCall`	Yes

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Discloses that it reserves a slot, returns a URL for PUT, and expects a subsequent 'extract_document' call. Could mention expiration or auth details for completeness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise three sentences with front-loaded key action and clear step-by-step flow. No filler or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists and schema coverage is complete, description effectively explains tool's role in the broader extraction flow. Could be slightly more detailed about the output format, but output schema compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description does not add significant meaning beyond schema; allowed types and max size are already in schema. References downstream 'extraction_types' but not relevant to this tool's parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool reserves a pre-signed PUT slot for Core-OCR extraction, differentiates from sibling 'files_create_upload', and lists allowed file types and max size.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage context (Core-OCR flow) and distinguishes from FileTag upload flow. Includes follow-up instructions but lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

files_create_upload (grade A)

Reserve a pre-signed PUT slot for a forthcoming file upload. Returns the upload URL, an upload.headers dict (the agent MUST echo every header in this dict on the PUT -- today that is just Content-Type), the slot expiry (5 minutes), and a next_tool_call recipe pointing at tag_file -- copy-paste the file_id to run the FileTag pipeline against the uploaded bytes. This is the canonical path for any file the agent holds locally; bytes never traverse the LLM context (they go directly from the agent host to GCS). Allowed types: PDF, PNG, JPEG, GIF, WebP. Max size 50 MB.

Parameters

Name	Required	Description
`sha256`	No	Optional lowercase-hex SHA-256 of the file. When provided, the server verifies the uploaded bytes' hash before tagging.
`filename`	Yes	Original filename including extension (e.g. `invoice.pdf`).
`mime_type`	Yes	MIME type of the file. Allowed: application/pdf, image/png, image/jpeg, image/gif, image/webp.
`size_bytes`	Yes	Exact byte size of the file. Server validates the uploaded blob against this declaration at `tag_file` time and rejects mismatches before consuming the slot.
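
The requirement to echo every header from `upload.headers` on the PUT can be sketched with the standard library. This is an illustration under assumptions: the `url` key inside `upload` is a guessed field name (the listing only names `upload.headers`), the storage URL is a placeholder, and the request is built but deliberately not sent.

```python
import urllib.request


def build_put_request(upload: dict, data: bytes) -> urllib.request.Request:
    """Build the PUT for a reserved slot, echoing every server-provided header."""
    request = urllib.request.Request(upload["url"], data=data, method="PUT")
    for name, value in upload["headers"].items():
        # Copy headers blindly so new ones added by the server are not dropped.
        request.add_header(name, value)
    return request


# Placeholder slot shaped like the assumed `upload` object.
slot = {
    "url": "https://storage.example.invalid/signed-put",
    "headers": {"Content-Type": "application/pdf"},
}
put_request = build_put_request(slot, b"%PDF-1.7 demo bytes")
print(put_request.get_method(), put_request.full_url)
# urllib.request.urlopen(put_request) would perform the upload; omitted here.
```

Looping over the dict rather than hard-coding Content-Type keeps the client correct if the server later requires additional headers. The slot expires after 5 minutes, so the PUT should follow the reservation promptly.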

Output Schema

Parameters

Name	Required	Description
`fileId`	Yes
`upload`	Yes
`nextToolCall`	Yes

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description fully carries the burden. It discloses critical behavioral traits: pre-signed PUT, required header echoing, 5-minute expiry, direct GCS upload, and server-side validation at tag_file time for size mismatches.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph with key action leading. Every sentence adds value: workflow, constraints, and output details. No redundant or fluff content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, output schema), the description fully covers the workflow (reserve → upload with headers → tag_file), constraints (5 min expiry, 50 MB, allowed types), and output fields (upload URL, headers, expiry, next_tool_call).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing a baseline of 3. The description adds context: sha256 is optional for verification, filename includes extension, mime_type lists allowed values, and size_bytes is verified later. This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Reserve a pre-signed PUT slot for a forthcoming file upload,' specifying the verb (reserve) and resource (file upload slot). It distinguishes from siblings like 'files_create_extraction_upload' and identifies the next step ('tag_file').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states this is 'the canonical path for any file the agent holds locally' and that 'bytes never traverse the LLM context.' It mentions allowed types, max size, and the next stage (tag_file). While it doesn't explicitly list alternative tools for when not to use, the sibling context is sufficient for differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_document (grade A)

Fetch one document by its id, including all of its extractions. Only documents created by the calling API key are visible.

Parameters

Name	Required	Description	Default
`document_id`	Yes	UUID of the document.

Output Schema

Parameters

Name	Required	Description
`data`	Yes
`meta`	Yes
`errors`	No
`status`	Yes
`createdAt`	No
`createdAtTimestamp`	No

Tool Definition Quality

A (4.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses visibility restriction ('Only documents created by the calling API key are visible') and return content ('including all of its extractions'). Without annotations, this provides moderate transparency, but lacks details on error handling, rate limits, or authentication beyond implicit API key usage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two focused sentences with no unnecessary words. First sentence states action and scope, second adds critical access control context. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the description covers retrieval by ID, included extractions, and access restrictions. Output schema handles return value details, so no further information is needed for competent usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the schema already describes the parameter (UUID). The description adds value by clarifying that the tool returns the document and its extractions, and that visibility is scoped to the caller's documents. This enriches the parameter's meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Fetch one document by its id' specifying the verb and resource. Distinguishes from siblings like get_extraction and query_documents by focusing on single document retrieval, including extractions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use when a single document ID is known, but does not explicitly state when to use this tool versus alternatives like query_documents or aggregate_documents. No exclusion conditions or alternative suggestions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_extraction (grade A)

Fetch one extraction by its id (from list_extractions or a completed extract_document result), including the full extracted data. Only extractions created by the calling API key are visible.

Parameters

Name	Required	Description	Default
`extraction_id`	Yes	UUID of the document extraction.

Output Schema

Parameters

Name	Required	Description
`meta`	Yes
`errors`	No
`status`	Yes
`values`	Yes
`document`	Yes
`createdAt`	No
`createdAtTimestamp`	No

Tool Definition Quality

A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It mentions access control (only visible to creator) and that full extracted data is returned, but does not disclose idempotency, error behavior (e.g., if id not found), or any side effects. This is adequate but leaves gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the main action and includes critical context. It is concise without wasted words. However, it could be structured more clearly (e.g., separating constraints from functionality).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple interface (one parameter, output schema exists), the description covers the main purpose, access constraint, and data content. It omits error details but is sufficient for a retrieval tool with a defined output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema already describes extraction_id as a UUID. The description adds context by specifying the source of the ID (from list_extractions or extract_document), which is helpful but not substantial. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Fetch' and resource 'extraction', and specifies the id source (list_extractions or extract_document). However, it does not differentiate from get_extraction_result, a sibling tool with a similar name, so the distinction is unclear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that only extractions created by the calling API key are visible, which is a key usage constraint. However, it lacks explicit guidance on when to use this tool versus alternatives like get_extraction_result, and does not state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_extraction_result (grade A)

Poll for the result of an asynchronous extract_document call. Pass the meta.correlationId from the extract response. Returns the completed extraction result once processing finishes, or an IN_PROCESS status while it is still running — poll again after a few seconds. Only correlations created by the calling API key are visible.

Parameters

Name	Required	Description	Default
`correlation_id`	Yes	meta.correlationId returned by extract_document.
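
The polling behavior described above might be wrapped in a small loop like the one below. It is a sketch, not the server's client: `call_tool` stands in for whatever MCP client function invokes a tool and returns its parsed result, the 3-second interval and 20-attempt cap are arbitrary choices, and the "DONE" status in the fake responses is a placeholder because the listing only names IN_PROCESS.

```python
import time


def poll_extraction(call_tool, correlation_id, interval_s=3.0,
                    max_attempts=20, sleep=time.sleep):
    """Poll get_extraction_result until the status is no longer IN_PROCESS."""
    for attempt in range(max_attempts):
        result = call_tool("get_extraction_result",
                           {"correlation_id": correlation_id})
        if result.get("status") != "IN_PROCESS":
            return result  # completed or failed; the caller inspects it
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"extraction {correlation_id} still IN_PROCESS")


# Fake client for demonstration: two pending responses, then a final one.
responses = iter([
    {"status": "IN_PROCESS"},
    {"status": "IN_PROCESS"},
    {"status": "DONE", "data": {}},  # placeholder terminal status
])
calls = []


def fake_call_tool(name, arguments):
    calls.append((name, arguments))
    return next(responses)


final = poll_extraction(fake_call_tool, "corr-123", sleep=lambda seconds: None)
print(final["status"], "after", len(calls), "calls")
```

Injecting `sleep` keeps the loop testable without real delays, and the attempt cap keeps an agent from polling indefinitely when a document never finishes.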

Output Schema

Parameters

Name	Required	Description
`data`	Yes
`meta`	Yes
`errors`	No
`status`	Yes
`servedAt`	No
`createdAt`	No
`servedAtTimestamp`	No
`createdAtTimestamp`	No

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses polling behavior, status values like IN_PROCESS, and access restrictions. Could mention rate limits or expected polling intervals.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loaded with purpose, and each sentence provides necessary information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description sufficiently covers polling behavior, status, and access. Could include error states or typical processing time for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and description adds specific context: 'meta.correlationId from the extract response', which aids correct invocation beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool polls for async extraction results, references the specific extract_document call and correlationId, and distinguishes from other tools in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explains when to use (after extract_document, polling) and that only the calling API key can see results. It lacks explicit alternatives but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

index_document (grade A)

(Re)index one of the tenant's documents into the searchable index — use after corrections, or to backfill a document processed before indexing was enabled. Indexing normally happens automatically on every extraction once the tenant enables document indexing; this tool is the manual trigger. Returns per-outcome counts (indexed / skipped_opt_out / skipped_state / skipped_no_fields / skipped_not_in_plan / skipped_no_credits).

Parameters

Name	Required	Description	Default
`document_id`	Yes	UUID of a document owned by the calling tenant.

Output Schema

Parameters

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool reindexes and returns per-outcome counts, including skip reasons. However, it does not explicitly state if the operation is destructive or if it requires any special permissions, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the purpose and usage, then adding context about automatic indexing and return values. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only one parameter and an output described in the description, the explanation is complete. It covers what the tool does, why it exists, and what to expect as output. No missing information for effective usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already provides a clear description for the sole parameter ('UUID of a document owned by the calling tenant'). The description adds context ('(Re)index one of the tenant's documents') and usage scenarios, enhancing understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's verb ('index') and resource ('one of the tenant's documents'), and distinguishes it from automatic indexing by calling it a 'manual trigger'. No ambiguity with sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit use cases are given ('after corrections, or to backfill') and context that indexing normally happens automatically. No explicit alternatives or when-not-to-use, but the manual vs automatic distinction provides clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_extractions

List the tenant's past document extractions, newest first. Filter by external_id (the idempotency key passed to extract_document), end_user_id, and/or an ISO date window (from_date/to_date, YYYY-MM-DD). Paginate with skip/limit. Returns extraction summaries — fetch full extracted data for one item with get_extraction.

Parameters

Name	Required	Description
`skip`	No
`limit`	No
`to_date`	No	Return items created on/before this ISO date (YYYY-MM-DD).
`from_date`	No	Return items created on/after this ISO date (YYYY-MM-DD).
`end_user_id`	No	Filter by end-user id.
`external_id`	No	Filter by external identifier.
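
The skip/limit pagination described above can be sketched as a generator. `call_tool` is a hypothetical callable standing in for whatever MCP client is in use. The sketch also assumes that the `data` field of the response is a list of summaries and that a page shorter than `limit` marks the end of the results; neither is stated in the listing.

```python
from typing import Callable, Iterator, Optional


def iter_extractions(
    call_tool: Callable[[str, dict], dict],
    page_size: int = 50,
    from_date: Optional[str] = None,  # YYYY-MM-DD
    to_date: Optional[str] = None,    # YYYY-MM-DD
) -> Iterator[dict]:
    """Yield extraction summaries page by page, newest first."""
    skip = 0
    while True:
        arguments = {"skip": skip, "limit": page_size}
        if from_date:
            arguments["from_date"] = from_date
        if to_date:
            arguments["to_date"] = to_date
        response = call_tool("list_extractions", arguments)
        items = response.get("data") or []  # assumed: a list of summaries
        yield from items
        if len(items) < page_size:  # assumed end-of-results signal
            return
        skip += page_size
```

Each yielded summary can then be passed to get_extraction when the full extracted data is needed.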

Output Schema

Name	Required	Description
`data`	Yes
`meta`	Yes
`errors`	No
`status`	Yes
`servedAt`	No
`createdAt`	No
`servedAtTimestamp`	No
`createdAtTimestamp`	No

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It discloses that results are newest first, returns summaries, and mentions pagination and filtering. It does not mention side effects, rate limits, or authentication, but these are likely unnecessary for a list operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the main purpose, followed by filter and pagination details, then a reference to a sibling tool. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to explain return values. It covers all operational aspects: ordering, filtering, pagination, and relation to a sibling tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67% (4 of 6 parameters have descriptions). The description adds value by explaining that 'external_id' is the idempotency key, date format is YYYY-MM-DD, and pagination uses skip/limit.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'List', the resource 'document extractions', and the ordering 'newest first'. It distinguishes from sibling tools like 'get_extraction' by noting that this returns summaries and the sibling fetches full data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit filtering options and pagination, and tells the user to use 'get_extraction' for full extracted data. It does not explicitly mention when not to use this tool, but it implies the appropriate context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

query_documents

Search the tenant's indexed document collection. Modes: 'structured' (exact filters over extracted fields: vendorName, docNumber, documentType, currency, issueDateFrom/To, totalAmountMin/Max, endUserId, ...), 'semantic' (natural-language similarity over document summaries), 'hybrid' (keyword + semantic fused with Reciprocal Rank Fusion — best default for free-text questions). Returns matched documents with their extracted fields and scores. Only documents of tenants that enabled document indexing appear.

Parameters

Name	Required	Description	Default
`mode`	No	'structured' | 'semantic' | 'hybrid'	structured
`skip`	No
`text`	No	Natural-language query (required for semantic/hybrid).
`limit`	No
`top_k`	No
`filters`	No	Structured filters (RetrievalFiltersDTO shape, camelCase keys).
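
A small sketch of how the three modes might be turned into call arguments. It enforces the one rule the schema states (text is required for semantic and hybrid). The helper name is hypothetical, and the filter keys in the usage note are taken from the description; their value types are an assumption.

```python
from typing import Optional

VALID_MODES = ("structured", "semantic", "hybrid")


def build_query_arguments(
    mode: str = "structured",
    text: Optional[str] = None,
    filters: Optional[dict] = None,
) -> dict:
    """Build query_documents arguments, rejecting combinations the schema rules out."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode in ("semantic", "hybrid") and not text:
        raise ValueError(f"mode {mode!r} requires a natural-language text query")
    arguments = {"mode": mode}
    if text:
        arguments["text"] = text
    if filters:
        arguments["filters"] = dict(filters)  # camelCase keys, per the schema
    return arguments
```

For example, `build_query_arguments(filters={"vendorName": "ACME Ltd", "currency": "USD"})` yields a structured query, while `build_query_arguments("hybrid", text="unpaid invoices from last quarter")` yields a free-text one.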

Output Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description carries full burden. It discloses that only indexed tenants' documents appear, output includes matched docs with fields and scores. Does not mention rate limits or destructive behavior, but as a read-only search, the key constraints are covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One focused paragraph, front-loaded with purpose, then modes, output, and limitation. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a search tool: explains modes, output, and indexing prerequisite. Output schema exists, so return details are covered. Lacks explicit error handling or pagination depth, but sufficient for agent selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50%. Description adds meaning for mode (explains each) and indirectly for text, but skip, limit, top_k lack description in both schema and description. Filters and mode descriptions partly compensate but not fully.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool searches an indexed document collection, with specific verb 'Search' and resource. It distinguishes from sibling tools like aggregate_documents, index_document, and tagging tools by focusing on retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on when to use each mode (structured, semantic, hybrid) and recommends hybrid as default for free-text questions. Does not explicitly contrast with siblings, but the purpose is clearly search-oriented, so usage context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_extraction_feedback

Submit verified/corrected field values for a completed extraction — the extraction-quality feedback loop. data keys use the label:<human label>|ptr:/<json pointer> format addressing fields of the extraction result, e.g. {"label:Total Amount|ptr:/totalAmount": "118.00", "label:Vendor Name|ptr:/vendorName": "ACME Ltd"}. Returns a per-field comparison summary (correct/incorrect/missing counts). Each extraction accepts feedback once.

Parameters

Name	Required	Description	Default
`data`	Yes	Mapping of 'label:<label>|ptr:/<pointer>' field keys -> verified/corrected values.
`extraction_id`	Yes	UUID of the document extraction to validate.
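
The key format above is easy to get wrong by hand, so here is a minimal sketch that assembles it. Both helper names are hypothetical. Rejecting a `|` inside the label is an assumption made to keep keys unambiguous; the listing does not say how the server treats that case.

```python
def feedback_key(label: str, pointer: str) -> str:
    """Build one 'label:<label>|ptr:/<pointer>' key."""
    if not pointer.startswith("/"):
        raise ValueError("pointer must be a JSON pointer starting with '/'")
    if "|" in label:
        raise ValueError("label must not contain '|' (assumed restriction)")
    return f"label:{label}|ptr:{pointer}"


def build_feedback(extraction_id: str, fields: list) -> dict:
    """fields is a list of (label, pointer, verified_value) tuples."""
    data = {feedback_key(label, pointer): value for label, pointer, value in fields}
    return {"extraction_id": extraction_id, "data": data}
```

Because each extraction accepts feedback once, it may be worth collecting every corrected field into a single `build_feedback` call rather than submitting fields one at a time.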

Output Schema

Name	Required	Description
`data`	Yes
`meta`	Yes
`errors`	No
`status`	Yes
`servedAt`	No
`createdAt`	No
`servedAtTimestamp`	No
`createdAtTimestamp`	No

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full behavioral disclosure. It specifies the data key format in detail, explains the return value (per-field comparison summary), and notes the one-time feedback constraint. This is good, though it could add information on error handling or idempotency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four sentences, one of which carries an inline example. Every sentence adds value: purpose, data format, return summary, and constraint. No extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (2 required params, output schema present), the description covers the essential aspects: purpose, parameter format, usage constraint (once per extraction), and return value. It is mostly complete, though could mention what happens on duplicate submissions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant value beyond the schema by providing the exact format for data keys (e.g., 'label:Total Amount|ptr:/totalAmount') and an example. This clarifies how to construct the input, surpassing the schema's generic description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to submit verified/corrected field values for a completed extraction, establishing the extraction-quality feedback loop. It uses a specific verb (submit) and resource (field values), and the context distinguishes it from sibling tools like get_extraction, which only retrieve data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly mentions a key constraint: each extraction accepts feedback once. This provides clear context on when to use the tool and its limitation. However, it does not mention when not to use it or compare to other tools, though no direct alternative exists among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tag_file

Run the FileTag pipeline against a previously uploaded slot. The file_id comes from a prior files_create_upload call. The server validates the uploaded blob (size, content-type, optional SHA-256), atomically consumes the slot, runs the FileTag extraction (renaming + metadata embedding), and returns the structured result with the extracted metadata, the suggested filename, the enriched_file_url (short-lived signed URL to the renamed copy with metadata embedded into document properties), and a next_action recipe (http_get_and_save) telling the agent to download that URL and save it as the suggested filename -- act on it unless the user explicitly asked for metadata only. Each slot is single-use; reserve a new slot with files_create_upload to retry.

Parameters

Name	Required	Description
`file_id`	Yes	The ``file_id`` returned by a prior ``files_create_upload`` call, after the agent has completed the PUT to the signed URL.
`end_user_id`	No	Optional end-user identifier (the entity the file belongs to).
`external_id`	No	Optional caller-supplied identifier echoed in the result.

Output Schema

Name	Required	Description
`metadata`	Yes
`documentId`	Yes
`nextAction`	Yes	Structured copy-paste recipe for the post-tag follow-up: HTTP GET ``enriched_file_url`` and save as ``suggested_filename``. Mirrors the ``next_tool_call`` pattern from ``files_create_upload`` -- one structured instruction per step keeps multi-step flows reliable across volatile agent scratchpads.
`enrichedFileUrl`	Yes	Short-lived signed URL to the input file with the extracted metadata embedded directly into the document properties (XMP for PDFs, EXIF/XMP for images). This is the canonical enriched output of the tool -- the file the user actually wants. Agents should download it within ``enriched_file_expires_in_seconds`` and save it under ``suggested_filename`` rather than embedding metadata client-side. Skip the download only if the user explicitly asked for the metadata payload alone.
`filenamePatterns`	Yes
`suggestedFilename`	Yes	Recommended filename for the enriched file -- one of the six ``filename_patterns``, picked as the safest default for general use. Pair this with ``enriched_file_url`` when saving the downloaded file.
`documentExtractionId`	Yes
`enrichedFileExpiresAt`	Yes	Absolute UTC timestamp when ``enriched_file_url`` stops resolving.
`enrichedFileExpiresInSeconds`	Yes	Seconds until ``enriched_file_url`` expires, captured at response time. Easier for agents to reason about than the absolute timestamp.
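
The download-and-save follow-up can be sketched as below, using the camelCase field names from the output schema. The helper names and the five-second safety margin are assumptions. Since `enrichedFileExpiresInSeconds` is captured at response time, a caller that waits before downloading would need to subtract the elapsed time itself.

```python
import os
import shutil
import urllib.request


def plan_download(result: dict, min_seconds_left: int = 5):
    """Return (url, filename), or None when the signed URL is about to expire."""
    if result["enrichedFileExpiresInSeconds"] < min_seconds_left:
        return None
    # Keep only the final path component so the name cannot leave the target directory.
    filename = os.path.basename(result["suggestedFilename"])
    return result["enrichedFileUrl"], filename


def download_enriched(result: dict, target_dir: str = "."):
    """HTTP GET the enriched file and save it under the suggested filename."""
    plan = plan_download(result)
    if plan is None:
        return None
    url, filename = plan
    path = os.path.join(target_dir, filename)
    with urllib.request.urlopen(url, timeout=30) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```

A `None` return means the URL was too close to expiry to be worth fetching; because each slot is single-use, recovering from that would mean reserving a new slot and tagging again.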

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It details validation (size, content-type, SHA-256), atomic consumption, extraction pipeline, return fields, and single-use slot. Could mention error conditions but is otherwise thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the main action. Every sentence contains critical information: purpose, source of file_id, process and follow-up instruction, and the single-use constraint. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and an output schema that documents the return fields, the description covers the full workflow: input, validation, processing, output fields, and post-action. The single-use note prevents misuse.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters. Description adds value for file_id by explaining its source and prerequisite (after PUT). For end_user_id and external_id, description repeats schema info but reinforces optionality.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs the FileTag pipeline against a previously uploaded slot, specifying the source of file_id and distinguishing it from siblings like files_create_upload.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: file_id from prior upload, validation process, and a next_action instruction. Lacks explicit when-not-to-use or alternatives, but context is sufficient for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tag_url

Fetch a remote URL server-side and run the FileTag pipeline. The bytes never traverse the LLM context -- the agent supplies the URL, the server fetches under strict SSRF guards (HTTPS only, no private IP ranges, 30-second timeout, 50 MB cap, redirects disabled), and returns the structured tag result with metadata, suggested filename, enriched_file_url (short-lived signed URL to the renamed copy with metadata embedded into document properties), and a next_action recipe (http_get_and_save) telling the agent to download that URL and save it as the suggested filename -- act on it unless the user explicitly asked for metadata only. Use this when the file already lives at a public URL.

Parameters

Name	Required	Description
`url`	Yes	Public HTTPS URL pointing at the file. Hostname must NOT resolve to a private/reserved address. Redirects are NOT followed -- agents must resolve them before invoking.
`end_user_id`	No	Optional end-user identifier.
`external_id`	No	Optional caller-supplied identifier echoed in the result.
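
Some of the URL rules above can be checked on the client before spending a call. The sketch below covers only what is decidable without network access: the scheme, and IP-literal hosts in private or reserved ranges. It is not the server's SSRF guard, and the helper name is hypothetical; DNS names, redirects, the timeout, and the size cap are left to the server.

```python
import ipaddress
from urllib.parse import urlsplit


def precheck_url(url: str) -> list:
    """Return a list of problems found locally; an empty list means none were found."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme != "https":
        problems.append("scheme must be https")
    host = parts.hostname
    if not host:
        problems.append("missing hostname")
        return problems
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # A DNS name: only the server can judge what it resolves to.
        return problems
    if (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_reserved
    ):
        problems.append("host is a private or reserved address")
    return problems
```

An empty result does not guarantee acceptance: a hostname that resolves to a private address, or a URL that answers with a redirect, would still be refused by the server.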

Output Schema

Name	Required	Description
`metadata`	Yes
`documentId`	Yes
`nextAction`	Yes	Structured copy-paste recipe for the post-tag follow-up: HTTP GET ``enriched_file_url`` and save as ``suggested_filename``. Mirrors the ``next_tool_call`` pattern from ``files_create_upload`` -- one structured instruction per step keeps multi-step flows reliable across volatile agent scratchpads.
`enrichedFileUrl`	Yes	Short-lived signed URL to the input file with the extracted metadata embedded directly into the document properties (XMP for PDFs, EXIF/XMP for images). This is the canonical enriched output of the tool -- the file the user actually wants. Agents should download it within ``enriched_file_expires_in_seconds`` and save it under ``suggested_filename`` rather than embedding metadata client-side. Skip the download only if the user explicitly asked for the metadata payload alone.
`filenamePatterns`	Yes
`suggestedFilename`	Yes	Recommended filename for the enriched file -- one of the six ``filename_patterns``, picked as the safest default for general use. Pair this with ``enriched_file_url`` when saving the downloaded file.
`documentExtractionId`	Yes
`enrichedFileExpiresAt`	Yes	Absolute UTC timestamp when ``enriched_file_url`` stops resolving.
`enrichedFileExpiresInSeconds`	Yes	Seconds until ``enriched_file_url`` expires, captured at response time. Easier for agents to reason about than the absolute timestamp.

Tool Definition Quality

A (5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: SSRF guards (HTTPS only, no private IPs, 30s timeout, 50MB cap, redirects disabled), return structure including next_action recipe, and the instruction to act on the result unless user asked for metadata only.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, dense paragraph that front-loads the core purpose, then efficiently covers constraints, return value, and usage guidance. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (server-side fetch, SSRF guards, return structure with enriched_file_url and next_action), the description is complete. It covers all necessary information for an agent to correctly select and invoke the tool, even without seeing the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% but the description adds critical context beyond schema: URL must be public HTTPS, redirects not followed, and agents must resolve them. It also explains the optional parameters' usage in the broader context of the pipeline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches a remote URL and runs the FileTag pipeline server-side. It distinguishes from siblings like 'tag_file' by emphasizing that this tool is for files at public URLs and that bytes never traverse LLM context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this when the file already lives at a public URL.' Also provides guidance on when not to act (if user asked for metadata only) and mentions alternative behaviors like resolving redirects before invoking.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
