| talonic_list_schemas | List all saved schemas in the user's Talonic workspace.
Returns each schema with its id, name, description, version,
field count, and full JSON Schema definition.
USE WHEN: The user asks what schemas they have, or asks to see existing schemas. You want to discover existing schemas before designing a new one. You are about to recommend creating a schema and want to check whether one already covers the use case. The user asks to extract from a known document type and you want to find a matching schema.
TIP: Pair this with talonic_extract by passing the chosen schema's id as schema_id. |
| talonic_save_schema | Save a schema definition to the user's Talonic workspace so it can be reused
across future extractions. Returns the saved schema with its newly assigned id and short_id.
USE WHEN: The user asks to save a schema, store a template, or reuse the schema across docs. You have iterated on a schema with the user and they confirmed it should be saved. The user wants to standardise extraction across many documents of the same type.
DO NOT USE WHEN: The user just wants to extract once with an inline schema (call talonic_extract directly with the schema inline). The user has not confirmed the schema design (avoid creating clutter in their workspace).
DEFINITION FORMATS:
JSON Schema (most reliable): { type: "object", properties: { vendor_name: { type: "string" } } }
Flat key-type map: { vendor_name: "string", invoice_total: "number" }. The API normalises it server-side; if you get a "no fields" error from the API, fall back to JSON Schema.
TIP: After saving, call talonic_extract with schema_id set to the returned id (UUID or SCH- short id) for consistent results. |
| talonic_get_document | Fetch full metadata for a single document already in the user's Talonic workspace.
Returns id, filename, page count, detected document type, language, processing log,
and link URLs (self, extractions, dashboard).
USE WHEN: You need details about a specific document the user already extracted or uploaded. You have a document_id from a previous extract or search call and want more context. The user asks 'tell me about document X' or similar.
DO NOT USE WHEN: The user wants the document's full text content (use talonic_to_markdown for OCR markdown). The user wants extracted structured data (use talonic_extract with a schema, or fetch the extraction by id). The user has a file but no document_id yet (call talonic_extract first to ingest the document).
|
| talonic_search | Search the user's Talonic workspace for documents, fields, sources, or schemas
matching a query. Returns ranked results across all entity types in one call.
USE WHEN: The user wants to find documents but does not know the exact filename or id. The query is conceptual ('contracts mentioning indemnification', 'Acme invoices'). You need to narrow a large workspace before calling talonic_extract or talonic_filter. The user asks 'do I have any docs about X' or 'find anything related to X'.
DO NOT USE WHEN: The user has a specific document_id (use talonic_get_document instead). The user wants to apply structured field-value filters like 'amount > 1000' (use talonic_filter). The user wants to extract data from a brand-new document (use talonic_extract).
TIP: The result includes documents, fieldMatches, sources, schemas, and fields. Pick the entity type the user actually needs. |
| talonic_filter | Filter the user's Talonic documents by extracted field values using composable conditions.
Conditions accept either a canonical field name (e.g. 'vendor.name', 'policy.0_coverage_type')
or a field UUID. The Talonic API resolves names to ids server-side.
USE WHEN: The user wants documents matching specific structured criteria, like 'invoices over 1000 EUR',
'contracts expiring before 2026-12-31', or 'COIs from Acme'. The query is value-based on extracted fields, not a free-text concept search. You need to retrieve a sortable, paginated list filtered by field conditions.
DO NOT USE WHEN: The user wants conceptual / free-text search across content (use talonic_search). The user is looking for a single document by id (use talonic_get_document). The user wants extracted data from a new document (use talonic_extract).
OPERATORS:
eq, neq: equality / inequality
gt, gte, lt, lte: numeric or date comparisons
between: requires both value and value_to
contains: substring match on string fields
is_empty: presence check (no value needed)
is_not_empty: presence check (no value needed). Note: currently underreports;
use eq / gt / contains etc. against a known value when possible.
TIPS: To discover available field names, call talonic_search first with a related query,
then pass fields[].canonicalName from the response as field here. Both field (name) and field_id (UUID) reach the API as fieldId, so either works.
|
| talonic_to_markdown | Get the OCR-converted markdown for a document. Accepts an existing document_id,
raw file bytes (base64), a local file path, or a URL. When given a raw file, the
tool ingests it via extract first and then returns the markdown.
USE WHEN: The user wants the full text content of a document for summarisation, translation, or analysis. A previous tool call returned a document_id and you want to inspect its content. The user asks 'what does the document say' or 'summarise this PDF' (you call this then summarise). The user has a raw PDF / scan / image and wants markdown directly without designing a schema first.
INPUTS (provide exactly one):
document_id: id of an already-ingested document (cheapest path; one API call).
file_data + filename (RECOMMENDED for chat clients): base64-encoded file bytes plus
the original filename (with extension). Use this whenever you already have the file
in memory, e.g. the user attached it to the conversation. Works in every MCP client.
file_path: local path to a document file. Only works if the MCP server has read access
to that path; in sandboxed chat clients use file_data instead.
file_url: URL to a document file (the Talonic API fetches it server-side).
|
| talonic_extract | Extract structured, schema-validated data from a document using Talonic.
Returns clean JSON matching the schema, with per-field confidence scores and
metadata about the document (detected type, language, page count).
USE WHEN: The user has a document (PDF, image, scan, DOCX, etc.) and wants specific fields pulled out. You need structured data (vendor name, total amount, dates, parties, terms) rather than free text. The user uploads or references any invoice, contract, certificate, statement, or form. You want validated JSON instead of trying to OCR + parse with raw LLM calls.
FILE SOURCES (provide exactly one):
file_data + filename (RECOMMENDED for chat clients): base64-encoded file bytes plus
the original filename (with extension). Use this whenever you already have the file
in memory, e.g. the user attached it to the conversation. Works in every MCP client
regardless of where the file lives on disk.
file_path: a local path to the document. Only works if the MCP server process can
read that path on its own filesystem; many chat clients (Claude Desktop, Cowork)
store user uploads in a sandbox the MCP server cannot access, in which case use
file_data instead.
file_url: a URL the Talonic API will fetch directly. Use for documents already on
the public web.
document_id: re-extract a document already in the workspace.
SCHEMA FORMATS (provide at most one of schema or schema_id):
JSON Schema (most reliable): { type: "object", properties: { vendor_name: { type: "string" } } }
Flat key-type map: { vendor_name: "string", invoice_total: "number" }. The API normalises it server-side; if you get a "no fields" error, fall back to JSON Schema.
schema_id: id of a saved schema from talonic_list_schemas. Accepts the UUID or the SCH-XXXXXXXX short id.
IMPORTANT: production currently rejects requests with no schema. Always provide either
an inline schema or a schema_id. |
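
The file-source and schema rules for talonic_extract can be checked client-side before the tool is called. The sketch below is illustrative only: `build_extract_args` and `flat_to_json_schema` are hypothetical helper names, not part of Talonic, and the flat-map expansion is an assumption about what the server-side normalisation amounts to. Argument names (file_data, filename, file_path, file_url, document_id, schema, schema_id) are taken from the table above.

```python
from __future__ import annotations

from typing import Any, Optional

# Primitive type names defined by JSON Schema.
JSON_TYPES = {"string", "number", "integer", "boolean", "array", "object", "null"}


def flat_to_json_schema(flat: dict[str, str]) -> dict[str, Any]:
    """Expand a flat key-type map into a JSON Schema object.

    Assumption: this mirrors the kind of normalisation the API is said to do
    server-side; the real rules may differ.
    """
    for name, type_name in flat.items():
        if type_name not in JSON_TYPES:
            raise ValueError(f"unsupported type for {name!r}: {type_name!r}")
    return {
        "type": "object",
        "properties": {name: {"type": t} for name, t in flat.items()},
    }


def build_extract_args(
    *,
    file_data: Optional[str] = None,
    filename: Optional[str] = None,
    file_path: Optional[str] = None,
    file_url: Optional[str] = None,
    document_id: Optional[str] = None,
    schema: Optional[dict[str, Any]] = None,
    schema_id: Optional[str] = None,
) -> dict[str, Any]:
    """Validate and assemble arguments for a talonic_extract call (hypothetical helper)."""
    sources = {
        "file_data": file_data,
        "file_path": file_path,
        "file_url": file_url,
        "document_id": document_id,
    }
    chosen = {key: value for key, value in sources.items() if value is not None}
    # Exactly one file source must be given.
    if len(chosen) != 1:
        raise ValueError("provide exactly one of: " + ", ".join(sources))
    # file_data is only usable together with the original filename.
    if file_data is not None and not filename:
        raise ValueError("file_data requires filename (with extension)")
    # At most one of schema / schema_id is allowed, and requests with no
    # schema are rejected, so in practice exactly one is required.
    if (schema is None) == (schema_id is None):
        raise ValueError("provide exactly one of schema or schema_id")

    args: dict[str, Any] = dict(chosen)
    if file_data is not None:
        args["filename"] = filename
    if schema is not None:
        args["schema"] = schema
    else:
        args["schema_id"] = schema_id
    return args
```

For example, `build_extract_args(file_url="https://example.com/invoice.pdf", schema=flat_to_json_schema({"vendor_name": "string"}))` yields a dictionary with one file source and an inline JSON Schema, while omitting both `schema` and `schema_id` raises `ValueError` before any request is made.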