| talonic_list_schemas | STATUS: stable. List all saved schemas in the user's Talonic workspace.
Returns each schema with its id (UUID), short_id (SCH-XXXXXXXX), name, description,
version, field count, and full JSON Schema definition. Either id form is accepted by
talonic_extract's schema_id parameter. USE WHEN: The user asks what schemas they have, or asks to see existing schemas. You want to discover existing schemas before designing a new one. Before recommending the user create a schema, check if one already covers the use case. The user asks to extract from a known document type and you want to find a matching schema.
TIP: Pair this with talonic_extract by passing the chosen schema's id as schema_id. |
| talonic_save_schema | STATUS: stable. Save a schema definition to the user's Talonic workspace so it can be reused
across future extractions. Returns the saved schema with its newly assigned id and short_id. USE WHEN: The user asks to save a schema, store a template, or reuse the schema across docs. You have iterated on a schema with the user and they confirmed it should be saved. The user wants to standardise extraction across many documents of the same type.
DO NOT USE WHEN: The user just wants to extract once with an inline schema (call talonic_extract directly with the schema inline). The user has not confirmed the schema design (avoid creating clutter in their workspace).
DEFINITION FORMATS: JSON Schema (most reliable): { type: "object", properties: { vendor_name: { type: "string" } } }. Flat key-type map: { vendor_name: "string", invoice_total: "number" }; the API normalises this form server-side. If you get a "no fields" error from the API, fall back to JSON Schema.
TIP: After saving, call talonic_extract with schema_id set to the returned id (UUID or SCH- short id) for consistent results. |
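
The two definition formats above describe the same fields. As a minimal sketch of the "fall back to JSON Schema" advice, the flat key-type map can be expanded client-side before retrying. The helper name `flat_map_to_json_schema` is hypothetical and is not part of Talonic; it only illustrates the shape of the conversion.

```python
def flat_map_to_json_schema(flat: dict[str, str]) -> dict:
    """Expand a flat key-type map into a JSON Schema object definition.

    Hypothetical helper for illustration; the Talonic API performs its own
    normalisation server-side.
    """
    return {
        "type": "object",
        "properties": {name: {"type": type_name} for name, type_name in flat.items()},
    }


# Same fields as the flat-map example above, expressed as JSON Schema.
schema = flat_map_to_json_schema({"vendor_name": "string", "invoice_total": "number"})
```

The resulting dict has the same structure as the JSON Schema example in the row above and could be passed wherever a schema definition is expected.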
| talonic_get_document | STATUS: stable. Fetch full metadata for a single document already in the user's Talonic workspace.
Returns id, filename, page count, detected document type, language, processing log,
and link URLs (self, extractions, dashboard). USE WHEN: You need details about a specific document the user already extracted or uploaded. You have a document_id from a previous extract or search call and want more context. The user asks 'tell me about document X' or similar.
DO NOT USE WHEN: The user wants the document's full text content (use talonic_to_markdown for OCR markdown). The user wants extracted structured data (use talonic_extract with a schema, or fetch the extraction by id). The user has a file but no document_id yet (call talonic_extract first to ingest the document).
|
| talonic_search | STATUS: stable. Search the user's Talonic workspace for documents, fields, sources, or schemas
matching a query. Returns ranked results across all entity types in one call. USE WHEN: The user wants to find documents but does not know the exact filename or id. The query is conceptual ('contracts mentioning indemnification', 'Acme invoices'). You need to narrow a large workspace before calling talonic_extract or talonic_filter. The user asks 'do I have any docs about X' or 'find anything related to X'.
DO NOT USE WHEN: The user has a specific document_id (use talonic_get_document instead). The user wants to apply structured field-value filters like 'amount > 1000' (use talonic_filter). The user wants to extract data from a brand-new document (use talonic_extract).
TIP: The result includes documents, fieldMatches, sources, schemas, and fields.
Both fields[] and fieldMatches[] include a filterable boolean. Only entries with
filterable: true can be used with talonic_filter. Fields with filterable: false exist
in a schema but have no extracted data yet. Pick the entity type the user actually needs. |
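
The filterable rule above can be sketched as a small selection step between talonic_search and talonic_filter. The sample result below is invented for illustration and contains only the keys named in the tip (fields, fieldMatches, filterable, plus canonicalName and resolvedFieldId as described for talonic_filter); the real response may carry more. The helper name is hypothetical.

```python
def filterable_field_names(search_result: dict) -> list[str]:
    """Return canonical names of fields that already have extracted data."""
    return [
        field["canonicalName"]
        for field in search_result.get("fields", [])
        if field.get("filterable") is True
    ]


# Invented sample shaped after the tip above, not real API output.
sample_result = {
    "fields": [
        {"canonicalName": "vendor.name", "filterable": True},
        {"canonicalName": "policy.0_coverage_type", "filterable": False},
    ],
    "fieldMatches": [{"resolvedFieldId": None, "filterable": False}],
}

usable = filterable_field_names(sample_result)
```

Only the names in `usable` would be candidates for a talonic_filter condition; the second field exists in a schema but has no extracted data yet.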
| talonic_filter | STATUS: stable. Field-name resolution is server-side. Filter the user's Talonic documents by extracted field values using composable conditions.
Conditions accept either a canonical field name (e.g. 'vendor.name', 'policy.0_coverage_type')
or a field UUID. The Talonic API resolves names to ids server-side. USE WHEN: The user wants documents matching specific structured criteria, like 'invoices over 1000 EUR'
or 'contracts expiring before 2026-12-31' or 'COIs from Acme'. The query is value-based on extracted fields, not a free-text concept search. You need to retrieve a sortable, paginated list filtered by field conditions.
DO NOT USE WHEN: The user wants conceptual / free-text search across content (use talonic_search). The user is looking for a single document by id (use talonic_get_document). The user wants extracted data from a new document (use talonic_extract).
OPERATORS: eq, neq: equality / inequality. gt, gte, lt, lte: numeric or date comparisons. between: requires both value and value_to. contains: substring match on string fields. is_empty: presence check, no value needed. Returns documents where the field is null or missing. is_not_empty: presence check, no value needed. Returns documents where the field has a materialized value. Results reflect data within seconds of extraction completing.
SCHEMA TYPING: Numeric operators (gt, gte, lt, lte, between) only resolve correctly when the schema field is typed as number. A field typed as string that holds numeric content (e.g. '€1,500.00') will silently return zero matches even after extraction. Pick the right type at schema design time. If the response contains a warnings array, surface its message (and suggestion, if present) to the user verbatim. These warnings explain why a query returned zero or unexpected results and typically suggest a schema-design change (e.g. switching a field's data_type from string to number) that will make subsequent filter calls work correctly. Do not silently retry without flagging the warning.
TIPS: To discover available field names, call talonic_search first with a related query.
Only use fields[] entries where filterable is true; pass their canonicalName as field
here. Fields with filterable: false have no extracted data yet. fieldMatches[].resolvedFieldId is only valid when filterable is true; entries with
filterable: false have resolvedFieldId: null and cannot be used for filtering. Both field (name) and field_id (UUID) reach the API as fieldId, so either is fine.
|
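
The operator rules above (which operators need a value, which need value_to, which need neither) can be checked before a call is made. This is a minimal sketch under stated assumptions: the parameter names field, value, and value_to come from the description, while the `operator` key and the `build_condition` helper are hypothetical and may not match the tool's actual argument names.

```python
NO_VALUE_OPS = {"is_empty", "is_not_empty"}
SINGLE_VALUE_OPS = {"eq", "neq", "gt", "gte", "lt", "lte", "contains"}


def build_condition(field, operator, value=None, value_to=None):
    """Build one condition dict, rejecting combinations the operator list rules out.

    The "operator" key name is an assumption made for this sketch.
    """
    if operator in NO_VALUE_OPS:
        if value is not None or value_to is not None:
            raise ValueError(f"{operator} takes no value")
        return {"field": field, "operator": operator}
    if operator == "between":
        if value is None or value_to is None:
            raise ValueError("between requires both value and value_to")
        return {"field": field, "operator": operator, "value": value, "value_to": value_to}
    if operator in SINGLE_VALUE_OPS:
        if value is None:
            raise ValueError(f"{operator} requires a value")
        return {"field": field, "operator": operator, "value": value}
    raise ValueError(f"unknown operator: {operator}")


# Example: invoices from Acme with a total between 1000 and 5000.
conditions = [
    build_condition("vendor.name", "contains", "Acme"),
    build_condition("invoice_total", "between", 1000, 5000),
]
```

Validating locally does not replace the schema-typing caveat: a numeric comparison on a field typed as string would still return zero matches.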
| talonic_to_markdown | STATUS: stable. Get the OCR-converted markdown for a document. Accepts an existing document_id,
raw file bytes (base64), a local file path, or a URL. When given a raw file, the
tool ingests it via extract first and then returns the markdown. USE WHEN: The user wants the full text content of a document for summarisation, translation, or analysis. A previous tool call returned a document_id and you want to inspect its content. The user asks 'what does the document say' or 'summarise this PDF' (you call this then summarise). The user has a raw PDF / scan / image and wants markdown directly without designing a schema first.
INPUTS (provide EXACTLY ONE; never combine, e.g. do NOT pass both file_data and file_path): document_id: id of an already-ingested document. Cheapest path, one API call. file_data + filename: base64-encoded file bytes plus the original filename (with extension).
RECOMMENDED for local-stdio installs (Claude Desktop, Cursor, Cline, Continue, Cowork).
WARNING for hosted-MCP via Claude.ai connectors: Claude.ai imposes a hard size limit on
tool-call arguments (effectively under ~1KB), so file_data CANNOT carry a real PDF through
Claude.ai's pipeline. The bytes get truncated before reaching the MCP server. For files
larger than a trivial test, use file_url or document_id instead when running through
Claude.ai. Local stdio installs do NOT have this limit. file_path: local path to a document file. Only works if the MCP server has read access
to that path; in sandboxed chat clients use file_data instead. file_url: a URL the Talonic API will fetch directly. Use for documents already on the
public web. Best path for Claude.ai users dealing with files larger than the parameter cap.
|
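
The "exactly one input" rule above applies to document_id, file_data, file_path, and file_url, with file_data additionally requiring filename. A minimal sketch of a pre-flight check follows; `pick_source` is a hypothetical helper, and the example URL is a placeholder.

```python
SOURCE_KEYS = ("document_id", "file_data", "file_path", "file_url")


def pick_source(args: dict) -> str:
    """Return the single input source present in args, or raise ValueError."""
    provided = [key for key in SOURCE_KEYS if args.get(key)]
    if len(provided) != 1:
        raise ValueError(f"provide exactly one of {SOURCE_KEYS}, got {provided}")
    if provided[0] == "file_data" and not args.get("filename"):
        raise ValueError("file_data must be accompanied by filename")
    return provided[0]


# A URL-only call passes the check; combining sources would raise.
source = pick_source({"file_url": "https://example.com/invoice.pdf"})
```

The same check fits talonic_extract, which accepts the same four sources under the same rule.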
| talonic_extract | STATUS: stable. Production-safe when called with a schema. Schema-less extraction is disabled at the MCP layer. Extract structured, schema-validated data from a document using Talonic.
Returns clean JSON matching the schema, with per-field confidence scores and
metadata about the document (detected type, language, page count). USE WHEN: The user has a document (PDF, image, scan, DOCX, etc.) and wants specific fields pulled out. You need structured data (vendor name, total amount, dates, parties, terms) rather than free text. The user uploads or references any invoice, contract, certificate, statement, or form. You want validated JSON instead of trying to OCR + parse with raw LLM calls.
FILE SOURCES (provide EXACTLY ONE; never combine, e.g. do NOT pass both file_data and file_path): file_data + filename: base64-encoded file bytes plus the original filename (with extension).
RECOMMENDED for local-stdio installs (Claude Desktop, Cursor, Cline, Continue, Cowork).
WARNING for hosted-MCP via Claude.ai connectors: Claude.ai imposes a hard size limit on
tool-call arguments (effectively under ~1KB), so file_data CANNOT carry a real PDF through
Claude.ai's pipeline. The bytes get truncated before reaching the MCP server. For files
larger than a trivial test, use file_url or document_id instead when running through
Claude.ai. Local stdio installs do NOT have this limit. file_path: a local path to the document. Only works if the MCP server process can read
that path on its own filesystem. Chat clients (Claude Desktop, Claude.ai, Cowork) store
user uploads in a sandbox the MCP server cannot access, so file_path is only useful when
the agent explicitly knows a path on the same machine as the MCP server. file_url: a URL the Talonic API will fetch directly. Use for documents already on the
public web. Best path for Claude.ai users dealing with files larger than the parameter cap. document_id: re-extract a document already in the workspace. Cheapest option when the
document is already uploaded via app.talonic.com or a previous extract call.
SCHEMA (REQUIRED, provide exactly one of schema or schema_id): JSON Schema (RECOMMENDED): { type: "object", properties: { vendor_name: { type: "string" } } }. Flat key-type map: { vendor_name: "string", invoice_total: "number" }. Accepted, but if you get a "no fields" error, fall back to JSON Schema. schema_id: id of a saved schema from talonic_list_schemas. Accepts UUID or SCH-XXXXXXXX short id.
Calls without schema or schema_id are rejected with a validation error before they hit the API,
to prevent unreliable schema-free extractions reaching production. RESPONSE SHAPE (key fields): data: the structured extracted JSON, shaped by your schema. confidence.overall: 0..1 confidence for the extraction as a whole. confidence.fields: per-field confidence map. Treat fields below ~0.7 as needing human review. document.id, document.filename, document.pages, document.type_detected, document.language_detected. extraction_id, request_id: stable identifiers for support and re-fetch. processing.duration_ms, processing.region: useful for debugging and capacity planning. markdown: present only when include_markdown: true. provenance: present only when include_provenance: true. Per-field source evidence:
{ field_name: { source_text, section, page } }. Useful for audit trails and citations.
Cost, EUR price, and remaining credit balance are not surfaced in v0.1 and may appear in a later version.
|
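
The guidance to treat fields below roughly 0.7 confidence as needing human review can be sketched as a post-processing step on the response. The sample response is invented and includes only keys named in the response shape above; `fields_needing_review` is a hypothetical helper, and the 0.7 default mirrors the approximate threshold from the description.

```python
def fields_needing_review(response: dict, threshold: float = 0.7) -> list[str]:
    """List field names whose per-field confidence falls below the threshold."""
    field_scores = response.get("confidence", {}).get("fields", {})
    return sorted(name for name, score in field_scores.items() if score < threshold)


# Invented sample shaped after the documented response, not real API output.
sample_response = {
    "data": {"vendor_name": "Acme GmbH", "invoice_total": 1500.0},
    "confidence": {
        "overall": 0.81,
        "fields": {"vendor_name": 0.95, "invoice_total": 0.62},
    },
}

flagged = fields_needing_review(sample_response)
```

Here only invoice_total would be surfaced to the user for review; pairing the flagged names with provenance entries, when requested, would give the reviewer the source text to check against.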
| talonic_get_balance | STATUS: stable. Read the user's current Talonic credit balance, EUR value, 30-day burn rate,
projected runway, tier, and next-tier-reset timestamp. Use this to make budget-
aware decisions before kicking off large batches or re-extractions. USE WHEN: The user asks how many credits or how much budget they have left. You are about to run a large or expensive operation (batch extract, re-extract
many documents) and want to confirm budget headroom first. The user asks how long their balance will last at the current rate.
DO NOT USE WHEN: The user just wants the per-call cost of a single extraction (that is already
on the talonic_extract response under cost). The user wants to top up credits (route them to the dashboard; auto top-up is
guarded by a separate scope).
RESPONSE SHAPE: balance_credits: current credit balance. balance_eur: current balance in EUR (rounded to two decimals). burn_rate_30d_credits: total credits consumed in the trailing 30 days. projected_runway_days: days of runway at the current 30-day average burn.
-1 indicates no consumption in the trailing window (cannot compute runway). tier: API tier of the workspace (free, pro, enterprise, etc.). tier_resets_at: ISO 8601 timestamp of the next monthly tier reset.
|
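
For intuition about projected_runway_days, the description above (runway at the current 30-day average burn, with -1 when there was no consumption) is consistent with the arithmetic below. This is an illustrative sketch only: the API's exact formula and rounding are not documented here, and `estimate_runway_days` is a hypothetical helper.

```python
def estimate_runway_days(balance_credits: float, burn_rate_30d_credits: float) -> float:
    """Estimate days of runway from the trailing 30-day burn.

    Returns -1 when nothing was consumed in the window, mirroring the
    sentinel described for projected_runway_days.
    """
    if burn_rate_30d_credits <= 0:
        return -1
    daily_burn = burn_rate_30d_credits / 30
    return balance_credits / daily_burn


# 3000 credits left, 1500 consumed in the last 30 days: 50 credits per day.
runway = estimate_runway_days(3000, 1500)
idle = estimate_runway_days(3000, 0)
```

A caller should check for the -1 sentinel before comparing runway against a planned batch, since it signals "cannot compute" rather than "no runway".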