invoice_parse
Extract structured JSON data from ZUGFeRD 2.x or XRechnung 3.x invoices. Accepts raw XML, base64 XML, or base64 PDF (hybrid) and returns a structured invoice data model.
Instructions
Extract structured data from a ZUGFeRD 2.x or XRechnung 3.x invoice. Accepts raw XML (CII or UBL), base64-encoded XML, or base64-encoded PDF (ZUGFeRD hybrid — the XML is extracted from the PDF/A-3 attachment). Returns a structured JSON object matching the invoice data model.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| xml_content | No | Raw XML string. | |
| xml_base64 | No | Base64-encoded XML. | |
| pdf_base64 | No | Base64-encoded PDF (ZUGFeRD hybrid). | |
| include_raw_xml | No |
Implementation Reference
- MCP handler for invoice_parse: validates input, resolves XML bytes (from raw/base64/PDF), detects syntax & profile, dispatches to CII or UBL parser, and returns structured output as JSON.
async def handle_invoice_parse(arguments: dict[str, Any]) -> list[types.TextContent]: """MCP handler for invoice_parse.""" try: params = InvoiceParseInput.model_validate(arguments) except Exception as exc: return [types.TextContent(type="text", text=json.dumps(format_error(str(exc))))] # Resolve input to XML bytes xml_bytes: bytes if params.pdf_base64 is not None: try: pdf_bytes = base64.b64decode(params.pdf_base64) xml_bytes = _extract_xml_from_pdf(pdf_bytes) except (NotImplementedError, EInvoicingError, Exception) as exc: return [types.TextContent(type="text", text=json.dumps(format_error(str(exc))))] else: try: xml_bytes = resolve_xml_input(params.xml_content, params.xml_base64) except (ValueError, EInvoicingError) as exc: return [types.TextContent(type="text", text=json.dumps(format_error(str(exc))))] try: syntax = detect_invoice_syntax(xml_bytes) profile = detect_zugferd_profile(xml_bytes) except ValueError as exc: return [types.TextContent(type="text", text=json.dumps(format_error(str(exc))))] if syntax == "CII" or syntax.value == "CII": invoice_data = _parse_cii_xml(xml_bytes) else: invoice_data = _parse_ubl_xml(xml_bytes) output = InvoiceParseOutput( profile=profile.name if profile else "UNKNOWN", syntax=syntax.value if hasattr(syntax, "value") else str(syntax), invoice_data=invoice_data, raw_xml=xml_bytes.decode("utf-8", errors="replace") if params.include_raw_xml else None, ) return [types.TextContent(type="text", text=output.model_dump_json(indent=2))] - Input schema — accepts raw XML, base64-encoded XML, or base64-encoded PDF, with an option to include raw XML in output.
class InvoiceParseInput(BaseModel): """Input schema for invoice_parse.""" xml_content: str | None = Field(None, description="Raw XML string.") xml_base64: str | None = Field(None, description="Base64-encoded XML bytes.") pdf_base64: str | None = Field( None, description=( "Base64-encoded PDF bytes. The tool will extract the embedded XML " "attachment (ZUGFeRD hybrid PDF/A-3)." ), ) include_raw_xml: bool = Field( False, description="Include the raw XML string in the response." ) - Output schema — profile, syntax, key summary fields, full parsed invoice_data, and optional raw XML.
class InvoiceParseOutput(BaseModel): """Output schema for invoice_parse.""" profile: str syntax: str invoice_number: str | None = None invoice_date: str | None = None seller_name: str | None = None buyer_name: str | None = None tax_inclusive_amount: str | None = None currency_code: str | None = None invoice_data: dict[str, Any] = Field( default_factory=dict, description="Full parsed invoice matching ZUGFeRDInvoice schema." ) raw_xml: str | None = None - mcp_einvoicing_de/tools/invoice_parse.py:64-86 (registration)Tool definition object TOOL_INVOICE_PARSE with name, description, and JSON Schema input schema (oneOf xml_content, xml_base64, pdf_base64).
TOOL_INVOICE_PARSE = types.Tool( name="invoice_parse", description=( "Extract structured data from a ZUGFeRD 2.x or XRechnung 3.x invoice. " "Accepts raw XML (CII or UBL), base64-encoded XML, or base64-encoded PDF " "(ZUGFeRD hybrid — the XML is extracted from the PDF/A-3 attachment). " "Returns a structured JSON object matching the invoice data model." ), inputSchema={ "type": "object", "properties": { "xml_content": {"type": "string", "description": "Raw XML string."}, "xml_base64": {"type": "string", "description": "Base64-encoded XML."}, "pdf_base64": {"type": "string", "description": "Base64-encoded PDF (ZUGFeRD hybrid)."}, "include_raw_xml": {"type": "boolean", "default": False}, }, "anyOf": [ {"required": ["xml_content"]}, {"required": ["xml_base64"]}, {"required": ["pdf_base64"]}, ], }, ) - mcp_einvoicing_de/server.py:18-43 (registration)Server registration: TOOL_INVOICE_PARSE imported and added to _ALL_TOOLS list, handle_invoice_parse mapped in _TOOL_HANDLERS dict.
from mcp_einvoicing_de.tools.invoice_parse import TOOL_INVOICE_PARSE, handle_invoice_parse from mcp_einvoicing_de.tools.invoice_validate import TOOL_INVOICE_VALIDATE, handle_invoice_validate from mcp_einvoicing_de.tools.peppol_check import TOOL_PEPPOL_CHECK, handle_peppol_check from mcp_einvoicing_de.tools.tax_rules import TOOL_TAX_RULES, handle_tax_rules LOG_LEVEL = os.environ.get("EINVOICING_DE_LOG_LEVEL", "INFO").upper() logging.basicConfig(level=getattr(logging, LOG_LEVEL, logging.INFO)) logger = logging.getLogger(__name__) _ALL_TOOLS: list[types.Tool] = [ TOOL_INVOICE_CREATE, TOOL_INVOICE_VALIDATE, TOOL_INVOICE_PARSE, TOOL_INVOICE_CONVERT, TOOL_PEPPOL_CHECK, TOOL_TAX_RULES, ] _TOOL_HANDLERS: dict[str, Any] = { "invoice_create": handle_invoice_create, "invoice_validate": handle_invoice_validate, "invoice_parse": handle_invoice_parse, "invoice_convert": handle_invoice_convert, "peppol_check": handle_peppol_check, "tax_rules": handle_tax_rules, }