Expense Tracker MCP Server

import_receipt_from_pdf

Extract and parse receipt data from PDF files to automatically categorize expenses and store them in a database for tracking spending patterns.

Instructions

Import and parse a receipt from a PDF file.

This tool:

1. Extracts text from the PDF
2. Parses receipt metadata (store, date, totals)
3. Extracts line items with prices
4. Categorizes each item using hybrid approach (static rules + LLM)
5. Stores everything in SQLite database

Args:
    pdf_path: Absolute path to the PDF receipt file
    ctx: FastMCP context for logging and LLM access

Returns:
    Summary of imported receipt including store, date, item count, and category breakdown

Input Schema

Name	Required	Description	Default
`pdf_path`	Yes	Absolute path to PDF receipt file
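
As a rough illustration of the input schema above, the sketch below builds the JSON-RPC `tools/call` request an MCP client would send for this tool. The helper name `build_tool_call` and the example path are hypothetical. Only `pdf_path` appears in the arguments, on the assumption (consistent with the schema table) that the server supplies `ctx` itself.

```python
import json


def build_tool_call(pdf_path: str, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request for import_receipt_from_pdf.

    Hypothetical helper for illustration; real clients usually build this
    message through an MCP client library rather than by hand.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "import_receipt_from_pdf",
            # Only pdf_path is sent; ctx is assumed to be injected server-side.
            "arguments": {"pdf_path": pdf_path},
        },
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical absolute path, for illustration only.
    print(build_tool_call("/home/user/receipts/grocery.pdf"))
```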

Output Schema

Name	Required	Description	Default
No arguments

Implementation Reference

main.py:27-120 (handler)

The primary tool handler, registered with @mcp.tool. It validates the PDF path, parses the file via pdf_parser, categorizes each item, stores the results in the database, and returns a structured summary.

@mcp.tool
async def import_receipt_from_pdf(
    pdf_path: Annotated[str, "Absolute path to PDF receipt file"],
    ctx: Context,
) -> dict:
    """Import and parse a receipt from a PDF file.

    This tool:
    1. Extracts text from the PDF
    2. Parses receipt metadata (store, date, totals)
    3. Extracts line items with prices
    4. Categorizes each item using hybrid approach (static rules + LLM)
    5. Stores everything in SQLite database

    Args:
        pdf_path: Absolute path to the PDF receipt file
        ctx: FastMCP context for logging and LLM access

    Returns:
        Summary of imported receipt including store, date, item count, and category breakdown
    """
    try:
        await ctx.info(f"Starting import of receipt: {pdf_path}")

        # Validate path
        path = Path(pdf_path).expanduser().resolve()
        if not path.exists():
            raise ToolError(f"PDF file not found: {pdf_path}")

        if path.suffix.lower() != ".pdf":
            raise ToolError(f"File must be a PDF: {pdf_path}")

        # Parse PDF
        await ctx.info("Extracting text from PDF...")
        receipt, raw_items = parse_pdf_receipt(path)

        await ctx.info(
            f"Parsed receipt: {receipt.store_name} on {receipt.purchase_date}"
        )
        await ctx.info(f"Found {len(raw_items)} line items")

        # Categorize items
        await ctx.info("Categorizing items...")
        categorized_items = []
        item_type_counts = {}

        for idx, item_dict in enumerate(raw_items):
            # Categorize using hybrid approach
            item_type = await categorize_item(item_dict["item_name"], ctx)

            # Create LineItem object
            line_item = LineItem(
                item_name_raw=item_dict["item_name"],
                item_type=item_type,
                quantity=item_dict["quantity"],
                line_total=item_dict["price"],
            )

            categorized_items.append(line_item)

            # Track category counts
            item_type_counts[item_type] = item_type_counts.get(item_type, 0) + 1

            await ctx.debug(
                f"  [{idx+1}/{len(raw_items)}] {item_dict['item_name']} -> {item_type}"
            )

        # Insert into database
        await ctx.info("Saving to database...")
        receipt_id = insert_receipt(receipt)
        insert_items(receipt_id, categorized_items)

        await ctx.info(f"Successfully imported receipt #{receipt_id}")

        # Return summary
        return {
            "status": "success",
            "receipt_id": receipt_id,
            "store_name": receipt.store_name,
            "purchase_date": receipt.purchase_date,
            "total": receipt.total,
            "items_count": len(categorized_items),
            "item_types": item_type_counts,
            "message": f"Successfully imported {len(categorized_items)} items from {receipt.store_name}",
        }

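    except ToolError:
        # Validation errors raised above are already user-facing: re-raise them
        # unchanged so the generic handler below does not log and re-wrap them.
        raise
    except FileNotFoundError as e:
        raise ToolError(f"File not found: {str(e)}")
    except ValueError as e:
        raise ToolError(f"Failed to parse receipt: {str(e)}")
    except Exception as e:
        await ctx.error(f"Unexpected error during import: {e}")
        raise ToolError(f"Failed to import receipt: {str(e)}")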
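
The path validation step in the handler can be tried in isolation. The following is a minimal sketch, not the project's code: `validate_pdf_path` is a hypothetical helper, `ToolError` is a local stand-in for the FastMCP exception so the snippet runs without FastMCP installed, and `is_file()` replaces the handler's `exists()` check as an assumption that directories should also be rejected.

```python
import tempfile
from pathlib import Path


class ToolError(Exception):
    """Local stand-in for FastMCP's ToolError (assumption, for a standalone demo)."""


def validate_pdf_path(pdf_path: str) -> Path:
    """Resolve pdf_path and check that it points at an existing .pdf file."""
    path = Path(pdf_path).expanduser().resolve()
    # is_file() is stricter than the handler's exists(): directories fail too.
    if not path.is_file():
        raise ToolError(f"PDF file not found: {pdf_path}")
    # Compare the suffix case-insensitively so "RECEIPT.PDF" is accepted.
    if path.suffix.lower() != ".pdf":
        raise ToolError(f"File must be a PDF: {pdf_path}")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "receipt.PDF"
        sample.touch()  # empty placeholder file; only the path is checked
        print(validate_pdf_path(str(sample)).name)
        try:
            validate_pdf_path(str(Path(tmp) / "missing.pdf"))
        except ToolError as exc:
            print(exc)
```

Note that this check only inspects the file name; it does not confirm that the file contents are a valid PDF.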

expense_tracker/models.py:7-54 (schema)

Dataclass models for Receipt, LineItem, and ItemStats. Receipt and LineItem validate their values in __post_init__ and give typed structure to the parsed receipt data used throughout the tool.

@dataclass
class Receipt:
    """Represents a parsed receipt."""

    store_name: str
    purchase_date: str  # ISO format: YYYY-MM-DD
    total: float
    subtotal: Optional[float] = None
    tax: Optional[float] = None

    def __post_init__(self):
        """Validate receipt data."""
        if self.total <= 0:
            raise ValueError("Total must be positive")


@dataclass
class LineItem:
    """Represents a single item from a receipt."""

    item_name_raw: str
    item_type: str
    line_total: float
    quantity: float = 1.0
    unit_price: Optional[float] = None

    def __post_init__(self):
        """Validate values, then calculate unit price if not provided."""
        if self.line_total <= 0:
            raise ValueError("Line total must be positive")
        if self.quantity <= 0:
            raise ValueError("Quantity must be positive")

        # Quantity is known to be positive here, so the division is safe
        if self.unit_price is None:
            self.unit_price = self.line_total / self.quantity


@dataclass
class ItemStats:
    """Statistics for a specific item type."""

    item_type: str
    total_purchases: int
    last_purchase_date: str
    first_purchase_date: str
    total_spent: float
    average_days_between: Optional[float] = None
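
To see how the LineItem validation behaves, here is a small standalone sketch. The class is a condensed copy of the model above, repeated only so the snippet runs on its own; the item names and the "produce" and "discount" categories are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LineItem:
    """Condensed copy of the LineItem model, repeated so the sketch is standalone."""

    item_name_raw: str
    item_type: str
    line_total: float
    quantity: float = 1.0
    unit_price: Optional[float] = None

    def __post_init__(self):
        if self.line_total <= 0:
            raise ValueError("Line total must be positive")
        if self.quantity <= 0:
            raise ValueError("Quantity must be positive")
        # Derive the unit price only when the caller did not supply one.
        if self.unit_price is None:
            self.unit_price = self.line_total / self.quantity


if __name__ == "__main__":
    item = LineItem("BANANAS ORG", "produce", line_total=3.0, quantity=2)
    print(item.unit_price)  # 1.5

    try:
        # A negative line (for example a coupon) fails validation.
        LineItem("COUPON", "discount", line_total=-1.0)
    except ValueError as exc:
        print(exc)  # Line total must be positive
```

One consequence worth keeping in mind: because the handler builds each LineItem inside its try block, a receipt containing a zero or negative line would surface as a "Failed to parse receipt" error rather than being imported partially.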

expense_tracker/pdf_parser.py:268-290 (helper)

Key helper that extracts text from the PDF with pdfplumber, parses the store, date, totals, and line items with regex patterns, and returns a Receipt plus a list of raw item dicts.

def parse_pdf_receipt(pdf_path: Path) -> tuple[Receipt, list[dict]]:
    """Parse a PDF receipt file.

    Args:
        pdf_path: Path to PDF file

    Returns:
        Tuple of (Receipt object, list of item dicts)

    Raises:
        FileNotFoundError: If PDF doesn't exist
        ValueError: If parsing fails
    """
    # Convert string to Path if needed
    if isinstance(pdf_path, str):
        pdf_path = Path(pdf_path)

    # Extract text
    text = extract_text_from_pdf(pdf_path)

    # Parse receipt
    return parse_receipt(text)
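
The body of parse_receipt is not shown on this page, so the following is only a guess at what regex-based line-item parsing might look like. The pattern `LINE_ITEM_RE`, the helper `parse_line_items`, the skipped summary labels, and the sample receipt text are all hypothetical; only the dict keys (`item_name`, `quantity`, `price`) come from how the handler consumes the raw items.

```python
import re

# Hypothetical pattern: optional "<qty> x " prefix, item name, trailing price.
LINE_ITEM_RE = re.compile(
    r"^(?:(?P<qty>\d+(?:\.\d+)?)\s*[xX]\s+)?"
    r"(?P<name>.+?)\s+\$?(?P<price>\d+\.\d{2})$"
)

# Summary rows that look like items but are receipt totals (assumed labels).
SUMMARY_LABELS = {"SUBTOTAL", "TAX", "TOTAL"}


def parse_line_items(text: str) -> list[dict]:
    """Return one dict per line that looks like '<name> <price>'."""
    items = []
    for line in text.splitlines():
        match = LINE_ITEM_RE.match(line.strip())
        if not match:
            continue  # store name, date, blank lines, and so on
        name = match.group("name")
        if name.upper() in SUMMARY_LABELS:
            continue
        items.append(
            {
                "item_name": name,
                "quantity": float(match.group("qty") or 1),
                "price": float(match.group("price")),
            }
        )
    return items


if __name__ == "__main__":
    sample = "\n".join(
        [
            "CORNER MARKET",
            "2024-05-01",
            "MILK 2% 3.49",
            "2 x BANANAS 1.98",
            "SUBTOTAL 5.47",
            "TOTAL 5.47",
        ]
    )
    for item in parse_line_items(sample):
        print(item)
```

Real receipts vary widely in layout, so a production parser would likely need store-specific patterns and more defensive handling than this sketch shows.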

expense_tracker/categorizer.py:221-243 (helper)

Hybrid item categorization helper: static regex/pattern matching first, LLM fallback via ctx.sample for unknown items.

async def categorize_item(item_name: str, ctx=None) -> str:
    """Main categorization function with hybrid approach.

    Args:
        item_name: Raw item name from receipt
        ctx: Optional FastMCP Context for LLM fallback

    Returns:
        item_type category (guaranteed to return a value)
    """
    # Try deterministic rules first
    category = deterministic_categorize(item_name)

    if category:
        return category

    # Fall back to LLM if context is available
    if ctx:
        return await llm_categorize(item_name, ctx)

    # Ultimate fallback
    return "other"
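
The rules-first, LLM-second flow can be sketched without FastMCP. In the snippet below, the `RULES` table, its categories, and `fake_llm` are hypothetical, and the LLM step is modeled as a plain async callable instead of the `ctx.sample` call the real helper is described as using.

```python
import asyncio
import re
from typing import Awaitable, Callable, Optional

# Hypothetical static rules: first matching pattern wins.
RULES = [
    (re.compile(r"\b(milk|cheese|yogurt)\b", re.IGNORECASE), "dairy"),
    (re.compile(r"\b(banana|apple|lettuce)s?\b", re.IGNORECASE), "produce"),
]


def deterministic_categorize(item_name: str) -> Optional[str]:
    """Return a category from the static rules, or None if nothing matches."""
    for pattern, category in RULES:
        if pattern.search(item_name):
            return category
    return None


async def categorize(
    item_name: str,
    llm_fallback: Optional[Callable[[str], Awaitable[str]]] = None,
) -> str:
    """Try static rules, then the optional LLM callable, then 'other'."""
    category = deterministic_categorize(item_name)
    if category:
        return category
    if llm_fallback is not None:
        return await llm_fallback(item_name)
    return "other"


async def fake_llm(item_name: str) -> str:
    """Stand-in for an LLM call; always answers with a fixed category."""
    return "household"


if __name__ == "__main__":
    print(asyncio.run(categorize("ORG WHOLE MILK")))          # dairy
    print(asyncio.run(categorize("PAPER TOWELS")))            # other
    print(asyncio.run(categorize("PAPER TOWELS", fake_llm)))  # household
```

Checking the static rules first keeps the common items cheap and repeatable, and reserves the slower, non-deterministic LLM call for names the rules do not recognize.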

expense_tracker/database.py:81-135 (helper)

Database persistence helpers: insert_receipt creates receipt record, insert_items adds categorized line items with foreign key.

def insert_receipt(receipt: Receipt, db_path: Path = DEFAULT_DB_PATH) -> int:
    """Insert a receipt and return its ID."""
    conn = get_connection(db_path)

    try:
        cursor = conn.execute(
            """
            INSERT INTO receipts (store_name, purchase_date, subtotal, tax, total)
            VALUES (?, ?, ?, ?, ?)
        """,
            (
                receipt.store_name,
                receipt.purchase_date,
                receipt.subtotal,
                receipt.tax,
                receipt.total,
            ),
        )
        conn.commit()
        return cursor.lastrowid
    finally:
        conn.close()


def insert_items(
    receipt_id: int, items: list[LineItem], db_path: Path = DEFAULT_DB_PATH
) -> None:
    """Bulk insert items for a receipt."""
    if not items:
        return

    conn = get_connection(db_path)

    try:
        conn.executemany(
            """
            INSERT INTO items (receipt_id, item_name_raw, item_type, quantity, unit_price, line_total)
            VALUES (?, ?, ?, ?, ?, ?)
        """,
            [
                (
                    receipt_id,
                    item.item_name_raw,
                    item.item_type,
                    item.quantity,
                    item.unit_price,
                    item.line_total,
                )
                for item in items
            ],
        )
        conn.commit()
    finally:
        conn.close()
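
Because insert_receipt and insert_items each open their own connection and commit separately, a failure while inserting items would leave a receipt row with no items. One possible alternative, sketched here under assumptions, is to write both in a single transaction. The `SCHEMA` below is guessed from the column names in the INSERT statements (types and constraints are assumptions), and `insert_receipt_with_items` is a hypothetical helper, not part of the project.

```python
import sqlite3

# Assumed schema: column names come from the INSERT statements above,
# while types and constraints are guesses for the sake of a runnable demo.
SCHEMA = """
CREATE TABLE receipts (
    id INTEGER PRIMARY KEY,
    store_name TEXT NOT NULL,
    purchase_date TEXT NOT NULL,
    subtotal REAL,
    tax REAL,
    total REAL NOT NULL
);
CREATE TABLE items (
    id INTEGER PRIMARY KEY,
    receipt_id INTEGER NOT NULL REFERENCES receipts(id),
    item_name_raw TEXT NOT NULL,
    item_type TEXT NOT NULL,
    quantity REAL NOT NULL,
    unit_price REAL,
    line_total REAL NOT NULL
);
"""


def insert_receipt_with_items(
    conn: sqlite3.Connection, receipt: tuple, items: list[tuple]
) -> int:
    """Insert a receipt and its items atomically; return the receipt ID."""
    # Using the connection as a context manager commits on success and
    # rolls back if any statement inside the block raises.
    with conn:
        cursor = conn.execute(
            "INSERT INTO receipts (store_name, purchase_date, subtotal, tax, total) "
            "VALUES (?, ?, ?, ?, ?)",
            receipt,
        )
        receipt_id = cursor.lastrowid
        conn.executemany(
            "INSERT INTO items (receipt_id, item_name_raw, item_type, quantity, "
            "unit_price, line_total) VALUES (?, ?, ?, ?, ?, ?)",
            [(receipt_id, *item) for item in items],
        )
    return receipt_id


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rid = insert_receipt_with_items(
        conn,
        ("Corner Market", "2024-05-01", 5.47, 0.0, 5.47),
        [("MILK 2%", "dairy", 1.0, 3.49, 3.49)],
    )
    print(rid)
    conn.close()
```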

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by detailing the multi-step behavior (extraction, parsing, categorization, storage). It explains the hybrid categorization approach and mentions database storage, though it could say more about error handling, performance, and the permissions needed for file access.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear opening sentence, bullet points for steps, and separate sections for Args and Returns. It's appropriately sized but could be slightly more concise by integrating the Args section into the main text since it repeats schema information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step processing), no annotations, and the presence of an output schema, the description is fairly complete. It outlines the process and return values, though it could benefit from mentioning error cases or limitations (e.g., PDF quality requirements).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the single parameter 'pdf_path'. The description adds minimal value beyond the schema by restating the parameter in the 'Args' section without providing additional context like file format constraints or examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Import and parse a receipt from a PDF file') and details the multi-step process (extract text, parse metadata, extract line items, categorize, store). It distinguishes itself from sibling tools like 'get_item_history' and 'list_item_types' by focusing on data ingestion rather than querying existing data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to process PDF receipts, but lacks explicit guidance on when to use this tool versus alternatives (none mentioned) or any prerequisites. It doesn't specify scenarios where this tool is preferred or when it should be avoided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Sharan0402/expense-tracker-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.