
pdf-chart-parser

by haoxinm

An MCP server and Python library that extracts energy-usage charts from utility-bill PDFs. It locates the chart, calibrates the axes from the PDF's text layer, and returns structured time-series data alongside an annotated PNG for visual verification. Extraction is entirely deterministic; no LLM is required.
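Calibrating an axis from the text layer comes down to fitting a linear map between the page coordinates of numeric tick labels and the values they print. The sketch below illustrates that idea under stated assumptions and is not the project's code: `ticks_from_spans`, `calibrate_axis`, and `AxisCalibration` are hypothetical names, the input spans are assumed to be y-axis labels already picked out of PyMuPDF's `get_text("dict")` output (where each span carries `"text"` and a 4-tuple `"bbox"`), and the fit is ordinary least squares over every tick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AxisCalibration:
    """Linear map from a page coordinate to a data value: value = scale * coord + offset."""
    scale: float
    offset: float

    def to_value(self, coord: float) -> float:
        return self.scale * coord + self.offset


def ticks_from_spans(spans: list[dict]) -> list[tuple[float, float]]:
    """Turn numeric axis-label spans into (coordinate, value) pairs.

    Each span is a dict with "text" and "bbox" (x0, y0, x1, y1), the shape
    PyMuPDF uses for spans in get_text("dict"). The coordinate is the
    vertical centre of the label, so this suits a y-axis.
    """
    ticks = []
    for span in spans:
        text = span["text"].strip().replace(",", "")  # "1,000" -> "1000"
        try:
            value = float(text)
        except ValueError:
            continue  # not a number, e.g. a unit label such as "kWh"
        _, y0, _, y1 = span["bbox"]
        ticks.append(((y0 + y1) / 2, value))
    return ticks


def calibrate_axis(ticks: list[tuple[float, float]]) -> AxisCalibration:
    """Least-squares fit of value = scale * coord + offset over all ticks."""
    if len(ticks) < 2:
        raise ValueError("need at least two tick labels to calibrate an axis")
    n = len(ticks)
    mean_c = sum(c for c, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    var_c = sum((c - mean_c) ** 2 for c, _ in ticks)
    if var_c == 0:
        raise ValueError("all tick labels share one coordinate")
    cov = sum((c - mean_c) * (v - mean_v) for c, v in ticks)
    scale = cov / var_c
    return AxisCalibration(scale=scale, offset=mean_v - scale * mean_c)
```

As a worked example: with labels "1,000", "500" and "0" centred at y = 200, 300 and 400, the fit gives scale = -5 and offset = 2000, so a bar whose top edge sits at y = 250 reads as 750. For a hybrid chart with a dual y-axis, the same fit would presumably run twice, once on the left-axis labels and once on the right-axis labels. Fitting over every tick rather than just two averages out small positioning errors in individual labels.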

Features

  • Bar, line, and hybrid (bar + line, dual y-axis) chart types

  • Vector-first extraction via PyMuPDF get_drawings() / get_text("dict"); OpenCV + OCR raster fallback for scanned PDFs

  • Full page text returned as LLM-friendly Markdown (via pymupdf4llm)

  • MCP tool (extract_usage_chart) compatible with Claude and other MCP-aware LLMs

  • Supports stdio transport (local) and streamable-http (containerized deployment)

  • Returns structured JSON + annotated PNG; numeric data is always returned as text content, and the PNG serves only for visual verification
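The vector-first path can be pictured as reading bars straight out of a page's vector drawings. The sketch below assumes each bar is drawn as its own filled path, so a path's bounding `"rect"` is the bar, and that series are distinguished by fill colour; neither is guaranteed for every bill generator. `bars_from_drawings`, its `max_bar_width` default, and the white-background rule are hypothetical, and plain tuples stand in for `pymupdf.Rect`. The `to_value` argument is whatever y-axis calibration maps a page coordinate to a data value.

```python
from collections import defaultdict
from collections.abc import Callable, Iterable, Mapping

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1); y grows downward


def bars_from_drawings(
    drawings: Iterable[Mapping],
    plot_area: Box,
    to_value: Callable[[float], float],
    max_bar_width: float = 40.0,
) -> dict[tuple[float, ...], list[tuple[float, float]]]:
    """Group filled rectangles into bar series keyed by fill colour.

    Each drawing is a dict with "rect" (x0, y0, x1, y1) and "fill" (an RGB
    tuple or None), mirroring two keys of PyMuPDF's get_drawings() output.
    Returns {fill: [(bar_centre_x, value), ...]} sorted left to right.
    """
    px0, py0, px1, py1 = plot_area
    series: dict[tuple[float, ...], list[tuple[float, float]]] = defaultdict(list)
    for d in drawings:
        fill = d.get("fill")
        if fill is None or tuple(fill) == (1.0, 1.0, 1.0):
            continue  # stroke-only paths and white backgrounds are not bars
        x0, y0, x1, y1 = d["rect"]
        if x0 < px0 or x1 > px1 or y0 < py0 or y1 > py1:
            continue  # outside the plot area, e.g. logos or a legend beside the chart
        if not (0 < x1 - x0 <= max_bar_width and y1 > y0):
            continue  # zero-size paths, gridlines, and full-width bands
        # y0 is the bar's top edge because PyMuPDF page coordinates grow downward.
        series[tuple(fill)].append(((x0 + x1) / 2, to_value(y0)))
    return {fill: sorted(points) for fill, points in series.items()}
```

Against a real page this might be called as `bars_from_drawings(page.get_drawings(), plot_area, cal.to_value)` once the plot area is known; a `pymupdf.Rect` unpacks into four floats, so the loop should accept it unchanged.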

Installation

Prerequisites

  • Python 3.12+

  • uv package manager

  • Tesseract OCR (required only for the [raster] extra): apt-get install tesseract-ocr or brew install tesseract

Quickstart

# Install (vector path only)
uv sync

# Install with raster/OCR fallback
uv sync --extra raster

# Run the CLI
uv run pdf-chart-parser --help

# Run the MCP server (stdio)
uv run python -m pdf_chart_parser.server

Usage

Python library

from pdf_chart_parser.pipeline import extract_usage_chart

result = extract_usage_chart(pdf_path="bill.pdf", return_annotated_image=True)
print(result["chart_type"])   # "bar" | "line" | "hybrid"
for series in result["series"]:
    print(series["label"], series["points"])
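A common next step is flattening the series into a long-format table. This sketch assumes each entry in `series["points"]` is an `(x, y)` pair; that shape is a guess rather than something documented here, so check it against real output first. `series_to_csv` is a hypothetical helper, and because JSON arrays unpack the same way as tuples, it should also accept a result loaded with `json.load`, provided the structure matches.

```python
import csv
import io


def series_to_csv(result: dict) -> str:
    """Flatten result["series"] into CSV rows of (series, x, y).

    Assumes each point is an (x, y) pair; adjust the unpacking if the
    real output uses another shape (e.g. dicts).
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["series", "x", "y"])
    for series in result["series"]:
        for x, y in series["points"]:
            writer.writerow([series["label"], x, y])
    return buf.getvalue()
```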

CLI

uv run pdf-chart-parser extract bill.pdf --output result.json

MCP server

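Add to your MCP client config (for Claude Desktop, claude_desktop_config.json in ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows; other clients use a similar mcpServers block):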

{
  "mcpServers": {
    "pdf-chart-parser": {
      "command": "uv",
      "args": ["run", "python", "-m", "pdf_chart_parser.server"],
      "cwd": "/path/to/pdf-chart-parser"
    }
  }
}

The server exposes the extract_usage_chart tool. It returns:

  1. Page text — full page as Markdown

  2. Chart reading — structured JSON (series, axes, confidence, warnings)

  3. Annotated PNG — cropped chart with calibrated gridlines and data-point markers
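The `confidence` and `warnings` fields in the chart reading lend themselves to a simple gate that decides when someone should open the annotated PNG. This is a hypothetical helper: it assumes `confidence` is a top-level number between 0 and 1 and `warnings` is a list of strings, and the 0.8 threshold is arbitrary.

```python
def needs_review(reading: dict, min_confidence: float = 0.8) -> list[str]:
    """Return reasons a chart reading should be checked by eye (empty if none).

    Assumes reading["confidence"] is a float in [0, 1] and
    reading["warnings"] is a list of strings.
    """
    reasons = []
    confidence = reading.get("confidence")
    if confidence is None:
        reasons.append("no confidence reported")
    elif confidence < min_confidence:
        reasons.append(f"confidence {confidence:.2f} below {min_confidence:.2f}")
    reasons.extend(f"warning: {w}" for w in reading.get("warnings", []))
    return reasons
```

Returning reasons rather than a bare boolean keeps the result useful in logs and tells the reviewer what to look for in the PNG.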

Docker / ECR deployment

# Build and run locally
./scripts/run_local_server.sh

# Build and push to ECR (set ECR_REPO first)
export ECR_REPO=<account>.dkr.ecr.<region>.amazonaws.com/pdf-chart-parser
./scripts/build_and_push.sh

The container starts the server on streamable-http at port 8000. A local PDF directory can be bind-mounted to /data for ad-hoc testing (see docker/docker-compose.yml).

Manual testing (no LLM)

# In-process test against fixtures
uv run python scripts/run_manual_tests.py

# Against a running HTTP server
uv run python scripts/run_manual_tests.py --http http://localhost:8000

Output is written to manual_test_output/.

License

AGPL-3.0-or-later. This license is required because the project links against PyMuPDF, which is itself AGPL-3.0 licensed.
