How do I use document-parser?

1. Click on "Install Server".
2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@document-parser Extract text and tables from report.pdf".

That's it! The server will respond to your query, and you can continue using it as needed.

document-parser

by agenson-tools

JavaScript

Hybrid

Multi-Format Document Parser MCP Server

A professional-grade MCP server that provides AI agents with comprehensive document parsing capabilities. Built specifically for the agent economy by Agenson Horrowitz.

🤖 Why This Exists

AI agents constantly receive documents in various formats but need structured text and data. Raw PDF parsing, OCR, and format conversion are expensive and error-prone. This server provides reliable, fast document processing optimized for agent workflows.

⚡ Key Features

Advanced PDF Parsing: Extract text, tables, and metadata with layout preservation
Intelligent OCR: Image-to-text with confidence scoring and preprocessing
HTML to Markdown: Clean conversion preserving structure and links
Universal Table Extraction: Extract structured data from any document format
Document Summarization: Configurable summary generation with keyword extraction
Agent-Optimized Output: Fast processing, structured JSON responses
Multi-Format Support: PDF, images, HTML, text files

🚀 Installation

Claude Desktop Configuration

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Cline Configuration

Add to your Cline MCP settings:

{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}

Via npm

npm install -g @agenson-horrowitz/document-parser-mcp

Via MCPize (One-click deployment)

Deploy instantly on MCPize with built-in billing and authentication.

🛠️ Available Tools

1. `parse_pdf`

Extract comprehensive information from PDF documents.

Perfect for: Reports, invoices, contracts, research papers, forms

Features:

Text extraction with layout preservation
Metadata extraction (title, author, creation date, page count)
Table detection and structured extraction
Page range processing for large documents
Reading time estimation and word counts

Example:

{
  "file_path": "/path/to/document.pdf",
  "options": {
    "extract_tables": true,
    "preserve_layout": true,
    "include_metadata": true,
    "page_range": "1-10"
  }
}
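
For documents too large to parse in one call, the page range option can be driven from a small batching helper. The sketch below is only an illustration: it assumes `page_range` accepts an inclusive "start-end" string as in the example above, and `pageRanges` and `buildParsePdfArgs` are hypothetical helper names, not part of the server.

```javascript
// Hypothetical helper: split a document into inclusive "start-end" page ranges.
// Assumes page_range uses the same "1-10" format as the example above.
function pageRanges(totalPages, batchSize) {
  if (!Number.isInteger(totalPages) || totalPages < 1) {
    throw new RangeError('totalPages must be a positive integer');
  }
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError('batchSize must be a positive integer');
  }
  const ranges = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    ranges.push(`${start}-${end}`);
  }
  return ranges;
}

// Hypothetical helper: build one parse_pdf argument object per page range.
function buildParsePdfArgs(filePath, totalPages, batchSize) {
  return pageRanges(totalPages, batchSize).map((range) => ({
    file_path: filePath,
    options: { extract_tables: true, include_metadata: true, page_range: range },
  }));
}
```

Each object returned by `buildParsePdfArgs` can be passed as the arguments of a separate `parse_pdf` call, so a failure in one batch does not force the whole document to be reprocessed.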

2. `parse_image_text`

Perform high-quality OCR on images with confidence scoring.

Perfect for: Screenshots, scanned documents, photos of text, receipts

Features:

Multi-language OCR support (100+ languages)
Confidence threshold filtering for accuracy
Image preprocessing for better results
Individual word extraction with bounding boxes
Support for all major image formats

Example:

{
  "image_path": "/path/to/screenshot.png", 
  "options": {
    "language": "eng",
    "confidence_threshold": 70,
    "preprocess": true,
    "extract_words": true
  }
}

3. `html_to_markdown`

Convert HTML documents to clean, structured markdown.

Perfect for: Web pages, HTML emails, documentation, blog posts

Features:

Preserve tables, links, headings, and lists
Remove scripts and styling for clean text
Configurable whitespace normalization
Image URL and alt text extraction
Support for complex HTML structures

Example:

{
  "html_content": "<html>...</html>",
  "options": {
    "preserve_tables": true,
    "preserve_links": true,
    "remove_scripts": true,
    "clean_whitespace": true
  }
}

4. `extract_tables`

Extract structured table data from any document format.

Perfect for: Pricing lists, data reports, spreadsheets, forms

Features:

Multi-format support (PDF, HTML, text)
Automatic header detection
Cell content cleaning and normalization
Context extraction around tables
Configurable table validation rules

Example:

{
  "file_path": "/path/to/report.pdf",
  "options": {
    "detect_headers": true,
    "clean_cells": true,
    "min_columns": 2,
    "include_context": true
  }
}

5. `summarize_document`

Generate intelligent summaries of any document type.

Perfect for: Long reports, research papers, articles, documentation

Features:

Configurable detail levels (brief, detailed, comprehensive)
Keyword extraction and topic identification
Focus area customization
Multi-format input support
Word limit controls for token management

Example:

{
  "file_path": "/path/to/research.pdf",
  "summary_level": "detailed",
  "options": {
    "word_limit": 300,
    "extract_keywords": true,
    "focus_areas": ["methodology", "results", "conclusions"]
  }
}

💰 Pricing

Free Tier

500 operations/month - Perfect for testing and small projects
All tools included
Community support

Pro Tier - $9/month

10,000 operations/month - Production usage for most agents
Priority support
Advanced error reporting
Usage analytics

Scale Tier - $29/month

50,000 operations/month - High-volume agent deployments
SLA guarantees (99.5% uptime)
Custom rate limits
Direct technical support

Overage pricing: $0.02 per operation beyond your plan limits
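
The tier prices and the overage rate combine into a simple monthly cost formula: base price plus $0.02 for each operation above the included quota. The sketch below works it through in cents, under the assumption (not stated above) that overage is billed the same way on every tier, including Free. `monthlyCostCents` and `cheapestPlan` are illustrative names.

```javascript
// Plan figures copied from the pricing section; amounts are kept in cents
// to avoid floating-point rounding.
const PLANS = {
  free: { baseCents: 0, included: 500 },
  pro: { baseCents: 900, included: 10000 },
  scale: { baseCents: 2900, included: 50000 },
};
const OVERAGE_CENTS_PER_OP = 2; // $0.02 per operation beyond the plan limit

// Assumption: overage is billed on every tier, including Free.
function monthlyCostCents(plan, operations) {
  const { baseCents, included } = PLANS[plan];
  const overageOps = Math.max(0, operations - included);
  return baseCents + overageOps * OVERAGE_CENTS_PER_OP;
}

// Pick the plan with the lowest total cost; ties keep the smaller plan.
function cheapestPlan(operations) {
  return Object.keys(PLANS).reduce((best, plan) =>
    monthlyCostCents(plan, operations) < monthlyCostCents(best, operations) ? plan : best
  );
}

console.log(monthlyCostCents('pro', 12000) / 100); // 49
console.log(cheapestPlan(12000)); // scale
```

Under these assumptions, Pro with 12,000 operations costs $9 + 2,000 × $0.02 = $49, so Scale at $29 is already cheaper. The two plans break even at 11,000 operations per month.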

🔐 Authentication & Payment

MCPize (Easiest)

One-click deployment with built-in billing
No API key management required
85% revenue share to developers

Direct API Access

Get API keys at agensonhorrowitz.cc
Stripe-powered metered billing
Real-time usage tracking

Crypto Micropayments

Pay per operation with USDC on Base chain
x402 protocol integration
Perfect for crypto-native agents

📊 Performance

Average processing time: < 3 seconds for typical documents
Uptime SLA: 99.5% (Scale tier)
Rate limits: 5 operations/second (configurable)
File size limits: 100MB per document

🧪 Testing

# Clone and test locally
git clone https://github.com/agenson-horrowitz/document-parser-mcp
cd document-parser-mcp
npm install
npm run build
npm test

🤝 Integration Examples

Claude Desktop

With the package installed globally (see "Via npm"), add to claude_desktop_config.json:

{
  "mcpServers": {
    "document-parser": {
      "command": "document-parser-mcp"
    }
  }
}

Cline VS Code Extension

Automatically detected when installed globally.

Custom Applications

const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
// Use standard MCP client connection
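
A fuller version of the snippet above might look like the following. It is a sketch, not the project's official client: it assumes the server is launched over stdio with the same npx command used in the installation section, and that each tool returns its JSON payload as a text content item. `connectDocumentParser`, `parsePdf`, and the client name `example-app` are placeholders.

```javascript
// Connect over stdio using the npx command from the installation section.
// The SDK is required lazily so parsePdf() below can be used with any
// object that exposes a compatible callTool() method.
async function connectDocumentParser() {
  const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
  const { StdioClientTransport } = require('@modelcontextprotocol/sdk/client/stdio.js');

  const transport = new StdioClientTransport({
    command: 'npx',
    args: ['@agenson-horrowitz/document-parser-mcp'],
  });
  // Client name and version are placeholders for your own application.
  const client = new Client({ name: 'example-app', version: '0.1.0' });
  await client.connect(transport);
  return client;
}

// Call the parse_pdf tool and decode its payload.
async function parsePdf(client, filePath, options = {}) {
  const result = await client.callTool({
    name: 'parse_pdf',
    arguments: { file_path: filePath, options },
  });
  // Assumption: the JSON payload arrives as the first text content item.
  const textItem = (result.content || []).find((item) => item.type === 'text');
  return textItem ? JSON.parse(textItem.text) : null;
}
```

Typical usage would be `const client = await connectDocumentParser();` followed by `await parsePdf(client, '/path/to/document.pdf', { extract_tables: true })`, then `await client.close()` when finished.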

🔧 API Reference

All tools return consistent response formats:

{
  "success": true,
  "file_path": "/path/to/document.pdf",
  "content": "extracted text...",
  "metadata": {
    "processing_time_ms": 2500,
    "word_count": 1200,
    "confidence": 95
  }
}

Error responses:

{
  "success": false,
  "file_path": "/path/to/document.pdf", 
  "error": "Detailed error message",
  "tool": "parse_pdf"
}
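
Because both shapes share the `success` flag, calling code can branch on it in one place. The sketch below shows one way to do that, using only the fields shown above; `DocumentParserError`, `unwrapResponse`, `needsReview`, and the default threshold of 80 are illustrative choices, not part of the server.

```javascript
// Error type carrying the fields from the documented error response.
class DocumentParserError extends Error {
  constructor(response) {
    super(response.error || 'Unknown document-parser error');
    this.name = 'DocumentParserError';
    this.tool = response.tool;
    this.filePath = response.file_path;
  }
}

// Return the payload on success; throw on the documented error shape.
function unwrapResponse(response) {
  if (!response || response.success !== true) {
    throw new DocumentParserError(response || {});
  }
  return response;
}

// Hypothetical policy: flag results whose reported confidence is low.
function needsReview(response, minConfidence = 80) {
  const confidence = response.metadata && response.metadata.confidence;
  return typeof confidence === 'number' && confidence < minConfidence;
}
```

Throwing a typed error keeps the failing tool name and file path attached to the exception, which helps when several tools are called in sequence.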

🛟 Support

Documentation: Full API docs
Issues: GitHub Issues
Email: agensonhorrowitz@gmail.com
Community: Discord

📝 License

MIT License - feel free to use in commercial AI agent deployments.

🏗️ Built With

Model Context Protocol SDK - MCP framework
pdf-parse - PDF text extraction
Tesseract.js - OCR engine
Sharp - Image processing
Turndown - HTML to Markdown
Cheerio - Server-side HTML parsing
TypeScript & Node.js

Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/agenson-tools/document-parser-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.
