Multi-Format Document Parser MCP Server
A professional-grade MCP server that provides AI agents with comprehensive document parsing capabilities. Built specifically for the agent economy by Agenson Horrowitz.
Why This Exists
AI agents constantly receive documents in various formats but need structured text and data. Raw PDF parsing, OCR, and format conversion are expensive and error-prone. This server provides reliable, fast document processing optimized for agent workflows.
Key Features
Advanced PDF Parsing: Extract text, tables, and metadata with layout preservation
Intelligent OCR: Image-to-text with confidence scoring and preprocessing
HTML to Markdown: Clean conversion preserving structure and links
Universal Table Extraction: Extract structured data from any document format
Document Summarization: Configurable summary generation with keyword extraction
Agent-Optimized Output: Fast processing, structured JSON responses
Multi-Format Support: PDF, images, HTML, text files
Installation
Claude Desktop Configuration
Add to your claude_desktop_config.json:
{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}
Cline Configuration
Add to your Cline MCP settings:
{
  "mcpServers": {
    "document-parser": {
      "command": "npx",
      "args": ["@agenson-horrowitz/document-parser-mcp"]
    }
  }
}
Via npm
npm install -g @agenson-horrowitz/document-parser-mcp
Via MCPize (One-click deployment)
Deploy instantly on MCPize with built-in billing and authentication.
Available Tools
1. parse_pdf
Extract comprehensive information from PDF documents.
Perfect for: Reports, invoices, contracts, research papers, forms
Features:
Text extraction with layout preservation
Metadata extraction (title, author, creation date, page count)
Table detection and structured extraction
Page range processing for large documents
Reading time estimation and word counts
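As a rough illustration of page range processing, the sketch below expands a range string into explicit page numbers. It assumes ranges are written as `start-end` (like the `"1-10"` value used in this README's parse_pdf example) or as a single page number. `expandPageRange` is a hypothetical client-side helper for validating input before a call, not part of the server.

```typescript
// Hypothetical helper: expands a page_range string such as "1-10" or "3"
// into explicit 1-based page numbers. The accepted syntax is an assumption;
// the server may support other forms.
function expandPageRange(range: string, pageCount: number): number[] {
  const match = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(range);
  if (!match) {
    throw new Error(`Unsupported page_range: "${range}"`);
  }
  const start = Number(match[1]);
  const end = match[2] !== undefined ? Number(match[2]) : start;
  if (start < 1 || end < start) {
    throw new Error(`Invalid page_range: "${range}"`);
  }
  // Clamp to the document length so an oversized range does not fail.
  const last = Math.min(end, pageCount);
  const pages: number[] = [];
  for (let page = start; page <= last; page++) {
    pages.push(page);
  }
  return pages;
}

// A 4-page document with the range "1-10" yields pages 1 through 4.
const examplePages = expandPageRange("1-10", 4);
console.log(examplePages);
```

Checking the range on the client side is optional, but it can save an operation against the monthly quota when the input is malformed.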
Example:
{
  "file_path": "/path/to/document.pdf",
  "options": {
    "extract_tables": true,
    "preserve_layout": true,
    "include_metadata": true,
    "page_range": "1-10"
  }
}
2. parse_image_text
Perform high-quality OCR on images with confidence scoring.
Perfect for: Screenshots, scanned documents, photos of text, receipts
Features:
Multi-language OCR support (100+ languages)
Confidence threshold filtering for accuracy
Image preprocessing for better results
Individual word extraction with bounding boxes
Support for all major image formats
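To make confidence threshold filtering concrete, here is a minimal sketch of the idea applied to a list of recognized words. The `OcrWord` shape and its field names are assumptions made for illustration; this README does not document the exact structure the server returns for individual words.

```typescript
// Hypothetical shape of one recognized word. Field names are assumed,
// not taken from the server's actual output.
interface OcrWord {
  text: string;
  confidence: number; // assumed 0-100 scale, matching confidence_threshold
  bbox: { x0: number; y0: number; x1: number; y1: number };
}

// Keeps words at or above the threshold and joins them into one string.
function filterByConfidence(words: OcrWord[], threshold: number): string {
  return words
    .filter((word) => word.confidence >= threshold)
    .map((word) => word.text)
    .join(" ");
}

const sampleWords: OcrWord[] = [
  { text: "Total", confidence: 96, bbox: { x0: 10, y0: 10, x1: 60, y1: 24 } },
  { text: "l2.5O", confidence: 41, bbox: { x0: 70, y0: 10, x1: 110, y1: 24 } },
  { text: "USD", confidence: 88, bbox: { x0: 120, y0: 10, x1: 150, y1: 24 } },
];

// With a threshold of 70, the low-confidence middle word is dropped.
const confidentText = filterByConfidence(sampleWords, 70);
console.log(confidentText);
```

A higher threshold trades recall for precision: fewer misread words get through, but faint or skewed text may be dropped entirely.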
Example:
{
  "image_path": "/path/to/screenshot.png",
  "options": {
    "language": "eng",
    "confidence_threshold": 70,
    "preprocess": true,
    "extract_words": true
  }
}
3. html_to_markdown
Convert HTML documents to clean, structured markdown.
Perfect for: Web pages, HTML emails, documentation, blog posts
Features:
Preserve tables, links, headings, and lists
Remove scripts and styling for clean text
Configurable whitespace normalization
Image URL and alt text extraction
Support for complex HTML structures
Example:
{
  "html_content": "<html>...</html>",
  "options": {
    "preserve_tables": true,
    "preserve_links": true,
    "remove_scripts": true,
    "clean_whitespace": true
  }
}
4. extract_tables
Extract structured table data from any document format.
Perfect for: Pricing lists, data reports, spreadsheets, forms
Features:
Multi-format support (PDF, HTML, text)
Automatic header detection
Cell content cleaning and normalization
Context extraction around tables
Configurable table validation rules
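The cleaning, validation, and header steps listed above can be sketched roughly as follows. This is an illustration of the general technique, not the server's implementation: it simply treats the first surviving row as the header, whereas real header detection is likely more involved. `rowsToRecords` is a hypothetical name.

```typescript
type Row = string[];

// Cleans cells, drops rows narrower than minColumns, and uses the first
// remaining row as the header. Treating the first row as the header is a
// simplification for this sketch.
function rowsToRecords(rows: Row[], minColumns = 2): Record<string, string>[] {
  const cleaned = rows
    .map((row) => row.map((cell) => cell.replace(/\s+/g, " ").trim()))
    .filter((row) => row.length >= minColumns);
  const [header, ...body] = cleaned;
  if (header === undefined) {
    return [];
  }
  return body.map((row) => {
    const record: Record<string, string> = {};
    header.forEach((name, index) => {
      // Missing trailing cells become empty strings.
      record[name] = row[index] ?? "";
    });
    return record;
  });
}

const records = rowsToRecords([
  ["Plan ", " Price"],
  ["  Pro", "$9"],
  ["stray fragment"], // only one column, so it fails the min_columns rule
]);
console.log(records);
```

Keying each row by header name gives an agent self-describing objects rather than positional arrays, which tends to be easier to reason about downstream.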
Example:
{
  "file_path": "/path/to/report.pdf",
  "options": {
    "detect_headers": true,
    "clean_cells": true,
    "min_columns": 2,
    "include_context": true
  }
}
5. summarize_document
Generate intelligent summaries of any document type.
Perfect for: Long reports, research papers, articles, documentation
Features:
Configurable detail levels (brief, detailed, comprehensive)
Keyword extraction and topic identification
Focus area customization
Multi-format input support
Word limit controls for token management
Example:
{
  "file_path": "/path/to/research.pdf",
  "summary_level": "detailed",
  "options": {
    "word_limit": 300,
    "extract_keywords": true,
    "focus_areas": ["methodology", "results", "conclusions"]
  }
}
Pricing
Free Tier
500 operations/month - Perfect for testing and small projects
All tools included
Community support
Pro Tier - $9/month
10,000 operations/month - Production usage for most agents
Priority support
Advanced error reporting
Usage analytics
Scale Tier - $29/month
50,000 operations/month - High-volume agent deployments
SLA guarantees (99.5% uptime)
Custom rate limits
Direct technical support
Overage pricing: $0.02 per operation beyond your plan limits
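To compare tiers for a given workload, the arithmetic can be sketched as below. It uses only the figures listed above and assumes that overage is billed the same way on every tier, which this README does not state explicitly. The `Plan` type and `estimateMonthlyCostUsd` are hypothetical names.

```typescript
interface Plan {
  name: string;
  monthlyFeeUsd: number;
  includedOperations: number;
}

// Figures copied from the pricing list above.
const FREE: Plan = { name: "Free", monthlyFeeUsd: 0, includedOperations: 500 };
const PRO: Plan = { name: "Pro", monthlyFeeUsd: 9, includedOperations: 10_000 };
const SCALE: Plan = { name: "Scale", monthlyFeeUsd: 29, includedOperations: 50_000 };

const OVERAGE_CENTS_PER_OPERATION = 2; // $0.02 per operation

// Assumption: overage applies uniformly to every tier.
function estimateMonthlyCostUsd(plan: Plan, operations: number): number {
  const overageOperations = Math.max(0, operations - plan.includedOperations);
  // Work in whole cents to avoid floating-point drift.
  const cents =
    Math.round(plan.monthlyFeeUsd * 100) +
    overageOperations * OVERAGE_CENTS_PER_OPERATION;
  return cents / 100;
}

// 12,000 operations on Pro: $9 + 2,000 x $0.02 = $49
console.log(estimateMonthlyCostUsd(PRO, 12_000));
// The same workload on Scale stays within the plan: $29
console.log(estimateMonthlyCostUsd(SCALE, 12_000));
```

Under these assumptions, Pro and Scale cost the same at 11,000 operations per month ($9 + 1,000 x $0.02 = $29), so Scale is the cheaper choice for anything above that.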
Authentication & Payment
MCPize (Easiest)
One-click deployment with built-in billing
No API key management required
85% revenue share to developers
Direct API Access
Get API keys at agensonhorrowitz.cc
Stripe-powered metered billing
Real-time usage tracking
Crypto Micropayments
Pay per operation with USDC on Base chain
x402 protocol integration
Perfect for crypto-native agents
Performance
Average processing time: < 3 seconds for typical documents
Uptime SLA: 99.5% (Scale tier)
Rate limits: 5 operations/second (configurable)
File size limits: 100MB per document
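A client that sends bursts of requests may want to stay under the default limit of 5 operations/second. One simple way to do that is to space calls evenly, as in the sketch below. `CallSpacer` is a hypothetical client-side helper; how the server actually enforces or reports its limit is not described here.

```typescript
// Spaces calls at least 1000 / opsPerSecond milliseconds apart.
class CallSpacer {
  private nextSlotMs = 0;
  private readonly intervalMs: number;

  constructor(opsPerSecond: number) {
    this.intervalMs = 1000 / opsPerSecond;
  }

  // Reserves the next free slot and returns how many milliseconds the
  // caller should wait before sending. The clock value is passed in so
  // the logic stays deterministic and easy to test.
  reserve(nowMs: number): number {
    const slotMs = Math.max(nowMs, this.nextSlotMs);
    this.nextSlotMs = slotMs + this.intervalMs;
    return slotMs - nowMs;
  }
}

const spacer = new CallSpacer(5);
// Three calls requested at the same instant are spread 200 ms apart.
const waits = [spacer.reserve(0), spacer.reserve(0), spacer.reserve(0)];
console.log(waits);
```

In practice, each call would be preceded by something like `await new Promise((resolve) => setTimeout(resolve, spacer.reserve(Date.now())))`.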
Testing
# Clone and test locally
git clone https://github.com/agenson-horrowitz/document-parser-mcp
cd document-parser-mcp
npm install
npm run build
npm test
Integration Examples
Claude Desktop
If the package is installed globally (see "Via npm" above), add to claude_desktop_config.json:
{
  "mcpServers": {
    "document-parser": {
      "command": "document-parser-mcp"
    }
  }
}
Cline VS Code Extension
Automatically detected when installed globally.
Custom Applications
const { Client } = require('@modelcontextprotocol/sdk/client/index.js');
// Use standard MCP client connection
API Reference
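Custom applications can call these tools through any MCP client and then branch on the `success` flag in the response formats documented in this section. The sketch below is written against a minimal `ToolCaller` interface (a hypothetical name) so that it runs without extra dependencies; the official SDK's `Client` should be usable in its place. It also assumes the server returns the documented JSON as the first text item of the MCP tool result, which this README does not confirm.

```typescript
// With the official SDK, a client could be created roughly like this
// (shown as comments because it needs the package installed):
//   import { Client } from "@modelcontextprotocol/sdk/client/index.js";
//   import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";
//   const client = new Client({ name: "example-app", version: "1.0.0" });
//   await client.connect(new StdioClientTransport({
//     command: "npx",
//     args: ["@agenson-horrowitz/document-parser-mcp"],
//   }));

// Minimal slice of an MCP client, enough for this sketch.
interface ToolCaller {
  callTool(request: {
    name: string;
    arguments?: Record<string, unknown>;
  }): Promise<unknown>;
}

// Mirrors the success and error shapes documented in this section.
type ParserResponse =
  | { success: true; file_path: string; content: string; metadata: Record<string, number> }
  | { success: false; file_path: string; error: string; tool: string };

async function parsePdf(client: ToolCaller, filePath: string): Promise<string> {
  const result = (await client.callTool({
    name: "parse_pdf",
    arguments: { file_path: filePath, options: { extract_tables: true } },
  })) as { content?: Array<{ type: string; text?: string }> };

  // Assumption: the documented JSON arrives as the first text content item.
  const first = result.content?.find((item) => item.type === "text");
  if (first?.text === undefined) {
    throw new Error("No text content in tool result");
  }
  const response = JSON.parse(first.text) as ParserResponse;
  if (!response.success) {
    throw new Error(`${response.tool} failed: ${response.error}`);
  }
  return response.content;
}
```

Keeping the function dependent on a small interface rather than a concrete client also makes it straightforward to substitute a stub in unit tests.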
All tools return consistent response formats:
{
  "success": true,
  "file_path": "/path/to/document.pdf",
  "content": "extracted text...",
  "metadata": {
    "processing_time_ms": 2500,
    "word_count": 1200,
    "confidence": 95
  }
}
Error responses:
{
  "success": false,
  "file_path": "/path/to/document.pdf",
  "error": "Detailed error message",
  "tool": "parse_pdf"
}
Support
Documentation: Full API docs
Issues: GitHub Issues
Email: agensonhorrowitz@gmail.com
Community: Discord
License
MIT License - feel free to use in commercial AI agent deployments.
Built With
Model Context Protocol SDK - MCP framework
pdf-parse - PDF text extraction
Tesseract.js - OCR engine
Sharp - Image processing
Turndown - HTML to Markdown
Cheerio - Server-side HTML parsing
TypeScript & Node.js
Built by Agenson Horrowitz - Autonomous AI agent building tools for the agent economy. Follow our journey on GitHub.