📄 MCP PDF Server

A PDF file reading server built on FastMCP.

It supports PDF text extraction, OCR recognition, and image extraction over the MCP protocol, and includes a built-in web debugger for easy testing.

🚀 Features

Read PDF text
Extracts regular text from a PDF, page by page.
Read by OCR
Uses OCR to recognize text in scanned or image-based PDFs.
Read PDF images
Extracts all images from a specified PDF page, returned as Base64-encoded output.

📂 Project structure

mcp-pdf-server/
├── pdf_resources/        # Directory for uploaded and processed PDF files
├── txt_server.py         # Main server entry point
└── README.md             # Project documentation

⚙️ Installation

Recommended Python version: 3.9 or later

pip install pymupdf mcp

Note: The OCR features may require a MuPDF build with OCR support or an external OCR library.

🔦 Starting the server

Run the following command:

python txt_server.py

You should see a log line similar to:

Serving on http://127.0.0.1:6231

🌐 Web debugging interface

Open a browser and go to:

http://127.0.0.1:6231

Select a tool from the left panel
Enter the parameters in the right panel
Click "Run" to test the tool

No coding is required: you can debug and test the tools through the web UI.

🛠️ API tool list

Tool	Description	Input parameters	Returns
`read_pdf_text`	Extracts regular text from PDF pages	`file_path`, `start_page`, `end_page`	List of page texts
`read_by_ocr`	Recognizes text using OCR	`file_path`, `start_page`, `end_page`, `language`, `dpi`	Text extracted by OCR
`read_pdf_images`	Extracts images from a PDF page	`file_path`, `page_number`	List of images (Base64-encoded)
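
The `start_page` and `end_page` parameters above describe a page range. As a minimal sketch of how such a range could be validated and mapped to 0-based indices, the helper below assumes pages are numbered from 1 and that the range is inclusive, as the usage examples suggest. The function name `resolve_page_range` is hypothetical and not part of this server, and the server's actual behavior for out-of-range values is not documented here.

```python
from typing import List


def resolve_page_range(start_page: int, end_page: int, page_count: int) -> List[int]:
    """Turn a 1-based, inclusive page range into 0-based page indices.

    Assumption: pages are numbered from 1 and out-of-range values are
    rejected rather than clamped. The real server may behave differently.
    """
    if page_count < 1:
        raise ValueError("document has no pages")
    if start_page < 1 or end_page > page_count:
        raise ValueError(f"pages must be between 1 and {page_count}")
    if start_page > end_page:
        raise ValueError("start_page must not exceed end_page")
    # range() excludes its upper bound, so end_page maps to index end_page - 1.
    return list(range(start_page - 1, end_page))


if __name__ == "__main__":
    print(resolve_page_range(1, 5, page_count=12))  # [0, 1, 2, 3, 4]
```

Rejecting bad input early, instead of silently clamping, tends to make mistakes in tool calls easier to spot from the web debugger.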

📝 Usage examples

Extract text from pages 1 through 5:

mcp run read_pdf_text --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 5}'

Run OCR recognition on page 1:

mcp run read_by_ocr --args '{"file_path": "pdf_resources/example.pdf", "start_page": 1, "end_page": 1, "language": "eng"}'
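
The OCR call above does not pass the optional `dpi` parameter. As a rough guide to what a given value implies, the sketch below computes the pixel size of a rendered page and the memory an uncompressed RGB bitmap of it would take. It relies only on the fact that PDF page dimensions are expressed in points (1/72 inch); how this server actually renders pages for OCR is an assumption, and both function names are hypothetical.

```python
from typing import Tuple


def render_size(width_pt: float, height_pt: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page rendered at `dpi` (1 point = 1/72 inch)."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    scale = dpi / 72.0
    return round(width_pt * scale), round(height_pt * scale)


def estimate_rgb_bytes(width_pt: float, height_pt: float, dpi: int) -> int:
    """Approximate size of an uncompressed 8-bit RGB bitmap of the page."""
    width_px, height_px = render_size(width_pt, height_pt, dpi)
    return width_px * height_px * 3  # 3 bytes per pixel


if __name__ == "__main__":
    # An A4 page is roughly 595 x 842 points.
    print(render_size(595, 842, 300))
    print(estimate_rgb_bytes(595, 842, 300))
```

Doubling the DPI quadruples the pixel count, which is worth keeping in mind when running OCR over many pages.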

Extract all images from page 3:

mcp run read_pdf_images --args '{"file_path": "pdf_resources/example.pdf", "page_number": 3}'
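
Because `read_pdf_images` returns Base64-encoded data, a client has to decode it before the images can be viewed. The sketch below shows one way to do that with the standard library. It assumes each returned item is a plain Base64 string holding the raw image bytes; the real response may wrap the data in additional fields, and `save_base64_images` is a hypothetical helper name.

```python
import base64
from pathlib import Path
from typing import Iterable, List


def save_base64_images(images: Iterable[str], out_dir: str,
                       stem: str = "page_image") -> List[Path]:
    """Decode Base64 strings and write each one to a file in `out_dir`.

    Assumption: every item is a bare Base64 string. The image format is
    not known here, so files get a neutral ".bin" extension.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for index, encoded in enumerate(images, start=1):
        # validate=True rejects characters outside the Base64 alphabet.
        data = base64.b64decode(encoded, validate=True)
        path = target / f"{stem}_{index}.bin"
        path.write_bytes(data)
        written.append(path)
    return written
```

If the response also reports an image format, the extension could be chosen from that instead of `.bin`.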

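📢 Notes

Files must be placed in the pdf_resources/ directory, or referenced by an absolute path.
The OCR features require suitable OCR support in your environment.
When processing large files, adjust memory and timeout settings as needed.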
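
The file location rule in the first note can be sketched as a small path check. This is an illustration only: the helper name `resolve_pdf_path` is hypothetical, and the assumption that relative paths are resolved against the current working directory (as in `pdf_resources/example.pdf` from the usage examples) is not confirmed by the server's source.

```python
from pathlib import Path


def resolve_pdf_path(file_path: str, base_dir: str = "pdf_resources") -> Path:
    """Accept absolute paths as-is; require relative paths to sit in base_dir.

    Assumption: relative paths are interpreted against the current
    working directory, the same way the usage examples write them.
    """
    candidate = Path(file_path)
    if candidate.is_absolute():
        return candidate
    base = Path(base_dir).resolve()
    resolved = candidate.resolve()  # also collapses any ".." segments
    try:
        resolved.relative_to(base)
    except ValueError:
        raise ValueError(
            f"{file_path!r} is neither absolute nor inside {base_dir}/"
        ) from None
    return resolved


if __name__ == "__main__":
    print(resolve_pdf_path("pdf_resources/example.pdf"))
```

Resolving the path before the check means a value such as `pdf_resources/../other.pdf` is rejected rather than quietly escaping the directory.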

📜 License

This project is licensed under the MIT License.
For commercial use, please credit the original source.
