Doc Scraper MCP Server

A Model Context Protocol (MCP) server that provides documentation scraping. It converts web-based documentation to Markdown using jina.ai's conversion service.
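The following is a minimal sketch of how a page can be sent through jina.ai. It assumes the Jina Reader endpoint, which returns a Markdown rendering of a page when the target URL is appended to `https://r.jina.ai/`. This README does not say which endpoint the server actually calls. `to_reader_url` and `fetch_markdown` are hypothetical helper names.

```python
import urllib.request

# Assumption: the Jina Reader prefix; not confirmed as the exact endpoint this server uses.
JINA_READER_PREFIX = "https://r.jina.ai/"


def to_reader_url(url: str) -> str:
    """Build the reader URL for a target page (hypothetical helper)."""
    # Hypothetical validation: reject anything that is not an http(s) URL.
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return JINA_READER_PREFIX + url


def fetch_markdown(url: str, timeout: float = 30.0) -> str:
    """Fetch the Markdown rendering of `url` through the reader endpoint (network call)."""
    with urllib.request.urlopen(to_reader_url(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset)
```

The dependency list includes `aiohttp`, so the real server presumably makes this request asynchronously. The sketch uses `urllib` only so that it has no third-party dependencies.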

Features

Scrapes documentation from any web URL
Converts HTML documentation to Markdown
Saves the converted documentation to a specified output path
Integrates with the Model Context Protocol (MCP)

Installation

Installing via Smithery

To install Doc Scraper for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @askjohngeorge/mcp-doc-scraper --client claude

To install manually, first clone the repository:

git clone https://github.com/askjohngeorge/mcp-doc-scraper.git
cd mcp-doc-scraper

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows, use: venv\Scripts\activate

Install the dependencies:

pip install -e .

Usage

The server can be run with Python:

python -m mcp_doc_scraper
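MCP clients talk to a server like this one using JSON-RPC 2.0 messages. Over the stdio transport, each message is a single line of JSON written to the server's stdin, and responses come back on its stdout. The sketch below builds the `tools/call` request a client would send to invoke the server's `scrape_docs` tool. It assumes the server is launched over stdio with the command above. A real client must first complete the `initialize` handshake, which is omitted here. `make_tool_call` is a hypothetical helper name.

```python
import json


def make_tool_call(request_id: int, url: str, output_path: str) -> str:
    """Encode a JSON-RPC 2.0 `tools/call` request for scrape_docs as one stdio line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "scrape_docs",
            "arguments": {"url": url, "output_path": output_path},
        },
    }
    # The stdio transport delimits messages with newlines, so the JSON body must
    # not contain raw newlines; json.dumps without `indent` escapes them in strings.
    return json.dumps(message) + "\n"


# id 1 is assumed to have been used by the earlier `initialize` request.
line = make_tool_call(2, "https://example.com/docs", "docs/example.md")
print(line, end="")
```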

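Tool Description

The server provides a single tool:

Name: scrape_docs
Description: Scrapes documentation from a URL and saves it as Markdown
Input parameters:
- url: URL of the documentation to scrape
- output_path: Path where the Markdown file should be saved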
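As described above, the tool does two things: it fetches a Markdown rendering of `url` and writes it to `output_path`. The following is a minimal synchronous sketch of that flow. It assumes the fetch step is passed in as a plain function. The real server presumably uses `aiohttp` to call jina.ai. `scrape_docs_sketch` and its `fetch` parameter are hypothetical and are not the server's actual code.

```python
from pathlib import Path
from typing import Callable


def scrape_docs_sketch(url: str, output_path: str, fetch: Callable[[str], str]) -> str:
    """Fetch Markdown for `url` with `fetch` and save it to `output_path`.

    Hypothetical stand-in for the scrape_docs tool; returns a short status
    message of the kind a tool result might carry as text.
    """
    markdown = fetch(url)
    path = Path(output_path)
    # Create missing parent directories so nested output paths work.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(markdown, encoding="utf-8")
    return f"Saved {len(markdown)} characters from {url} to {path}"
```

Injecting `fetch` keeps the file-handling logic testable without network access. In practice it would be something like the reader-based fetch that calls jina.ai.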

Project Structure

doc_scraper/
├── __init__.py
├── __main__.py
└── server.py

Dependencies

aiohttp
mcp
pydantic

Development

To set up the development environment:

Install the development dependencies:

pip install -r requirements.txt

This server uses the Model Context Protocol (MCP). Make sure you are familiar with the MCP documentation.

License

MIT License


