MD Webcrawl MCP

MD MCP Webcrawler Project

A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.

Features

Extract website content and save it as Markdown files
Map website structure and links
Batch processing of multiple URLs
Configurable output directory
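The README does not describe how link mapping is implemented. As a rough, stdlib-only sketch of what mapping "structure and links" can involve, the code below pulls the `<a href>` targets out of one page and splits them into same-host and external URLs. The names `LinkCollector` and `map_links` are hypothetical and are not this project's API.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the raw href values of all <a> tags on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def map_links(page_url: str, html: str) -> dict[str, list[str]]:
    """Split a page's links into internal (same host) and external URLs."""
    parser = LinkCollector()
    parser.feed(html)
    host = urlparse(page_url).netloc
    internal: list[str] = []
    external: list[str] = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts so /a and /a#b count once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, tel:, etc.
        bucket = internal if urlparse(absolute).netloc == host else external
        if absolute not in bucket:
            bucket.append(absolute)
    return {"internal": internal, "external": external}
```

Repeating this for each internal URL (while tracking already-visited pages) would yield a site map.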

Installation

Clone the repository:

git clone https://github.com/yourusername/webcrawler.git
cd webcrawler

Install the dependencies:

pip install -r requirements.txt

Optional: set environment variables:

export OUTPUT_PATH=./output  # Set your preferred output directory

Output

Crawled content is saved in Markdown format to the configured output directory.
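The README does not specify how output files are named. A minimal sketch, assuming one Markdown file per URL with a filesystem-safe name derived from the host and path (`url_to_filename` and `save_markdown` are hypothetical helpers):

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe Markdown file name."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path.rstrip("/")
    # Replace every run of unsafe characters (including "/") with "_".
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return (slug or "index") + ".md"


def save_markdown(url: str, markdown: str, output_path: str) -> Path:
    """Write markdown into output_path, creating the directory if needed."""
    out_dir = Path(output_path)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url_to_filename(url)
    target.write_text(markdown, encoding="utf-8")
    return target
```

For example, `https://example.com/docs/intro/` would be written to `example.com_docs_intro.md` inside the output directory.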

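Configuration

The server can be configured through environment variables:

OUTPUT_PATH: Default output directory for saved files
MAX_CONCURRENT_REQUESTS: Maximum number of parallel requests (default: 5)
REQUEST_TIMEOUT: Request timeout in seconds (default: 30)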
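How these settings are applied internally is not documented. One plausible way, sketched here under stated assumptions, is to read them once at startup and enforce them with an `asyncio.Semaphore` (concurrency cap) plus `asyncio.wait_for` (per-request timeout). `CrawlSettings`, `load_settings`, `crawl_batch`, the `fetch` coroutine, and the `./output` fallback for `OUTPUT_PATH` are all assumptions for illustration.

```python
import asyncio
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlSettings:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float


def load_settings() -> CrawlSettings:
    """Read the documented environment variables, falling back to defaults."""
    return CrawlSettings(
        # "./output" is an assumed default; the README does not give one.
        output_path=os.environ.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(os.environ.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(os.environ.get("REQUEST_TIMEOUT", "30")),
    )


async def crawl_batch(urls, fetch, settings):
    """Run the hypothetical coroutine fetch(url) for every URL.

    At most settings.max_concurrent_requests fetches run at once, and each
    one is abandoned after settings.request_timeout seconds (result: None).
    """
    semaphore = asyncio.Semaphore(settings.max_concurrent_requests)

    async def fetch_one(url):
        async with semaphore:
            try:
                result = await asyncio.wait_for(fetch(url), timeout=settings.request_timeout)
            except asyncio.TimeoutError:
                result = None
            return url, result

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

Returning `None` on timeout keeps one slow page from failing the whole batch; a real implementation might log or retry instead.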

Claude Setup

Install with FastMCP:

fastmcp install server.py

Alternatively, use a custom user configuration that runs the server directly with fastmcp:

"Crawl Server": {
  "command": "fastmcp",
  "args": [
    "run",
    "/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
  ],
  "env": {
    "OUTPUT_PATH": "/Users/user/Webcrawl"
  }
}
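The fragment above is a single server entry. In Claude Desktop's `claude_desktop_config.json`, server entries sit under a top-level `"mcpServers"` object. A small sketch, assuming that layout, that builds the entry and merges it into an existing config without disturbing other servers (`claude_config_entry` and `merge_into_config` are hypothetical helpers; the paths you pass in are your own):

```python
import json


def claude_config_entry(server_path: str, output_path: str) -> dict:
    """Build a "Crawl Server" entry that runs server.py via `fastmcp run`."""
    return {
        "Crawl Server": {
            "command": "fastmcp",
            "args": ["run", server_path],
            "env": {"OUTPUT_PATH": output_path},
        }
    }


def merge_into_config(config: dict, entry: dict) -> dict:
    """Return a copy of config with entry added under "mcpServers"."""
    merged = json.loads(json.dumps(config))  # deep copy of plain JSON data
    merged.setdefault("mcpServers", {}).update(entry)
    return merged


if __name__ == "__main__":
    entry = claude_config_entry("/path/to/server.py", "/Users/user/Webcrawl")
    print(json.dumps(merge_into_config({}, entry), indent=2))
```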

Development

Live Development

fastmcp dev server.py --with-editable .

Debugging

The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is useful for debugging.

Examples

Example 1: Extract and save content

mcp call extract_content --url "https://example.com" --output_path "example.md"
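The README does not say which HTML-to-Markdown conversion `extract_content` performs. To illustrate the idea only, here is a deliberately small, stdlib-only converter that handles headings, paragraphs, list items and links and drops `<script>`/`<style>` content. `MarkdownConverter` and `html_to_markdown` are hypothetical names; a production crawler would more likely use a dedicated conversion library and cover many more tags.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Minimal HTML-to-Markdown converter (headings, paragraphs, list items, links)."""

    BLOCK_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.lines = []        # finished Markdown blocks
        self.current = []      # text fragments of the block being built
        self.prefix = None     # Markdown prefix of the open block, if any
        self.skip_depth = 0    # > 0 while inside <script> or <style>
        self.link_href = None  # href of the currently open <a>, if any

    def flush(self):
        # Collapse whitespace and emit the pending block, if it has text.
        text = " ".join("".join(self.current).split())
        if text:
            self.lines.append((self.prefix or "") + text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.BLOCK_PREFIX:
            self.flush()
            self.prefix = self.BLOCK_PREFIX[tag]
        elif tag == "a":
            self.link_href = dict(attrs).get("href")
            self.current.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.BLOCK_PREFIX:
            self.flush()
            self.prefix = None
        elif tag == "a":
            self.current.append(f"]({self.link_href or ''})")
            self.link_href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def html_to_markdown(html):
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()  # process any buffered trailing data
    converter.flush()
    return "\n\n".join(converter.lines)
```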

Example 2: Create a content index

mcp call scan_linked_content --url "https://example.com" | \
  mcp call create_index --content_map - --output_path "index.md"
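The format of the content map that `scan_linked_content` emits is not documented here. Assuming, purely for illustration, a mapping from URL to page title, an index could be rendered as Markdown grouped by the first path segment. `build_index` is a hypothetical helper and not the project's `create_index` tool.

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_index(content_map: dict, title: str = "Content Index") -> str:
    """Render {url: page_title} as a Markdown index grouped by top-level path."""
    sections = defaultdict(list)
    for url, page_title in content_map.items():
        segments = [s for s in urlparse(url).path.split("/") if s]
        section = segments[0] if segments else "/"  # site root gets its own section
        sections[section].append((page_title or url, url))

    lines = [f"# {title}", ""]
    for section in sorted(sections):
        lines.append(f"## {section}")
        lines.append("")
        for page_title, url in sorted(sections[section]):
            lines.append(f"- [{page_title}]({url})")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```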

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a pull request

License

Distributed under the MIT License. See LICENSE for details.

Requirements

Python 3.10 or later (FastMCP requires Python 3.10+)
FastMCP (uv pip install fastmcp)
Dependencies listed in requirements.txt


