MD Webcrawl MCP

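MD MCP Webcrawler Project

A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.

Features

Extract website content and save it as markdown files
Map website structure and links
Batch-process multiple URLs
Configurable output directory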
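
To illustrate the link-mapping feature, here is a minimal sketch that uses only the Python standard library. It is not the project's actual implementation: the names `LinkCollector` and `map_links`, and the split into "internal" and "external" links, are assumptions made for this example.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop any "#fragment" suffix.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if absolute not in self.links:
            self.links.append(absolute)


def map_links(html: str, base_url: str) -> Dict[str, List[str]]:
    """Split a page's links into same-host ("internal") and all others ("external")."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    site = urlparse(base_url).netloc
    result: Dict[str, List[str]] = {"internal": [], "external": []}
    for link in collector.links:
        key = "internal" if urlparse(link).netloc == site else "external"
        result[key].append(link)
    return result
```

Calling `map_links(page_html, "https://example.com/")` returns a dictionary whose "internal" list could then be fed back into the crawler to walk the rest of the site.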

Installation

Clone the repository:

git clone https://github.com/yourusername/webcrawler.git
cd webcrawler

Install dependencies:

pip install -r requirements.txt

Optional: configure environment variables:

export OUTPUT_PATH=./output  # Set your preferred output directory

Output

Crawled content is saved in markdown format in the specified output directory.
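
As a rough sketch of how a crawled page could end up in the output directory, the snippet below derives a file name from the URL and writes the markdown there. The helper names `filename_for` and `save_markdown`, the naming scheme, and the `./output` fallback are assumptions for illustration; the server may name its files differently.

```python
import os
import re
from pathlib import Path
from urllib.parse import urlparse


def filename_for(url: str) -> str:
    """Turn a URL into a filesystem-safe markdown file name (hypothetical scheme)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return f"{slug or 'index'}.md"


def save_markdown(url: str, markdown: str, output_dir: str = "") -> Path:
    """Write markdown under output_dir, falling back to OUTPUT_PATH, then ./output."""
    target_dir = Path(output_dir or os.environ.get("OUTPUT_PATH", "./output"))
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename_for(url)
    target.write_text(markdown, encoding="utf-8")
    return target
```

Under this scheme, `https://example.com/docs/intro` would be written to `example.com_docs_intro.md` inside the output directory.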

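Configuration

The server can be configured through environment variables:

OUTPUT_PATH: default output directory for saved files
MAX_CONCURRENT_REQUESTS: maximum number of parallel requests (default: 5)
REQUEST_TIMEOUT: request timeout in seconds (default: 30)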
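
The variables above could be read and applied roughly as follows. This is a sketch under stated assumptions, not the server's own code: `CrawlerConfig`, `load_config`, and `fetch_all` are hypothetical names, the `./output` fallback is assumed because no default is documented for OUTPUT_PATH, and `fetch` stands in for whatever coroutine actually downloads a page.

```python
import asyncio
import os
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Mapping


@dataclass(frozen=True)
class CrawlerConfig:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float


def load_config(env: Mapping[str, str] = os.environ) -> CrawlerConfig:
    """Read the three documented variables, falling back to defaults."""
    return CrawlerConfig(
        # "./output" is an assumed fallback; no default is documented.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )


async def fetch_all(
    urls: List[str],
    fetch: Callable[[str], Awaitable[str]],
    config: CrawlerConfig,
) -> List[str]:
    """Run fetch(url) for every URL, honoring the concurrency and timeout limits."""
    limit = asyncio.Semaphore(config.max_concurrent_requests)

    async def fetch_one(url: str) -> str:
        async with limit:  # at most max_concurrent_requests run at once
            return await asyncio.wait_for(fetch(url), timeout=config.request_timeout)

    # gather preserves the order of the input URLs in its results.
    return list(await asyncio.gather(*(fetch_one(url) for url in urls)))
```

A semaphore keeps the batch simple: every URL is scheduled immediately, but only the configured number of requests is in flight at any moment.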

Claude Setup

Install with FastMCP:

fastmcp install server.py

Or, as a custom configuration, run it directly with fastmcp:

"Crawl Server": {
  "command": "fastmcp",
  "args": [
    "run",
    "/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
  ],
  "env": {
    "OUTPUT_PATH": "/Users/user/Webcrawl"
  }
}
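
If you prefer to script this step, the following sketch merges a "Crawl Server" entry into an existing configuration dictionary. It assumes the entry belongs under the `mcpServers` key of Claude Desktop's JSON configuration file; the function name `add_crawl_server` is hypothetical, and the paths in the usage example are placeholders to replace with your own.

```python
import json
from typing import Any, Dict


def add_crawl_server(config: Dict[str, Any], server_py: str, output_path: str) -> Dict[str, Any]:
    """Return a copy of config with a "Crawl Server" entry under "mcpServers"."""
    servers = dict(config.get("mcpServers", {}))  # keep any existing servers
    servers["Crawl Server"] = {
        "command": "fastmcp",
        "args": ["run", server_py],
        "env": {"OUTPUT_PATH": output_path},
    }
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    # Placeholder paths; substitute the real location of server.py and your output folder.
    updated = add_crawl_server({}, "/path/to/server.py", "/path/to/output")
    print(json.dumps(updated, indent=2))
```

The function returns a new dictionary rather than mutating its input, so the original configuration can be kept as a backup before writing the result to disk.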

Development

Live development

fastmcp dev server.py --with-editable .

Debugging

The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is helpful for debugging.

Examples

Example 1: Extract and save content

mcp call extract_content --url "https://example.com" --output_path "example.md"

Example 2: Create a content index

mcp call scan_linked_content --url "https://example.com" | \
  mcp call create_index --content_map - --output_path "index.md"
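
To give a feel for what an index step like `create_index` might produce, here is a minimal sketch that turns a content map into a markdown list. The real shape of the content map is not documented here, so this example assumes a plain dictionary from URL to page title; `build_index` is a hypothetical name, not the tool's actual function.

```python
from typing import Dict


def build_index(content_map: Dict[str, str], heading: str = "Content Index") -> str:
    """Render a URL-to-title mapping as a markdown list, sorted by title."""
    lines = [f"# {heading}", ""]
    for url, title in sorted(content_map.items(), key=lambda item: item[1].lower()):
        # Fall back to the URL itself when a page has no title.
        lines.append(f"- [{title or url}]({url})")
    return "\n".join(lines) + "\n"
```

The returned string can be written straight to a file such as `index.md`.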

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a pull request

License

Distributed under the MIT License. See LICENSE for more information.

Requirements

Python 3.7+
FastMCP (uv pip install fastmcp)
Dependencies listed in requirements.txt
