MD MCP Webcrawler Project

A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.

Features

Extract website content and save it as markdown files
Map website structure and links
Batch-process multiple URLs
Configurable output directory
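Batch processing is usually bounded so the crawler does not flood a site with requests. The sketch below is not taken from this project's code: it assumes an async fetch function that you supply, and shows one common way to cap parallelism with `asyncio.Semaphore` and apply a per-request timeout with `asyncio.wait_for`. The defaults of 5 and 30 mirror the documented `MAX_CONCURRENT_REQUESTS` and `REQUEST_TIMEOUT` defaults; the name `crawl_batch` is hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional

# Hypothetical fetcher type: takes a URL and returns the page body.
Fetcher = Callable[[str], Awaitable[str]]


async def crawl_batch(
    urls: List[str],
    fetch: Fetcher,
    max_concurrent: int = 5,  # mirrors the MAX_CONCURRENT_REQUESTS default
    timeout: float = 30.0,    # mirrors the REQUEST_TIMEOUT default (seconds)
) -> Dict[str, Optional[str]]:
    """Fetch all URLs with at most `max_concurrent` requests in flight.

    A URL whose fetch exceeds `timeout` maps to None instead of failing the batch.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def worker(url: str) -> Optional[str]:
        async with semaphore:  # waits while max_concurrent fetches are active
            try:
                return await asyncio.wait_for(fetch(url), timeout)
            except asyncio.TimeoutError:
                return None

    results = await asyncio.gather(*(worker(url) for url in urls))
    return dict(zip(urls, results))
```

Usage would look like `pages = asyncio.run(crawl_batch(urls, my_fetch))`, where `my_fetch` is any coroutine function, for example one built on an async HTTP client. Returning `None` for timed-out URLs keeps one slow page from aborting the whole batch.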


Installation

Clone the repository:

git clone https://github.com/yourusername/webcrawler.git
cd webcrawler

Install the dependencies:

pip install -r requirements.txt

Optional: configure environment variables:

export OUTPUT_PATH=./output  # Set your preferred output directory

Output

Crawled content is saved as markdown in the configured output directory.
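The README does not say how output filenames are chosen. As a hypothetical sketch, one simple scheme derives a filesystem-safe name from the URL's host and path and writes the markdown into the output directory, creating it if needed. `url_to_filename` and `save_markdown` are illustrative names, not the project's API.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def url_to_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe markdown filename (query strings are ignored)."""
    parsed = urlparse(url)
    slug = (parsed.netloc + parsed.path).strip("/")
    # Replace every run of non-alphanumeric characters with a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", slug).strip("_")
    return (slug or "index") + ".md"


def save_markdown(markdown: str, url: str, output_dir: str) -> Path:
    """Write `markdown` to <output_dir>/<name derived from url>, creating the directory."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / url_to_filename(url)
    target.write_text(markdown, encoding="utf-8")
    return target
```

With this scheme, `https://example.com/docs/intro` becomes `example_com_docs_intro.md`. Because query strings are dropped, two URLs that differ only in their query would overwrite each other; a real implementation might append a hash to avoid that.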

Configuration

The server can be configured through environment variables:

OUTPUT_PATH: default output directory for saved files
MAX_CONCURRENT_REQUESTS: maximum number of parallel requests (default: 5)
REQUEST_TIMEOUT: request timeout in seconds (default: 30)
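A minimal sketch of reading these variables with their documented defaults. The fallback `./output` for `OUTPUT_PATH` is an assumption (the README documents no default for it), and `CrawlerConfig` and `load_config` are hypothetical names rather than the project's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CrawlerConfig:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float  # seconds


def load_config(env: Optional[Mapping[str, str]] = None) -> CrawlerConfig:
    """Read crawler settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return CrawlerConfig(
        # Assumed fallback: the README documents no default for OUTPUT_PATH.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )
```

Accepting an optional mapping instead of always reading `os.environ` makes the function easy to test without touching the real process environment.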

Claude Setup

Install with FastMCP: fastmcp install server.py

Or, for a custom setup, run the server directly with fastmcp:

"Crawl Server": {
      "command": "fastmcp",
      "args": [
        "run",
        "/Users/mm22/Dev_Projekte/servers-main/src/Webcrawler/server.py"
      ],
      "env": {
        "OUTPUT_PATH": "/Users/user/Webcrawl"
      }
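The "Crawl Server" entry is a fragment. In Claude Desktop it normally sits inside the top-level "mcpServers" object of claude_desktop_config.json. This small Python sketch builds that full structure and prints it as JSON; `/path/to/server.py` is a placeholder for your own absolute path, and `claude_server_entry` is a hypothetical helper.

```python
import json


def claude_server_entry(server_path: str, output_path: str) -> dict:
    """Build one MCP server entry that launches server.py via `fastmcp run`."""
    return {
        "command": "fastmcp",
        "args": ["run", server_path],
        "env": {"OUTPUT_PATH": output_path},
    }


# Placeholder paths: replace them with absolute paths on your machine.
config = {
    "mcpServers": {
        "Crawl Server": claude_server_entry("/path/to/server.py", "/Users/user/Webcrawl"),
    }
}
print(json.dumps(config, indent=2))
```

Generating the file with `json.dumps` guarantees balanced braces and valid quoting, which is easy to get wrong when editing the config by hand.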

Development

Live development

fastmcp dev server.py --with-editable .

Debugging

The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is useful for debugging.

Examples

Example 1: Extract and save content

mcp call extract_content --url "https://example.com" --output_path "example.md"
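The README does not show how extract_content turns HTML into markdown, and the project may well use a dedicated library. As an illustrative, standard-library-only sketch, `html.parser.HTMLParser` can convert a small subset of HTML (headings, paragraphs, list items, links) while skipping `<script>` and `<style>`. `html_to_markdown` is a hypothetical name.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class _MarkdownConverter(HTMLParser):
    """Converts a small subset of HTML (headings, paragraphs, lists, links) to markdown."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; etc. are decoded
        self.parts: List[str] = []
        self._href: Optional[str] = None
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("p", "ul", "ol"):
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self._href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "a":
            self.parts.append(f"]({self._href})" if self._href else "]")
            self._href = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        text = re.sub(r"\s+", " ", data)  # collapse whitespace but keep word boundaries
        if text.strip():
            self.parts.append(text)


def html_to_markdown(html: str) -> str:
    converter = _MarkdownConverter()
    converter.feed(html)
    converter.close()
    raw = "".join(converter.parts)
    # Tidy up: single spaces within lines, at most one blank line between blocks.
    lines = [re.sub(r" {2,}", " ", line).strip() for line in raw.splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Tags the converter does not recognize (for example `<b>` or `<div>`) are dropped while their text is kept, which is usually acceptable for readable output but loses tables and emphasis.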

Example 2: Create a content index

mcp call scan_linked_content --url "https://example.com" | \
  mcp call create_index --content_map - --output_path "index.md"
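The pipeline above first collects linked pages (scan_linked_content) and then renders them as an index (create_index). The README does not define the shape of the content map, so this sketch assumes a simple `{url: title}` dictionary. `extract_links` keeps only unique, absolute links on the same host as the starting page, and `build_index` renders the map as a markdown list; both names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urldefrag, urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the raw href value of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def extract_links(html: str, base_url: str) -> List[str]:
    """Return unique, absolute, same-host links found in `html`, in page order."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    links: List[str] = []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(base_url, href))  # drop "#section"
        if urlparse(absolute).netloc == host and absolute not in links:
            links.append(absolute)
    return links


def build_index(content_map: Dict[str, str]) -> str:
    """Render a {url: title} map as a markdown list, sorted by URL."""
    lines = ["# Index", ""]
    for url, title in sorted(content_map.items()):
        lines.append(f"- [{title}]({url})")
    return "\n".join(lines) + "\n"
```

Restricting links to the starting host keeps a crawl from wandering across the web, and stripping fragments stops `/about` and `/about#team` from being fetched twice.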

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a pull request

License

Distributed under the MIT License. See LICENSE for more information.

Requirements

Python 3.7+
FastMCP (uv pip install fastmcp)
Dependencies listed in requirements.txt
