MD Webcrawl MCP
A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content. This is a local-only server: it depends on local resources, so it must run on the client's machine.
Features
- Extract website content and save as markdown files
- Map website structure and links
- Batch processing of multiple URLs
- Configurable output directory
Installation
- Clone the repository:
- Install dependencies:
- Optional: Configure environment variables:
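In practice the steps above might look like the following; the repository URL and directory name are placeholders, since neither is given here:

```shell
# Placeholder commands; substitute the real repository URL and directory.
git clone <repository-url>
cd <repository-directory>

# Install dependencies (see the Requirements section).
pip install -r requirements.txt

# Optional: configure environment variables (see Configuration).
export OUTPUT_PATH=./output
```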
Output
Crawled content is saved in markdown format in the specified output directory.
Configuration
The server can be configured through environment variables:
- OUTPUT_PATH: Default output directory for saved files
- MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)
- REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
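A minimal sketch of reading these variables in Python; note that the "./output" fallback for OUTPUT_PATH is an assumption, since only the other two defaults are documented:

```python
import os

# Hypothetical configuration loading that mirrors the documented variables.
# The OUTPUT_PATH default ("./output") is an assumption; only the defaults
# for MAX_CONCURRENT_REQUESTS and REQUEST_TIMEOUT are documented.
OUTPUT_PATH = os.environ.get("OUTPUT_PATH", "./output")
MAX_CONCURRENT_REQUESTS = int(os.environ.get("MAX_CONCURRENT_REQUESTS", "5"))
REQUEST_TIMEOUT = int(os.environ.get("REQUEST_TIMEOUT", "30"))
```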
Claude Set-Up
Install with FastMCP
fastmcp install server.py
Alternatively, use custom settings to run the server with fastmcp directly.
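For custom settings, an entry in Claude Desktop's claude_desktop_config.json might look like the sketch below; the server name and path are placeholders:

```json
{
  "mcpServers": {
    "md-webcrawl": {
      "command": "fastmcp",
      "args": ["run", "/absolute/path/to/server.py"]
    }
  }
}
```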
Development
Live Development
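Assuming the standard FastMCP CLI, the server can be run in development mode, which reloads on changes and launches the MCP Inspector alongside it:

```shell
fastmcp dev server.py
```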
Debug
The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is useful for debugging.
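The Inspector can also wrap the server directly; a command sketch, assuming the standard npx package:

```shell
npx @modelcontextprotocol/inspector fastmcp run server.py
```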
Examples
Example 1: Extract and Save Content
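The server's actual extraction tool is not shown here; the standalone sketch below illustrates the kind of HTML-to-markdown conversion involved (regex-based, handling only top-level headings and links):

```python
import re

def html_to_markdown(html: str) -> str:
    # Convert <h1> headings to markdown headings.
    md = re.sub(r"<h1[^>]*>(.*?)</h1>", r"# \1\n\n", html, flags=re.S)
    # Convert anchors to markdown links.
    md = re.sub(r'<a[^>]*href="([^"]+)"[^>]*>(.*?)</a>', r"[\2](\1)", md, flags=re.S)
    # Strip any remaining tags and collapse extra blank lines.
    md = re.sub(r"<[^>]+>", "", md)
    return re.sub(r"\n{3,}", "\n\n", md).strip()
```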
Example 2: Create Content Index
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Requirements
- Python 3.7+
- FastMCP (uv pip install fastmcp)
- Dependencies listed in requirements.txt