MD MCP Webcrawler Project
A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.
Features
- Extract website content and save as markdown files
- Map website structure and links
- Batch processing of multiple URLs
- Configurable output directory
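The batch-processing feature can be pictured as a bounded-concurrency loop. The following is only a minimal sketch under stated assumptions, not the project's actual implementation: `crawl_batch` and `fake_fetch` are hypothetical names, and the fetcher is a stand-in that performs no network I/O.

```python
import asyncio


async def crawl_batch(urls, fetch, max_concurrent=5):
    """Run fetch(url) for every URL with at most max_concurrent calls in flight.

    Hypothetical helper for illustration; results come back in input order.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(url):
        # The semaphore blocks here once max_concurrent fetches are running.
        async with semaphore:
            return await fetch(url)

    return await asyncio.gather(*(bounded(url) for url in urls))


async def fake_fetch(url):
    """Stand-in fetcher (assumption): yields control, then returns fake markdown."""
    await asyncio.sleep(0)
    return "# " + url


if __name__ == "__main__":
    pages = asyncio.run(
        crawl_batch(["https://example.com/a", "https://example.com/b"], fake_fetch)
    )
    print(pages)
```

A real fetcher would replace `fake_fetch` with an HTTP call; the limit of 5 mirrors the documented default for MAX_CONCURRENT_REQUESTS.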
Installation
- Clone the repository:
- Install dependencies:
- Optional: Configure environment variables:
Output
Crawled content is saved in markdown format in the specified output directory.
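As an illustration of how a crawled page might end up as a markdown file, here is a small sketch. The naming scheme (host plus path, slugified) and the helpers `markdown_path_for` and `save_markdown` are assumptions made for this example; the README does not state how the server actually names its files.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def markdown_path_for(url, output_dir):
    """Map a URL to a .md path inside output_dir (assumed naming scheme)."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", raw).strip("-").lower() or "index"
    return Path(output_dir) / (slug + ".md")


def save_markdown(url, markdown, output_dir):
    """Write markdown text to the path derived from url, creating directories."""
    target = markdown_path_for(url, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

With this scheme, `https://example.com/docs/intro` would be written to `example-com-docs-intro.md` inside the chosen output directory.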
Configuration
The server can be configured through environment variables:
- OUTPUT_PATH: Default output directory for saved files
- MAX_CONCURRENT_REQUESTS: Maximum parallel requests (default: 5)
- REQUEST_TIMEOUT: Request timeout in seconds (default: 30)
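One way these variables could be read at startup is sketched below. This is not taken from the project's source: `CrawlerConfig` and `load_config` are hypothetical names, and the `./output` fallback is an assumption, since no default path is documented here. Only the defaults of 5 and 30 come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CrawlerConfig:
    output_path: str
    max_concurrent_requests: int
    request_timeout: float


def load_config(env=None):
    """Build a CrawlerConfig from an environment mapping (os.environ by default)."""
    env = os.environ if env is None else env
    return CrawlerConfig(
        # "./output" is an assumed fallback; the README names no default path.
        output_path=env.get("OUTPUT_PATH", "./output"),
        max_concurrent_requests=int(env.get("MAX_CONCURRENT_REQUESTS", "5")),
        request_timeout=float(env.get("REQUEST_TIMEOUT", "30")),
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the loader easy to test, for example `load_config({"MAX_CONCURRENT_REQUESTS": "2"})`.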
Claude Set-Up
Install with FastMCP
fastmcp install server.py
Or use custom settings to run with fastmcp directly
Development
Live Development
Debug
The MCP Inspector (https://modelcontextprotocol.io/docs/tools/inspector) is helpful for debugging.
Examples
Example 1: Extract and Save Content
Example 2: Create Content Index
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Requirements
- Python 3.7+
- FastMCP (uv pip install fastmcp)
- Dependencies listed in requirements.txt