MD MCP Webcrawler Project
A Python-based MCP (https://modelcontextprotocol.io/introduction) web crawler for extracting and saving website content.
Features
- Extract website content and save as markdown files
- Map website structure and links
- Batch processing of multiple URLs
- Configurable output directory
Installation
- Clone the repository:
- Install dependencies:
- Optional: Configure environment variables:
Output
Crawled content is saved in markdown format in the specified output directory.
Configuration
The server can be configured through environment variables:
OUTPUT_PATH
: Default output directory for saved filesMAX_CONCURRENT_REQUESTS
: Maximum parallel requests (default: 5)REQUEST_TIMEOUT
: Request timeout in seconds (default: 30)
Claude Set-Up
Install with FastMCP
fastmcp install server.py
or user custom settings to run with fastmcp directly
Development
Live Development
Debug
It helps to use https://modelcontextprotocol.io/docs/tools/inspector for debugging
Examples
Example 1: Extract and Save Content
Example 2: Create Content Index
Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/AmazingFeature
) - Commit your changes (
git commit -m 'Add some AmazingFeature'
) - Push to the branch (
git push origin feature/AmazingFeature
) - Open a Pull Request
License
Distributed under the MIT License. See LICENSE
for more information.
Requirements
- Python 3.7+
- FastMCP (uv pip install fastmcp)
- Dependencies listed in requirements.txt
This server cannot be installed
hybrid server
The server is able to function both locally and remotely, depending on the configuration or use case.
웹사이트를 크롤링하여 콘텐츠를 추출하고 마크다운 파일로 저장하는 Python 기반 MCP 서버로, 웹사이트 구조와 링크를 매핑하는 기능이 포함되어 있습니다.
Related Resources
Related MCP Servers
- AsecurityAlicenseAqualityA powerful MCP server for fetching and transforming web content into various formats (HTML, JSON, Markdown, Plain Text) with ease.Last updated -41,14536MIT License
- AsecurityAlicenseAqualityA TypeScript-based MCP server utilizing the UseScraper API to provide web scraping capabilities, allowing users to extract content from webpages in various formats.Last updated -2MIT License
- -securityAlicense-qualityA Python implementation of an MCP server that extracts webpage content, removes ads and non-essential elements, and transforms it into clean, LLM-optimized Markdown.Last updated -2MIT License
- -securityAlicense-qualityToolset that crawls websites, generates Markdown documentation, and makes that documentation searchable via a Model Context Protocol (MCP) server for integration with tools like Cursor.Last updated -24MIT License