MCP Website Downloader
Simple MCP server for downloading documentation websites and preparing them for RAG indexing.
Features
- Downloads complete documentation sites, well big chunks anyway.
- Maintains link structure and navigation, not really. lol
- Downloads and organizes assets (CSS, JS, images), but isn't really AI friendly and it all probably needs some kind of parsing or vectorizing into a db or something.
- Creates clean index for RAG systems, currently seems to make an index in each folder, not even looked at it.
- Simple single-purpose MCP interface, yup.
Installation
Fork and download, cd to the repository.
Put this in your claude_desktop_config.json with your own paths:
Other Usage you don't need to worry about and may be hallucinatory lol:
- Start the server:
- Use through Claude Desktop or other MCP clients:
Output Structure
Development
The server follows standard MCP architecture:
Components
server.py
: Main MCP server implementation that handles tool registration and requestscore.py
: Core website downloading functionality with proper asset handlingutils.py
: Helper utilities for file handling and URL processing
Design Principles
- Single Responsibility
- Each module has one clear purpose
- Server handles MCP interface
- Core handles downloading
- Utils handles common operations
- Clean Structure
- Maintains original site structure
- Organizes assets by type
- Creates clear index for RAG systems
- Robust Operation
- Proper error handling
- Reasonable depth limits
- Asset download verification
- Clean URL/path processing
RAG Index
The rag_index.json
file contains:
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
MIT License - See LICENSE file
Error Handling
The server handles common issues:
- Invalid URLs
- Network errors
- Asset download failures
- Malformed HTML
- Deep recursion
- File system errors
Error responses follow the format:
Success responses:
local-only server
The server can only run on the client's local machine because it depends on local resources.
Tools
该服务器使用户能够下载整个网站及其资产以供离线访问,支持可配置的深度和并发设置。
Related Resources
Related MCP Servers
- AsecurityAlicenseAqualityThis server enables LLMs to retrieve and process content from web pages, converting HTML to markdown for easier consumption.Last updated -166,803MIT License
- AsecurityFlicenseAqualityProvides a tool to download entire websites using wget. It preserves the website structure and converts links to work locally.Last updated -1133
- AsecurityAlicenseAqualityAn advanced web browsing server enabling headless browser interactions via a secure API, providing features like navigation, content extraction, element interaction, and screenshot capture.Last updated -621MIT License
- -securityFlicense-qualityThis server provides an interface for performing basic file system operations such as navigation, reading, writing, and file analysis, allowing users to manage directories and files efficiently.Last updated -4