MCP Windows Website Downloader Server
by angrysky56
MCP Website Downloader
Simple MCP server for downloading documentation websites and preparing them for RAG indexing.
Features
- Downloads complete documentation sites, or at least large portions of them.
- Attempts to maintain link structure and navigation, though this does not work reliably yet.
- Downloads and organizes assets (CSS, JS, images). The output is not especially AI-friendly as-is and probably needs further parsing or vectorizing into a database.
- Creates an index for RAG systems. It currently appears to write an index in each folder, and that output has not been reviewed.
- Simple single-purpose MCP interface.
Installation
Fork and download the repository, then cd into it.
Put this in your claude_desktop_config.json with your own paths:
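Claude Desktop reads MCP servers from the `mcpServers` object in claude_desktop_config.json, where each entry has a `command` and an `args` list. As a rough sketch only, the snippet below prints such an entry. The server name `website-downloader`, the `python` command, and the path are placeholders chosen for illustration, not this project's confirmed launch command.

```python
import json


def build_server_entry(name: str, command: str, args: list[str]) -> dict:
    """Return a config fragment with one entry under "mcpServers"."""
    return {"mcpServers": {name: {"command": command, "args": args}}}


# Placeholder values (assumptions): replace with your own interpreter and paths.
entry = build_server_entry(
    name="website-downloader",
    command="python",
    args=["C:/path/to/repository/server.py"],
)

# Merge the printed object into your existing claude_desktop_config.json.
print(json.dumps(entry, indent=2))
```

If the file already lists other servers, add only the inner entry under the existing `mcpServers` key instead of replacing the whole file.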
Other usage (optional; these steps have not been verified and may be inaccurate):
- Start the server:
- Use through Claude Desktop or other MCP clients:
Output Structure
Development
The server follows standard MCP architecture:
Components
- server.py: Main MCP server implementation that handles tool registration and requests
- core.py: Core website downloading functionality with proper asset handling
- utils.py: Helper utilities for file handling and URL processing
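To give a feel for the kind of URL processing a helper module like utils.py might do, here is a minimal sketch that maps a page URL to a relative output path. The function name `url_to_local_path`, the `downloads` root, and the `index.html` convention are assumptions made for illustration. They are not taken from the repository.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def url_to_local_path(url: str, root: str = "downloads") -> PurePosixPath:
    """Map a page URL to a relative path that mirrors the site's structure."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    # Drop empty, "." and ".." segments so the result cannot escape the root.
    parts = [p for p in unquote(parsed.path).split("/") if p not in ("", ".", "..")]
    # Treat directory-style URLs (trailing slash or no file extension) as index pages.
    if not parts or parsed.path.endswith("/") or "." not in parts[-1]:
        parts.append("index.html")
    return PurePosixPath(root, parsed.netloc, *parts)


print(url_to_local_path("https://docs.example.com/guide/"))
print(url_to_local_path("https://docs.example.com/api/ref.html"))
```

PurePosixPath keeps the result independent of the host OS. On Windows it can be passed to pathlib.Path before writing files.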
Design Principles
- Single Responsibility
  - Each module has one clear purpose
  - Server handles MCP interface
  - Core handles downloading
  - Utils handles common operations
- Clean Structure
  - Maintains original site structure
  - Organizes assets by type
  - Creates clear index for RAG systems
- Robust Operation
  - Proper error handling
  - Reasonable depth limits
  - Asset download verification
  - Clean URL/path processing
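The "reasonable depth limits" idea can be illustrated with a small breadth-first crawl that stops following links after a fixed number of hops. This is a sketch under stated assumptions, not the code in core.py: the `crawl` and `LinkCollector` names, the injected `fetch` callable, and the same-host rule are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url: str, fetch: Callable[[str], str], max_depth: int = 2) -> list[str]:
    """Visit same-host pages breadth-first, following links at most max_depth hops."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited: list[str] = []
    while queue:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to download
        visited.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: keep the page, ignore its links
        parser = LinkCollector()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == host and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return visited
```

Passing `fetch` in as a parameter keeps the traversal logic testable without network access. A real fetcher would wrap an HTTP client and raise an OSError subclass on failure.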
RAG Index
The rag_index.json file contains:
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
License
MIT License - See LICENSE file
Error Handling
The server handles common issues:
- Invalid URLs
- Network errors
- Asset download failures
- Malformed HTML
- Deep recursion
- File system errors
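As an illustration of how some of the issues listed above could be told apart, the sketch below validates a URL up front and maps exceptions to coarse categories. The function names and the category labels (`invalid_url`, `network_error`, `file_system_error`) are assumptions for this example and do not describe the server's actual response fields.

```python
import urllib.error
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Return the cleaned URL, or raise ValueError if it is not absolute http(s)."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"unsupported or missing scheme: {url!r}")
    if not parsed.netloc:
        raise ValueError(f"missing host: {url!r}")
    return parsed.geturl()


def classify_failure(exc: Exception) -> str:
    """Map an exception to a coarse, hypothetical error category."""
    if isinstance(exc, ValueError):
        return "invalid_url"
    # URLError, ConnectionError and TimeoutError all subclass OSError,
    # so they must be checked before the generic OSError branch.
    if isinstance(exc, (urllib.error.URLError, ConnectionError, TimeoutError)):
        return "network_error"
    if isinstance(exc, OSError):
        return "file_system_error"
    return "unknown"


try:
    validate_url("ftp://docs.example.com/guide")
except ValueError as exc:
    print(classify_failure(exc), "-", exc)
```

Checking the narrower network exceptions first matters because they inherit from OSError and would otherwise be reported as file system errors.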
Error responses follow the format:
Success responses:
This server enables users to download entire websites and their assets for offline access, supporting configurable depth and concurrency settings.