The MCP Windows Website Downloader Server is a tool for downloading and organizing documentation websites for RAG indexing. With this server, you can:
Download complete documentation sites while maintaining their original link structure and navigation
Organize downloaded content (HTML pages) and assets (CSS, JS, images, fonts) into appropriate folders
Generate a rag_index.json file containing metadata about the downloaded site for RAG systems
Preserve website structure in an organized folder hierarchy
Handle errors like invalid URLs, network issues, and asset download failures with structured responses
Integrate with Claude Desktop and other MCP clients through a standard interface
MCP Website Downloader
Simple MCP server for downloading documentation websites and preparing them for RAG indexing.
Features
Downloads complete documentation sites (or at least large chunks of them).
Attempts to maintain link structure and navigation, with mixed results.
Downloads and organizes assets (CSS, JS, images), though the raw output isn't very AI-friendly yet and will probably need parsing or vectorizing into a database.
Creates an index for RAG systems; currently it appears to write an index into each folder, and the contents haven't been reviewed closely.
Simple single-purpose MCP interface.
Installation
Fork and clone the repository, then cd into it.
Add this to your claude_desktop_config.json, substituting your own paths:
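The exact entry isn't shown here, but Claude Desktop MCP servers are registered under the `mcpServers` key. A plausible sketch (the module name and paths are placeholders, not taken from the repository):

```json
{
  "mcpServers": {
    "website-downloader": {
      "command": "python",
      "args": ["-m", "mcp_windows_website_downloader"]
    }
  }
}
```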
Other usage (optional, and not fully verified):
Start the server, then call it through Claude Desktop or another MCP client:
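As a sketch of what "use through an MCP client" means at the protocol level, here is a plausible `tools/call` request. The tool name `download` and its arguments are assumptions for illustration, not taken from the repository:

```python
import json

# Hypothetical JSON-RPC payload an MCP client might send to this server.
# The tool name and argument names are assumptions.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "download",
        "arguments": {"url": "https://example.com/docs"},
    },
}
print(json.dumps(request, indent=2))
```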
Output Structure
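The exact layout isn't documented here; based on the asset types named above, the output likely resembles something like:

```
downloaded_site/
├── index.html
├── assets/
│   ├── css/
│   ├── js/
│   ├── images/
│   └── fonts/
└── rag_index.json
```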
Development
The server follows standard MCP architecture:
Components
server.py: Main MCP server implementation that handles tool registration and requests
core.py: Core website downloading functionality with proper asset handling
utils.py: Helper utilities for file handling and URL processing
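As an illustration of the kind of URL processing utils.py handles, here is a hedged sketch of a URL-to-local-path mapper; the real implementation may differ:

```python
from urllib.parse import urlparse
from pathlib import PurePosixPath

def url_to_local_path(url: str, root: str = "site") -> str:
    """Map a page URL to a local file path.

    Illustrative sketch only; the repository's actual helper may behave
    differently (e.g. around query strings or duplicate names).
    """
    parsed = urlparse(url)
    path = parsed.path or "/"
    if path.endswith("/"):
        # Directory-style URL: serve its index page
        path += "index.html"
    elif "." not in PurePosixPath(path).name:
        # Extensionless page: treat it as a directory
        path += "/index.html"
    return f"{root}/{parsed.netloc}{path}"

print(url_to_local_path("https://docs.example.com/guide/"))
# → site/docs.example.com/guide/index.html
```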
Design Principles
Single Responsibility
Each module has one clear purpose
Server handles MCP interface
Core handles downloading
Utils handles common operations
Clean Structure
Maintains original site structure
Organizes assets by type
Creates clear index for RAG systems
Robust Operation
Proper error handling
Reasonable depth limits
Asset download verification
Clean URL/path processing
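The "reasonable depth limits" principle can be sketched as a breadth-first crawl with a depth cap. This is illustrative only; `get_links` stands in for the real fetch-and-parse step:

```python
from collections import deque

def crawl(start, get_links, max_depth=3):
    """Breadth-first crawl that stops following links past max_depth.

    Illustrative sketch of depth limiting, not the repository's code.
    get_links(url) returns the URLs linked from a page.
    """
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # page is saved, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order

# Toy link graph standing in for real HTTP fetches
graph = {"/": ["/a", "/b"], "/a": ["/deep"], "/b": [], "/deep": ["/deeper"], "/deeper": []}
print(crawl("/", lambda u: graph.get(u, []), max_depth=1))
# → ['/', '/a', '/b']
```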
RAG Index
The rag_index.json file contains:
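The field list isn't shown in this README; given that it holds "metadata about the downloaded site for RAG systems," a plausible shape (all field names are guesses) would be:

```json
{
  "source_url": "https://docs.example.com",
  "page_count": 42,
  "pages": [
    {"url": "https://docs.example.com/guide/", "file": "guide/index.html", "title": "Guide"}
  ]
}
```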
Contributing
Fork the repository
Create a feature branch
Make your changes
Submit a pull request
License
MIT License - See LICENSE file
Error Handling
The server handles common issues:
Invalid URLs
Network errors
Asset download failures
Malformed HTML
Deep recursion
File system errors
Error responses follow the format:
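The exact schema isn't shown here; a plausible structured error (field names are assumptions):

```json
{
  "status": "error",
  "error": "Failed to fetch https://example.com/docs: connection timed out"
}
```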
Success responses:
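Likewise, a success response might look like this (field names are assumptions):

```json
{
  "status": "success",
  "path": "downloaded_site/",
  "pages_downloaded": 42
}
```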