
MCP Website Downloader

Simple MCP server for downloading documentation websites and preparing them for RAG indexing.

Features

  • Downloads documentation sites (in practice, large portions rather than guaranteed complete copies).

  • Preserves link structure and navigation only partially; rewritten links may not always resolve.

  • Downloads and organizes assets (CSS, JS, images), though the raw output is not directly AI-friendly and will likely need parsing or vectorizing into a database.

  • Creates an index in each downloaded folder, intended for RAG systems; this output has not been thoroughly reviewed yet.

  • Simple single-purpose MCP interface.


Installation

Fork and clone the repository, then cd into it:

uv venv
.venv/Scripts/activate
uv pip install -e .

Add this to your claude_desktop_config.json, substituting your own paths:

   "mcp-windows-website-downloader": {
     "command": "uv",
     "args": [
       "--directory",
       "F:/GithubRepos/mcp-windows-website-downloader",
       "run",
       "mcp-windows-website-downloader",
       "--library",
       "F:/GithubRepos/mcp-windows-website-downloader/website_library"
     ]
   },


Other usage (optional, and not fully verified):

  1. Start the server:

python -m mcp_windows_website_downloader.server --library docs_library

  2. Use through Claude Desktop or other MCP clients:

result = await server.call_tool("download", {
    "url": "https://docs.example.com"
})

Output Structure

docs_library/
  domain_name/
    index.html
    about.html
    docs/
      getting-started.html
      ...
    assets/
      css/
      js/
      images/
      fonts/
    rag_index.json
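Given the layout above, downloaded sites can be enumerated by collecting each folder's rag_index.json. A minimal sketch, assuming that layout (the `list_downloaded_sites` helper is hypothetical, not part of the server):

```python
import json
from pathlib import Path

def list_downloaded_sites(library: Path) -> list[dict]:
    """Collect each site's rag_index.json from a docs_library folder.

    Assumes the layout shown above: library/<domain>/rag_index.json.
    """
    sites = []
    for index_file in sorted(library.glob("*/rag_index.json")):
        with index_file.open(encoding="utf-8") as f:
            sites.append(json.load(f))
    return sites

# Example, assuming a docs_library folder exists:
# for site in list_downloaded_sites(Path("docs_library")):
#     print(site["domain"], site["pages"])
```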

Development

The server follows standard MCP architecture:

src/
  mcp_windows_website_downloader/
    __init__.py
    server.py    # MCP server implementation
    core.py      # Core downloader functionality
    utils.py     # Helper utilities

Components

  • server.py: Main MCP server implementation that handles tool registration and requests

  • core.py: Core website downloading functionality with proper asset handling

  • utils.py: Helper utilities for file handling and URL processing

Design Principles

  1. Single Responsibility

    • Each module has one clear purpose

    • Server handles MCP interface

    • Core handles downloading

    • Utils handles common operations

  2. Clean Structure

    • Maintains original site structure

    • Organizes assets by type

    • Creates clear index for RAG systems

  3. Robust Operation

    • Proper error handling

    • Reasonable depth limits

    • Asset download verification

    • Clean URL/path processing
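The depth-limit and same-domain principles can be sketched as a breadth-first crawl. This is a minimal illustration, not the server's actual implementation; the `fetch` callable is a stand-in for real HTTP requests:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkParser(HTMLParser):
    """Collects href targets from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl_plan(start_url: str, fetch, max_depth: int = 2) -> set[str]:
    """Breadth-first crawl limited to one domain and a fixed depth.

    `fetch` maps a URL to an HTML string, so the traversal logic can be
    shown without network access. A sketch of the 'reasonable depth
    limits' idea, not the server's actual code.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            parser = LinkParser()
            parser.feed(fetch(url))
            for href in parser.links:
                absolute = urljoin(url, href)
                # Stay on the starting domain; skip already-visited pages.
                if urlparse(absolute).netloc == domain and absolute not in seen:
                    seen.add(absolute)
                    next_frontier.append(absolute)
        frontier = next_frontier
    return seen
```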

RAG Index

The rag_index.json file contains:

{
  "url": "https://docs.example.com",
  "domain": "docs.example.com", 
  "pages": 42,
  "path": "/path/to/site"
}
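For illustration, an index with these fields could be produced like this. A sketch under assumptions: the real computation in core.py may differ, and `write_rag_index` is a hypothetical helper:

```python
import json
from pathlib import Path
from urllib.parse import urlparse

def write_rag_index(site_dir: Path, url: str) -> dict:
    """Write a rag_index.json with the fields documented above.

    Page count is approximated by counting downloaded .html files;
    the actual server may compute it differently.
    """
    index = {
        "url": url,
        "domain": urlparse(url).netloc,
        "pages": sum(1 for _ in site_dir.rglob("*.html")),
        "path": str(site_dir.resolve()),
    }
    out = site_dir / "rag_index.json"
    out.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```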

Contributing

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Submit a pull request

License

MIT License - See LICENSE file

Error Handling

The server handles common issues:

  • Invalid URLs

  • Network errors

  • Asset download failures

  • Malformed HTML

  • Deep recursion

  • File system errors

Error responses follow the format:

{
  "status": "error",
  "error": "Detailed error message"
}

Success responses:

{
  "status": "success",
  "path": "/path/to/downloaded/site",
  "pages": 42
}
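Client code can dispatch on the `status` field of these envelopes. A minimal sketch (the `handle_download_result` helper is hypothetical, not part of the server):

```python
def handle_download_result(result: dict) -> str:
    """Dispatch on the documented response envelope.

    Raises on the error shape; returns the download path on success.
    """
    if result.get("status") == "error":
        raise RuntimeError(result.get("error", "unknown error"))
    return result["path"]
```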