MCP Windows Website Downloader Server

MCP Website Downloader

Simple MCP server for downloading documentation websites and preparing them for RAG indexing.

Features

  • Downloads complete documentation sites, or at least large portions of them.
  • Attempts to maintain link structure and navigation, though this does not work reliably yet.
  • Downloads and organizes assets (CSS, JS, images). The output is not AI-friendly as is and probably needs parsing or vectorizing into a database before use.
  • Creates an index for RAG systems. It currently appears to write an index in each folder; this output has not been reviewed.
  • Simple single-purpose MCP interface.

Installation

Fork and download the repository, then cd into it and run:

    uv venv
    .venv/Scripts/activate
    pip install -e .

Put this in your claude_desktop_config.json, inside the "mcpServers" object, with your own paths:

    "mcp-windows-website-downloader": {
      "command": "uv",
      "args": [
        "--directory",
        "F:/GithubRepos/mcp-windows-website-downloader",
        "run",
        "mcp-windows-website-downloader",
        "--library",
        "F:/GithubRepos/mcp-windows-website-downloader/website_library"
      ]
    },

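If you would rather not edit the JSON by hand, the entry can be merged in with a short script. This is a minimal sketch, not part of the project: the function name `add_server_entry` is hypothetical, and it assumes the config file is either absent or already valid JSON.

```python
import json
from pathlib import Path


def add_server_entry(config_path: Path, repo_dir: str, library_dir: str) -> dict:
    """Add the downloader entry under "mcpServers" and write the file back."""
    # Start from the existing config if there is one, else from an empty object.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Keep any servers that are already configured.
    servers = config.setdefault("mcpServers", {})
    servers["mcp-windows-website-downloader"] = {
        "command": "uv",
        "args": [
            "--directory", repo_dir,
            "run", "mcp-windows-website-downloader",
            "--library", library_dir,
        ],
    }
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

Call it with the path to your own claude_desktop_config.json and your own repository and library paths, then restart Claude Desktop so it picks up the change.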
Other usage (optional and unverified; these commands may be inaccurate):

  1. Start the server:
       python -m mcp_windows_website_downloader.server --library docs_library
  2. Use through Claude Desktop or other MCP clients:
       result = await server.call_tool("download", { "url": "https://docs.example.com" })

Output Structure

    docs_library/
      domain_name/
        index.html
        about.html
        docs/
          getting-started.html
          ...
        assets/
          css/
          js/
          images/
          fonts/
        rag_index.json
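The layout above implies a mapping from each URL to a file under the library directory. Here is a minimal sketch of one such mapping. The names `local_path` and `ASSET_DIRS` are hypothetical and not taken from core.py, and it assumes the domain folder is named after the URL's host, pages keep their URL path, and assets are grouped by file extension.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Hypothetical extension-to-folder table matching the assets/ subfolders above.
ASSET_DIRS = {
    ".css": "css", ".js": "js",
    ".png": "images", ".jpg": "images", ".jpeg": "images",
    ".gif": "images", ".svg": "images",
    ".woff": "fonts", ".woff2": "fonts", ".ttf": "fonts",
}


def local_path(library: str, url: str) -> PurePosixPath:
    """Map a URL to a path under library/<host>/ (illustrative only)."""
    parsed = urlparse(url)
    site_root = PurePosixPath(library) / parsed.netloc
    path = PurePosixPath(parsed.path.lstrip("/"))
    suffix = path.suffix.lower()
    if suffix in ASSET_DIRS:
        # Assets are grouped by type; only the file name is kept.
        return site_root / "assets" / ASSET_DIRS[suffix] / path.name
    if not suffix:
        # Directory-style URLs are stored as index.html inside that directory.
        path = path / "index.html"
    return site_root / path


print(local_path("docs_library", "https://docs.example.com/docs/getting-started.html"))
```

Keeping only the asset file name means two assets with the same name would collide, and the sketch does not guard against ".." segments in the URL path. A real implementation would need to handle both.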

Development

The server follows standard MCP architecture:

    src/
      mcp_windows_website_downloader/
        __init__.py
        server.py    # MCP server implementation
        core.py      # Core downloader functionality
        utils.py     # Helper utilities

Components

  • server.py: Main MCP server implementation that handles tool registration and requests
  • core.py: Core website downloading functionality with proper asset handling
  • utils.py: Helper utilities for file handling and URL processing

Design Principles

  1. Single Responsibility
    • Each module has one clear purpose
    • Server handles MCP interface
    • Core handles downloading
    • Utils handles common operations
  2. Clean Structure
    • Maintains original site structure
    • Organizes assets by type
    • Creates clear index for RAG systems
  3. Robust Operation
    • Proper error handling
    • Reasonable depth limits
    • Asset download verification
    • Clean URL/path processing
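
To illustrate the "reasonable depth limits" and "clean URL/path processing" points, here is a minimal sketch of a depth-limited, same-domain crawl. It is not the project's actual code: `crawl` and its `get_links` callback are hypothetical, and the callback stands in for whatever fetches a page and extracts its links.

```python
from collections import deque
from typing import Callable, Iterable
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url: str,
          get_links: Callable[[str], Iterable[str]],
          max_depth: int = 2) -> list[str]:
    """Breadth-first crawl that stays on the start domain and stops at max_depth."""
    domain = urlparse(start_url).netloc
    start = urldefrag(start_url).url
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # do not follow links past the depth limit
        for href in get_links(url):
            # Resolve relative links and drop #fragments so each page is visited once.
            target = urldefrag(urljoin(url, href)).url
            if urlparse(target).netloc == domain and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order
```

Passing the link extractor in as a function keeps the traversal logic testable without any network access.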

RAG Index

The rag_index.json file contains:

    {
      "url": "https://docs.example.com",
      "domain": "docs.example.com",
      "pages": 42,
      "path": "/path/to/site"
    }
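
An index with these four fields could be produced roughly as follows. This is a sketch under assumptions: `write_rag_index` is a hypothetical name, and "pages" is assumed to be the number of .html files under the site folder, which the source does not confirm.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def write_rag_index(site_dir: Path, url: str) -> dict:
    """Write rag_index.json into site_dir and return the index that was written."""
    index = {
        "url": url,
        "domain": urlparse(url).netloc,
        # Assumption: a "page" is any .html file below the site folder.
        "pages": sum(1 for _ in site_dir.rglob("*.html")),
        "path": str(site_dir),
    }
    (site_dir / "rag_index.json").write_text(json.dumps(index, indent=2), encoding="utf-8")
    return index
```

A RAG pipeline can then read rag_index.json to find where the downloaded site lives and how many pages to expect.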

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

MIT License - See LICENSE file

Error Handling

The server handles common issues:

  • Invalid URLs
  • Network errors
  • Asset download failures
  • Malformed HTML
  • Deep recursion
  • File system errors

Error responses follow the format:

    {
      "status": "error",
      "error": "Detailed error message"
    }

Success responses:

    {
      "status": "success",
      "path": "/path/to/downloaded/site",
      "pages": 42
    }
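
The two response shapes above can be produced by a thin wrapper around the download call. The sketch below is illustrative only: `run_download` and the `download` callback are hypothetical names, and the set of exceptions caught is an assumption about which failures should be reported rather than raised.

```python
from typing import Callable


def run_download(download: Callable[[str], dict], url: str) -> dict:
    """Call a downloader and wrap its outcome in the documented response shape."""
    if not url.startswith(("http://", "https://")):
        return {"status": "error", "error": f"Invalid URL: {url}"}
    try:
        result = download(url)
    except (OSError, ValueError, RecursionError) as exc:
        # OSError covers both network failures and file system errors.
        return {"status": "error", "error": f"{type(exc).__name__}: {exc}"}
    return {"status": "success", "path": result["path"], "pages": result["pages"]}
```

Returning a dictionary in both cases lets the MCP client inspect "status" instead of handling exceptions from the tool call.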

local-only server

The server can only run on the client's local machine because it depends on local resources.

This server lets users download entire websites and their assets for offline access, with configurable depth and concurrency settings.
