
mcp-for-docs


An MCP (Model Context Protocol) server that automatically downloads and converts documentation from various sources into organized markdown files.

Overview

mcp-for-docs crawls documentation websites, converts their content to Markdown, and organizes the output in a structured directory system. It can also generate condensed cheat sheets from the downloaded documentation.

Features

  • 🕷️ Smart Documentation Crawler: Automatically crawls documentation sites with configurable depth

  • 📝 HTML to Markdown Conversion: Preserves code blocks, tables, and formatting

  • 📁 Automatic Categorization: Intelligently organizes docs into tools/APIs categories

  • 📄 Cheat Sheet Generator: Creates condensed reference guides from documentation

  • 🔍 Smart Discovery System: Automatically detects existing documentation before crawling

  • 🚀 Local-First: Uses existing downloaded docs when available

  • ⚡ Rate Limiting: Respects server limits and robots.txt

  • ✅ User Confirmation: Prevents accidental regeneration of existing content

  • ⚙️ Comprehensive Configuration: JSON-based configuration with environment variable overrides

  • 🧪 Test Suite: 94 tests covering core functionality

Installation

Prerequisites

  • Node.js 18+

  • npm or yarn

  • Claude Desktop or Claude Code CLI

Setup

  1. Clone the repository:

```bash
git clone https://github.com/shayonpal/mcp-for-docs.git
cd mcp-for-docs
```

  2. Install dependencies:

```bash
npm install
```

  3. Build the project:

```bash
npm run build
```

  4. Add to your MCP configuration:

For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

```json
{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}
```

For Claude Code CLI (~/.claude.json):

```json
{
  "mcpServers": {
    "mcp-for-docs": {
      "command": "node",
      "args": ["/path/to/mcp-for-docs/dist/index.js"],
      "env": {}
    }
  }
}
```

Usage

Crawling Documentation

To download documentation from a website:

```javascript
await crawl_documentation({
  url: "https://docs.n8n.io/",
  max_depth: 3,         // Optional, defaults to 3
  force_refresh: false  // Optional, set to true to regenerate existing docs
});
```

The tool will first check for existing documentation and show you what's already available. To regenerate existing content, use force_refresh: true.

The documentation will be saved to:

  • Tools: /Users/shayon/DevProjects/~meta/docs/tools/[tool-name]/

  • APIs: /Users/shayon/DevProjects/~meta/docs/apis/[api-name]/

Generating Cheat Sheets

To create a cheat sheet from documentation:

```javascript
await generate_cheatsheet({
  url: "https://docs.anthropic.com/",
  use_local: true,         // Use local files if available (default)
  force_regenerate: false  // Optional, set to true to regenerate existing cheatsheets
});
```

Cheat sheets are saved to: /Users/shayon/DevProjects/~meta/docs/cheatsheets/

The tool will check for existing cheatsheets and show you what's already available. To regenerate existing content, use force_regenerate: true.

Listing Downloaded Documentation

To see what documentation is available locally:

```javascript
await list_documentation({
  category: "all",    // Options: "tools", "apis", "all"
  include_stats: true
});
```

Supported Documentation Sites

The server has been tested with:

  • n8n documentation

  • Anthropic API docs

  • Obsidian Tasks plugin docs

  • Apple Swift documentation

Most documentation sites following standard patterns should work automatically.

Recent Updates

  • Configuration System (v0.4.0): Added comprehensive JSON-based configuration with environment variable support

  • Smart Discovery: Automatically finds and reports existing documentation before crawling

  • Improved Conversion: Fixed HTML to Markdown issues including table formatting and inline code preservation

  • Dynamic Categorization: Intelligent detection of tools vs APIs based on URL patterns and content analysis

  • Test Coverage: 94 tests passing with comprehensive unit and integration testing

For detailed changes, see CHANGELOG.md.

Configuration

Initial Setup

  1. Copy the example configuration:

```bash
cp config.example.json config.json
```

  2. Edit config.json and update the docsBasePath for your machine:

```json
{
  "docsBasePath": "/Users/yourusername/path/to/docs"
}
```

Important: The config.json file is tracked in git. When you clone this repository on a different machine, you'll need to update the docsBasePath to match that machine's directory structure.

How Documentation Organization Works

The tool automatically organizes documentation based on content analysis:

  1. You provide a URL when calling the tool (e.g., https://docs.n8n.io)

  2. The categorizer analyzes the content and determines if it's:

    • tools/ - Software tools, applications, plugins

    • apis/ - API references, SDK documentation

  3. Documentation is saved to: {docsBasePath}/{category}/{tool-name}/

For example:

  • https://docs.n8n.io → /Users/shayon/DevProjects/~meta/docs/tools/n8n/

  • https://docs.anthropic.com → /Users/shayon/DevProjects/~meta/docs/apis/anthropic/

This happens automatically; you don't need to configure anything per site. A rough sketch of the heuristic is shown below.
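
The exact rules live in the categorizer module; as a minimal illustration only (the keywords, weights, and threshold below are assumptions, not the project's actual logic), a URL-plus-content heuristic could look like this:

```typescript
// Hypothetical sketch of a tools-vs-APIs categorizer.
// The real categorizer combines URL patterns with content analysis;
// these hints and the scoring threshold are illustrative assumptions.
type Category = 'tools' | 'apis';

function categorize(url: string, pageText: string): Category {
  const apiUrlHints = ['api', 'reference', 'sdk', 'developer'];
  const apiContentHints = ['endpoint', 'authentication', 'request body', 'rate limit'];

  const lowerUrl = url.toLowerCase();
  const lowerText = pageText.toLowerCase();
  let score = 0;

  for (const hint of apiUrlHints) {
    if (lowerUrl.includes(hint)) score += 2; // URL signals weigh more
  }
  for (const hint of apiContentHints) {
    if (lowerText.includes(hint)) score += 1;
  }

  return score >= 3 ? 'apis' : 'tools';
}

// e.g. categorize('https://docs.anthropic.com/en/api/', 'Each endpoint requires authentication…') === 'apis'
```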

Configuration Options

| Setting | Description | Default |
| --- | --- | --- |
| docsBasePath | Where to store all documentation | Required - no default |
| crawler.defaultMaxDepth | How many levels deep to crawl | 3 |
| crawler.defaultRateLimit | Requests per second | 2 |
| crawler.pageTimeout | Page load timeout (ms) | 30000 |
| crawler.userAgent | Browser identification | MCP-for-docs/1.0 |
| cheatsheet.maxLength | Max characters in cheatsheet | 10000 |
| cheatsheet.filenameSuffix | Append to cheatsheet names | -Cheatsheet.md |
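
Putting the documented defaults together, a full config.json might look like the example below. The nesting of the crawler.* and cheatsheet.* keys under crawler and cheatsheet objects is an assumption inferred from the dotted setting names; check config.example.json for the authoritative shape.

```json
{
  "docsBasePath": "/Users/yourusername/path/to/docs",
  "crawler": {
    "defaultMaxDepth": 3,
    "defaultRateLimit": 2,
    "pageTimeout": 30000,
    "userAgent": "MCP-for-docs/1.0"
  },
  "cheatsheet": {
    "maxLength": 10000,
    "filenameSuffix": "-Cheatsheet.md"
  }
}
```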

Multi-Machine Setup

Since config.json is tracked in git:

  1. First machine: Set your docsBasePath and commit

  2. Other machines: After cloning, update docsBasePath to match that machine

  3. Use an environment variable to override it without changing the file:

```bash
export DOCS_BASE_PATH="/different/path/on/this/machine"
```
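
The override means DOCS_BASE_PATH takes precedence over the value in config.json. A minimal sketch of that precedence, assuming this resolution order (the actual loader in this project may differ):

```typescript
// Illustrative config resolution, not the project's actual loader.
// Assumption: DOCS_BASE_PATH (if set) wins over config.json.
import { readFileSync } from 'node:fs';

interface Config {
  docsBasePath: string;
}

function resolveDocsBasePath(configPath = 'config.json'): string {
  const fromEnv = process.env.DOCS_BASE_PATH;
  if (fromEnv && fromEnv.length > 0) {
    return fromEnv; // environment variable takes precedence
  }
  const fileConfig = JSON.parse(readFileSync(configPath, 'utf-8')) as Config;
  return fileConfig.docsBasePath;
}
```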

Development

```bash
# Install dependencies
npm install

# Run in development mode
npm run dev

# Run tests
npm test

# Build for production
npm run build

# Lint code
npm run lint
```

Architecture

  • Crawler: Uses Playwright for JavaScript-rendered pages

  • Parser: Extracts content using configurable selectors

  • Converter: Turndown library with custom rules for markdown

  • Categorizer: Smart detection of tools vs APIs

  • Storage: Organized file system structure
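
As a simplified sketch of how the crawler and converter fit together (not the project's actual implementation; the content selector, timeout, and user agent below are assumptions mirroring the config defaults), a single render-then-convert step with Playwright and Turndown could look like this:

```typescript
// Simplified sketch of the render-then-convert pipeline.
// Assumptions: the 30000 ms timeout and user agent mirror the config defaults;
// the "main" selector stands in for the configurable content selectors.
import { chromium } from 'playwright';
import TurndownService from 'turndown';

async function fetchPageAsMarkdown(url: string): Promise<string> {
  const browser = await chromium.launch();
  const page = await browser.newPage({ userAgent: 'MCP-for-docs/1.0' });
  try {
    await page.goto(url, { timeout: 30000, waitUntil: 'networkidle' });
    // Grab the rendered HTML of the main content area (selector is illustrative).
    const html = await page.locator('main').innerHTML();
    const turndown = new TurndownService({ codeBlockStyle: 'fenced' });
    return turndown.turndown(html);
  } finally {
    await browser.close();
  }
}
```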

Known Issues

  • URL Structure Preservation (#15): Currently flattens URL structure when saving docs

  • Large Documentation Sites (#14): No document limit for very large sites

  • GitHub Repository Docs (#9): Specialized crawler for GitHub repos not yet implemented

See all open issues for the complete roadmap.

Contributing

  1. Fork the repository

  2. Create a feature branch

  3. Make your changes

  4. Update CHANGELOG.md

  5. Submit a pull request

License

This project is licensed under the GPL 3.0 License - see the LICENSE file for details.
