MCP Web Docs
Index Any Documentation. Search Locally. Stay Private.
A self-hosted Model Context Protocol (MCP) server that crawls, indexes, and searches documentation from any website. Unlike remote MCP servers limited to GitHub repos or pre-indexed libraries, web-docs gives you full control over what gets indexed — including private documentation behind authentication.
Features • Installation • Quick Start • Tools • Tips • Troubleshooting • Contributing
❌ The Problem
AI assistants struggle with documentation:
❌ Remote MCP servers only work with GitHub or pre-indexed libraries
❌ Private docs behind authentication can't be accessed
❌ Outdated indexes don't reflect your team's latest documentation
❌ No control over what gets indexed or when
✅ The Solution
MCP Web Docs crawls and indexes documentation from ANY website locally:
✅ Any website - Docusaurus, Storybook, GitBook, custom sites, internal wikis
✅ Private docs - Interactive browser login for authenticated sites
✅ Always fresh - Re-index anytime with one command
✅ Your data, your machine - No API keys, no cloud, full privacy
✨ Features
🌐 Universal Crawler - Works with any documentation site, not just GitHub
🔍 Hybrid Search - Combines full-text search (FTS) with semantic vector search
📂 Collections - Group related docs into named collections for project-based organization
🏷️ Tags & Categories - Organize docs with tags and filter searches by project, team, or category
📦 Version Support - Index multiple versions of the same package (e.g., React 18 and 19)
🔐 Authentication Support - Crawl private/protected docs with interactive browser login (auto-detects your default browser)
📊 Smart Extraction - Automatically extracts code blocks, props tables, and structured content
⚡ Local Embeddings - Uses FastEmbed for fast, private embedding generation (no API keys)
🗄️ Persistent Storage - LanceDB for vectors, SQLite for metadata
🔄 Real-time Progress - Track indexing status with progress updates
🚀 Installation
Prerequisites
Node.js >= 22.19.0
Option 1: Install from NPM (Recommended)
Option 2: Run with npx
No installation required - just configure your MCP client to use npx (see below).
Option 3: Build from Source
Configure Your MCP Client
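Cursor and Claude Desktop both read stdio servers from an `mcpServers` object keyed by server name. The sketch below builds that shape for an npx launch and prints it as JSON. The server key `web-docs` and the `<package-name>` placeholder are assumptions for illustration; substitute the actual published package name.

```typescript
// Shape of one stdio server entry in an `mcpServers`-style config file.
interface StdioServerEntry {
  command: string;
  args: string[];
}

// Build the config object for a server launched through npx.
// `serverKey` and `packageName` are placeholders, not names taken from this project.
function buildNpxConfig(
  serverKey: string,
  packageName: string
): { mcpServers: Record<string, StdioServerEntry> } {
  const mcpServers: Record<string, StdioServerEntry> = {};
  // `-y` lets npx fetch the package without an interactive prompt
  mcpServers[serverKey] = { command: "npx", args: ["-y", packageName] };
  return { mcpServers };
}

const exampleConfig = buildNpxConfig("web-docs", "<package-name>");
console.log(JSON.stringify(exampleConfig, null, 2));
```

The printed JSON can be pasted into the client's config file. A global install would swap `command` for the installed binary and drop the npx arguments.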
Add to your Cursor MCP settings (~/.cursor/mcp.json):
Using npx (no install required):
Using global install:
Using local build:
Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):
Using npx:
Using global install:
Add to .vscode/mcp.json in your workspace:
Using npx:
Using global install:
Add to ~/.codeium/windsurf/mcp_config.json:
Using npx:
Using global install:
Add to ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json:
Using npx:
Using global install:
Global configuration: Open RooCode → Click MCP icon → "Edit Global MCP"
Project-level configuration: Create .roo/mcp.json at your project root
Using npx:
Using global install:
⚡ Quick Start
1. Index public documentation
The AI assistant will call add_documentation and begin crawling.
2. Search for information
The AI will use search_documentation to find relevant content.
3. For private docs, authenticate first
A browser window will open for you to log in. The session is saved for future crawls.
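Under the hood, each step above is an MCP `tools/call` request sent by the client. The sketch below builds those JSON-RPC 2.0 messages by hand to show what the assistant sends. The tool names come from this README, but the argument names (`url`, `query`) and the example URLs are assumptions for illustration.

```typescript
// JSON-RPC 2.0 envelope for an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function buildToolCall(
  id: number,
  toolName: string,
  toolArgs: Record<string, unknown>
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: toolName, arguments: toolArgs },
  };
}

// For a private site, authenticate comes before add_documentation.
// Argument names and URLs here are hypothetical.
const quickStartCalls: ToolCallRequest[] = [
  buildToolCall(1, "authenticate", { url: "https://internal.example.com/docs" }),
  buildToolCall(2, "add_documentation", { url: "https://internal.example.com/docs" }),
  buildToolCall(3, "search_documentation", { query: "deployment checklist" }),
];
quickStartCalls.forEach((call) => console.log(JSON.stringify(call)));
```

In normal use the MCP client produces these messages for you; the sketch is only meant to make the call order visible.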
🔨 Available Tools
add_documentation
Add a new documentation site for indexing.
search_documentation
Search through indexed documentation using hybrid search (FTS + semantic).
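This README does not say how the full-text and semantic rankings are merged. One common approach to hybrid search is reciprocal rank fusion (RRF), sketched below as an illustration and not as this server's actual implementation. The document ids and the constant `k = 60` are assumptions.

```typescript
// One ranked result list: document ids ordered best-first.
type RankedIds = string[];

// Reciprocal rank fusion: each list contributes 1 / (k + rank) per document.
function reciprocalRankFusion(lists: RankedIds[], k: number = 60): string[] {
  const scores: Record<string, number> = {};
  lists.forEach((ids) => {
    ids.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      scores[docId] = (scores[docId] || 0) + 1 / (k + rank);
    });
  });
  // Highest fused score first; ties are broken by id so the order is deterministic
  return Object.keys(scores).sort(
    (a, b) => (scores[b] || 0) - (scores[a] || 0) || a.localeCompare(b)
  );
}

// Hypothetical results from the two retrievers
const ftsHits: RankedIds = ["doc-a", "doc-b", "doc-c"];
const vectorHits: RankedIds = ["doc-b", "doc-d", "doc-a"];
console.log(reciprocalRankFusion([ftsHits, vectorHits]));
```

Documents that rank well in both lists, such as `doc-b` here, rise to the top, which is the behavior a hybrid search is meant to provide.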
authenticate
Open a browser window for interactive login to protected sites. Your default browser is automatically detected from OS settings.
list_documentation
List all indexed documentation sites with their metadata including tags.
set_tags
Set or update tags for a documentation site. Tags help categorize and filter documentation.
list_tags
List all available tags with usage counts. Useful to see what tags exist across your indexed docs.
reindex_documentation
Re-crawl and re-index a specific documentation site.
get_indexing_status
Get the current status of indexing operations.
delete_documentation
Delete an indexed documentation site and all its data.
clear_auth
Clear saved authentication session for a domain.
Collection Tools
Collections let you group related documentation sites together for project-based organization. Unlike tags (which categorize individual docs), collections create named workspaces like "My React Project" containing React + Next.js + TypeScript docs.
create_collection
Create a new collection to group documentation sites.
add_to_collection
Add indexed documentation sites to a collection.
search_collection
Search within a specific collection. Uses the same hybrid search as search_documentation but limited to docs in the collection.
list_collections
List all collections with their document counts.
get_collection
Get details of a specific collection including all its documentation sites.
update_collection
Rename a collection or update its description.
remove_from_collection
Remove documentation sites from a collection. The sites remain indexed, just removed from the collection.
delete_collection
Delete a collection. The documentation sites in the collection are not deleted, only the collection grouping.
💡 Tips
Crafting Better Search Queries
The search uses hybrid full-text and semantic search. For best results:
Be specific - Include unique terms from what you're looking for
Instead of: "Button props"
Try: "Button props onClick disabled loading"
Use exact phrases - Wrap in quotes for exact matching
"authentication middleware" finds that exact phrase
Include context - Add related terms to narrow results
API docs: "GET /users endpoint authentication headers"
Config: "webpack config entry output plugins"
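To make the distinction between exact phrases and loose terms concrete, here is a minimal sketch of splitting a query into the two groups. How the server actually parses queries is not described in this README, so treat `parseQuery` and its output shape as hypothetical.

```typescript
interface ParsedQuery {
  phrases: string[]; // quoted segments, intended for exact matching
  terms: string[];   // remaining loose words
}

function parseQuery(query: string): ParsedQuery {
  const phrases: string[] = [];
  // Pull out every "quoted segment" and leave a space in its place
  const remainder = query.replace(/"([^"]+)"/g, (_match: string, phrase: string) => {
    phrases.push(phrase.trim());
    return " ";
  });
  // Whatever is left is split on whitespace into individual terms
  const terms = remainder.split(/\s+/).filter((word) => word.length > 0);
  return { phrases, terms };
}

console.log(parseQuery('"authentication middleware" express setup'));
```

Here the quoted segment is kept whole as a phrase, while `express` and `setup` remain free terms that can match anywhere.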
Auto-Invoke with Rules
To avoid typing search instructions in every prompt, add a rule to your MCP client:
Cursor (Cursor Settings > Rules):
Windsurf (.windsurfrules):
Scoping Searches
If you have multiple sites indexed, filter by URL or tags:
Organizing with Tags
Tags help organize documentation when you have multiple related sites. Add tags when indexing:
Later, search across all frontend docs:
You can also add tags to existing documentation with set_tags.
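As a rough model of how tags narrow a search and how usage counts (as reported by list_tags) can be derived, consider the sketch below. The `TaggedSite` shape, the example URLs, and the choice to require every requested tag (AND semantics) are assumptions; the server may match tags differently.

```typescript
interface TaggedSite {
  url: string;
  tags: string[];
}

// Keep only sites that carry every requested tag (AND semantics, an assumption).
function filterByTags(sites: TaggedSite[], wanted: string[]): TaggedSite[] {
  return sites.filter((site) => wanted.every((tag) => site.tags.indexOf(tag) !== -1));
}

// Count how many sites use each tag, similar in spirit to list_tags.
function countTags(sites: TaggedSite[]): Record<string, number> {
  const counts: Record<string, number> = {};
  sites.forEach((site) => {
    site.tags.forEach((tag) => {
      counts[tag] = (counts[tag] || 0) + 1;
    });
  });
  return counts;
}

// Hypothetical indexed sites
const indexedExamples: TaggedSite[] = [
  { url: "https://docs.example.com/react", tags: ["frontend", "react"] },
  { url: "https://docs.example.com/nextjs", tags: ["frontend", "framework"] },
  { url: "https://docs.example.com/api", tags: ["backend"] },
];
console.log(filterByTags(indexedExamples, ["frontend"]).map((site) => site.url));
console.log(countTags(indexedExamples));
```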
Using Collections for Project Organization
Collections provide a higher-level grouping than tags — they let you organize documentation by project or context, making it easy to switch between different work contexts.
Create a collection for your project:
Add relevant documentation:
Search within your project context:
Collections vs Tags:
Feature | Collections | Tags |
Purpose | Group docs as a project/workspace | Categorize individual docs |
Structure | Named container with multiple docs | Labels on individual docs |
Use case | "My React Project" with React + Next.js + TS | "This doc is about React" |
Searching | search_collection | search_documentation filtered by tags |
You can use both together — a document can have tags AND belong to multiple collections.
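The relationship described above can be modeled as a many-to-many mapping between collections and indexed sites. The sketch below is an illustrative data model only, not the server's storage schema; `Workspace` and the helper functions are hypothetical names.

```typescript
interface Workspace {
  indexedSites: string[];                // every indexed site URL
  collections: Record<string, string[]>; // collection name -> member URLs
}

function addToCollection(ws: Workspace, collection: string, urls: string[]): void {
  const members = ws.collections[collection] || (ws.collections[collection] = []);
  urls.forEach((url) => {
    // Only indexed sites can join, and each site appears once per collection
    if (ws.indexedSites.indexOf(url) !== -1 && members.indexOf(url) === -1) {
      members.push(url);
    }
  });
}

function deleteCollection(ws: Workspace, collection: string): void {
  // Removes only the grouping; the sites themselves stay indexed
  delete ws.collections[collection];
}

const demoWorkspace: Workspace = {
  indexedSites: ["https://docs.example.com/react", "https://docs.example.com/ts"],
  collections: {},
};
addToCollection(demoWorkspace, "My React Project", demoWorkspace.indexedSites);
addToCollection(demoWorkspace, "Typing Reference", ["https://docs.example.com/ts"]);
deleteCollection(demoWorkspace, "Typing Reference");
console.log(demoWorkspace);
```

After the deletion, both sites are still indexed and "My React Project" still contains them, matching the behavior described for delete_collection and remove_from_collection.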
Versioning Package Documentation
When indexing documentation for versioned packages (React, Vue, Python libraries, etc.), you can specify the version to track which version you've indexed:
The version is displayed in list_documentation output and preserved when re-indexing. Version formats are flexible — use whatever makes sense for your package (e.g., "18", "v6.4", "3.11", "latest").
Note: Version is optional and mainly useful for software packages with multiple versions. For internal documentation, wikis, or single-version products, you can skip the version field.
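A minimal sketch of how an optional version might travel with a site record: it is shown when present, omitted otherwise, and carried over on re-index. The `SiteRecord` fields and both helpers are hypothetical and only mirror the behavior described above.

```typescript
interface SiteRecord {
  url: string;
  title: string;
  version?: string;    // free-form, e.g. "18", "v6.4", "latest"
  lastIndexed: string; // ISO timestamp
}

// Label for listings: append the version only when one was provided.
function displayLabel(site: SiteRecord): string {
  return site.version ? `${site.title} (${site.version})` : site.title;
}

// Re-indexing refreshes the timestamp but keeps every other field, including version.
function reindexRecord(site: SiteRecord, now: string): SiteRecord {
  return { ...site, lastIndexed: now };
}

const reactDocs: SiteRecord = {
  url: "https://docs.example.com/react",
  title: "React",
  version: "18",
  lastIndexed: "2025-01-01T00:00:00Z",
};
const refreshed = reindexRecord(reactDocs, "2025-06-01T00:00:00Z");
console.log(displayLabel(refreshed), refreshed.lastIndexed);
```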
🚨 Troubleshooting
The content extractor couldn't process the page. Try:
Re-indexing the documentation
Checking if the site uses JavaScript rendering (should work with Playwright)
Looking at the crawled data in ~/.mcp-web-docs/crawlee/datasets/
Make sure you call authenticate before add_documentation
The browser window needs to stay open until login is detected
For OAuth sites, complete the full flow manually
Your default browser is auto-detected; specify a different one with browser: "firefox", for example, if needed
Try more specific queries with unique terms
Use quotes for exact phrase matching
Filter by URL to search within a specific documentation site
Re-index if the documentation has been updated
If browsers aren't installed, run:
Data Storage
All data is stored locally in ~/.mcp-web-docs/:
📄 License
MIT License - see LICENSE for details.
🙏 Acknowledgments
Model Context Protocol - The protocol specification
Crawlee - Web scraping and browser automation
LanceDB - Vector database
FastEmbed - Local embedding generation
Playwright - Browser automation