Crawl4Claude


# MCP Documentation Server

A **Model Context Protocol (MCP) server** that provides AI agents with real-time, searchable access to any documentation database. This server is **completely domain-agnostic** and automatically integrates with Claude Desktop through our configuration-driven setup.

## 🚀 Features

- **🌐 Universal Documentation Access**: Works with any scraped documentation site
- **🔍 Full-text Search**: Fast FTS5 search with highlighting and snippets
- **📚 Section Browsing**: Organized access to documentation sections
- **📄 Complete Page Retrieval**: Full content access with metadata
- **⚙️ Configuration-Driven**: Single config file controls everything
- **🔧 Auto-Setup**: Automatic Claude Desktop configuration generation
- **⚡ High Performance**: Optimized SQLite database with indexing
- **🛠️ Debug Suite**: Comprehensive testing and validation tools

## 🛠️ Quick Setup

### 1. Configure Your Documentation

Edit `config.py` to set your target documentation:

```python
SCRAPER_CONFIG = {
    "base_url": "https://docs.example.com/",
    "output_dir": "docs_db",
    "max_pages": 200,
}

MCP_CONFIG = {
    "server_name": "docs-server",
    "server_description": "Documentation Search Server",
    "default_search_limit": 10,
    "max_search_limit": 50,
}
```

### 2. Scrape Your Documentation

```bash
# Scrape the documentation site
python docs_scraper.py

# Verify the database was created
python query_docs.py --stats
```

### 3. Generate MCP Configuration

```bash
# Auto-generate Claude Desktop config files
python utils/gen_mcp.py
```

### 4. Test the MCP Server

```bash
# Test server functionality
python utils/debug_mcp_server.py

# Test MCP tools directly
python utils/debug_mcp_client.py

# Test MCP protocol
python utils/debug_mcp_server_protocol.py
```

### 5. Connect to Claude Desktop

1. **Copy the generated configuration** from `mcp/claude_mcp_config.json`
2. **Add it to Claude Desktop config**:
   - **Windows**: `%APPDATA%\Claude\claude_desktop_config.json`
   - **macOS**: `~/Library/Application Support/Claude/claude_desktop_config.json`
3. **Restart Claude Desktop**

## 🔌 Claude Desktop Integration

### Automatic Configuration

Our `gen_mcp.py` tool creates everything you need:

```bash
python utils/gen_mcp.py
```

**Generated files:**

- `mcp/run_mcp_server.bat` - Windows launcher script
- `mcp/claude_mcp_config.json` - Claude Desktop configuration

**Example generated config:**

```json
{
  "mcpServers": {
    "docs_example_com": {
      "command": "C:\\path\\to\\mcp\\run_mcp_server.bat",
      "args": [],
      "cwd": "C:\\path\\to\\project",
      "env": {
        "DOCS_DB_PATH": "C:\\path\\to\\docs_db\\documentation.db",
        "DOCS_DB_NAME": "docs.example.com",
        "DOCS_BASE_URL": "https://docs.example.com/",
        "MCP_SERVER_NAME": "docs-server"
      }
    }
  }
}
```

### Manual Configuration

If you prefer manual setup:

```json
{
  "mcpServers": {
    "docs": {
      "command": "python",
      "args": ["mcp_docs_server.py"],
      "cwd": "/path/to/project",
      "env": {
        "DOCS_DB_PATH": "/path/to/docs_db/documentation.db",
        "DOCS_DB_NAME": "Your Documentation",
        "DOCS_BASE_URL": "https://docs.yoursite.com/"
      }
    }
  }
}
```

## 🧰 Available MCP Tools

Once connected, Claude can use these tools:

### 🔍 `search_documentation`

Search through documentation content with full-text search.

```javascript
// Search with basic query
search_documentation({ query: "authentication guide", limit: 10 })

// Search within specific section
search_documentation({ query: "API endpoints", limit: 5, section: "api-reference" })
```

**Returns:** Array of pages with title, URL, section, word count, and highlighted snippets.

### 📚 `get_documentation_sections`

Get all available sections with statistics.

```javascript
get_documentation_sections()
```

**Returns:** `[{"section": "tutorials", "page_count": 45, "total_words": 18500}, ...]`

### 📄 `get_page_content`

Retrieve the full content of a specific page.

```javascript
get_page_content({ url: "https://docs.example.com/tutorials/getting-started" })
```

**Returns:** Complete page with title, markdown content, section, word count, and metadata.

### 🗂️ `browse_section`

Browse all pages in a specific section.

```javascript
browse_section({ section: "tutorials", limit: 20 })
```

**Returns:** Array of pages in the section, ordered by relevance.

### 📊 `get_documentation_stats`

Get overall database statistics and server info.

```javascript
get_documentation_stats()
```

**Returns:** Total pages, words, sections, top sections, and server configuration.

## 🌐 Multi-Site Examples

### Python Documentation

```python
# config.py
SCRAPER_CONFIG = {
    "base_url": "https://docs.python.org/3/",
    "output_dir": "python_docs",
    "max_pages": 1000,
}

MCP_CONFIG = {
    "server_name": "python-docs-server",
    "docs_display_name": "Python 3 Documentation",
}
```

### React Documentation

```python
# config.py
SCRAPER_CONFIG = {
    "base_url": "https://react.dev/",
    "output_dir": "react_docs",
    "max_pages": 300,
}

MCP_CONFIG = {
    "server_name": "react-docs-server",
    "docs_display_name": "React Documentation",
}
```

### Corporate Documentation

```python
# config.py
SCRAPER_CONFIG = {
    "base_url": "https://internal-docs.company.com/",
    "output_dir": "company_docs",
    "max_pages": 500,
}

MCP_CONFIG = {
    "server_name": "company-docs-server",
    "docs_display_name": "Company Internal Docs",
}
```

## 🔧 Command Line Interface

### Run MCP Server Directly

```bash
# Start the MCP server
python mcp_docs_server.py

# With debug output
python utils/debug_mcp_docs_server.py
```

### Query Documentation Locally

```bash
# Search documentation
python query_docs.py --search "tutorial example"

# Browse sections
python query_docs.py --section "getting-started"

# Get statistics
python query_docs.py --stats

# Export sections
python query_docs.py --export-section "api" --format markdown > api_docs.md
```

## 🧪 Testing & Debugging

### Comprehensive Test Suite

```bash
# Test scraper functionality (5-page test)
python utils/debug_scraper.py

# Test MCP server locally
python utils/debug_mcp_server.py

# Test all MCP tools directly
python utils/debug_mcp_client.py

# Test MCP JSON-RPC protocol
python utils/debug_mcp_server_protocol.py

# Test content extraction
python utils/debug_site_content.py

# Generate/regenerate MCP configs
python utils/gen_mcp.py
```

### Debugging Checklist

**✅ Scraper Issues:**

- Database exists: `ls docs_db/documentation.db`
- Database has content: `python query_docs.py --stats`
- Test scraping: `python utils/debug_scraper.py`

**✅ MCP Server Issues:**

- Configuration valid: `python utils/debug_mcp_server.py`
- Tools working: `python utils/debug_mcp_client.py`
- Protocol working: `python utils/debug_mcp_server_protocol.py`

**✅ Claude Connection Issues:**

- Config file syntax valid
- Paths are absolute and correct
- Server starts without errors
- Environment variables set correctly

## 🔍 Example Claude Interactions

Once connected, you can ask Claude:

### Documentation Search

> **User:** "Search for authentication examples in the documentation"
>
> **Claude:** *Uses `search_documentation` to find auth-related content*

### Section Exploration

> **User:** "What sections are available in this documentation?"
>
> **Claude:** *Uses `get_documentation_sections` to list all sections*

### Deep Content Access

> **User:** "Show me the full content of the getting started guide"
>
> **Claude:** *Uses `search_documentation` to find the guide, then `get_page_content` to get full text*

### Section Analysis

> **User:** "How many tutorial pages are there and what do they cover?"
>
> **Claude:** *Uses `browse_section` to analyze the tutorials section*

### Documentation Overview

> **User:** "Give me an overview of this documentation database"
>
> **Claude:** *Uses `get_documentation_stats` to provide comprehensive statistics*

## 📊 Database Schema

The MCP server works with any documentation database following this schema:

```sql
-- Main content table
CREATE TABLE pages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    url TEXT UNIQUE NOT NULL,
    title TEXT,
    content TEXT,       -- Raw HTML content
    markdown TEXT,      -- Clean markdown content
    word_count INTEGER,
    section TEXT,       -- Main section (e.g., "tutorials")
    subsection TEXT,    -- Subsection (e.g., "advanced")
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    metadata TEXT       -- JSON metadata
);

-- Full-text search (optional but recommended)
CREATE VIRTUAL TABLE pages_fts USING fts5(
    title, markdown, url, section,
    content='pages',
    content_rowid='id'
);

-- Indexes for performance
CREATE INDEX idx_pages_section ON pages(section);
CREATE INDEX idx_pages_url ON pages(url);
```

## 🚀 Multiple Documentation Servers

Run multiple MCP servers for different documentation sets:

```json
{
  "mcpServers": {
    "python_docs": {
      "command": "C:\\path\\to\\python_mcp\\run_mcp_server.bat",
      "cwd": "C:\\path\\to\\python_project"
    },
    "react_docs": {
      "command": "C:\\path\\to\\react_mcp\\run_mcp_server.bat",
      "cwd": "C:\\path\\to\\react_project"
    },
    "company_docs": {
      "command": "C:\\path\\to\\company_mcp\\run_mcp_server.bat",
      "cwd": "C:\\path\\to\\company_project"
    }
  }
}
```

Each server operates independently with its own configuration and database.
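
If you maintain several documentation sets, one way to assemble a combined `mcpServers` block is to merge each generated entry into your existing Claude Desktop configuration. The sketch below is illustrative only: `merge_mcp_servers` is a hypothetical helper, not part of this project, and the paths are placeholders. It assumes each entry follows the `command`/`cwd` shape shown above.

```python
import json
from copy import deepcopy


def merge_mcp_servers(existing: dict, new_servers: dict) -> dict:
    """Return a copy of `existing` with `new_servers` added under "mcpServers".

    Hypothetical helper for illustration. Raises ValueError instead of
    silently overwriting a server key that is already configured.
    """
    merged = deepcopy(existing)  # leave the caller's config untouched
    servers = merged.setdefault("mcpServers", {})
    for key, entry in new_servers.items():
        if key in servers:
            raise ValueError(f"server key already present: {key}")
        servers[key] = entry
    return merged


if __name__ == "__main__":
    # Placeholder entries; real paths come from each project's generated config.
    current = {"mcpServers": {"python_docs": {"command": "run_python.bat", "cwd": "python_project"}}}
    extra = {"react_docs": {"command": "run_react.bat", "cwd": "react_project"}}
    print(json.dumps(merge_mcp_servers(current, extra), indent=2))
```

Refusing duplicate keys is a deliberate choice here: two documentation servers sharing a key would otherwise replace one another without warning.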
## ⚙️ Configuration Reference

### Environment Variables (Optional Overrides)

```bash
# Override database path
DOCS_DB_PATH=/custom/path/documentation.db

# Override display name
DOCS_DB_NAME="Custom Documentation Name"

# Override base URL
DOCS_BASE_URL=https://different-docs.com/

# Override server name
MCP_SERVER_NAME=custom-docs-server
```

### Config.py Settings

```python
MCP_CONFIG = {
    # Server identification
    "server_name": "docs-server",
    "server_description": "Documentation Search and Retrieval Server",

    # Display name (None = auto-derive from base_url)
    "docs_display_name": None,

    # Search limits
    "default_search_limit": 10,
    "max_search_limit": 50,
    "default_section_limit": 20,
    "max_section_limit": 100,

    # Content settings
    "include_full_urls": True,
    "snippet_length": 32,
    "enable_fts_fallback": True,
}
```

## 🔧 Troubleshooting

### Common Issues

**❌ "Database not found"**

```bash
# Check if database exists
ls docs_db/documentation.db

# Re-run scraper if missing
python docs_scraper.py
```

**❌ "No search results"**

```bash
# Test database content
python query_docs.py --stats

# Test search functionality
python utils/debug_mcp_client.py
```

**❌ "Claude can't connect"**

```bash
# Test MCP protocol
python utils/debug_mcp_server_protocol.py

# Check config syntax
python -m json.tool mcp/claude_mcp_config.json
```

**❌ "Import errors"**

```bash
# Install dependencies
pip install -r requirements.txt

# Test imports
python -c "import fastmcp; print('FastMCP OK')"
```

### Debug Logs

Check logs for detailed error information:

```bash
# Scraper logs
tail -f docs_db/scraper.log

# MCP server logs (stderr)
python utils/debug_mcp_docs_server.py 2> mcp_debug.log
```

## 📝 Contributing

Areas for enhancement:

- **Additional export formats** (PDF, EPUB)
- **Enhanced search algorithms** (semantic search, relevance scoring)
- **Real-time documentation updates** (webhook integration)
- **Multi-language support** (internationalized documentation)
- **Performance optimizations** (caching, connection pooling)

## 📄 License

This MCP server is designed to be universally reusable. Adapt and extend it for any documentation project while respecting the original documentation sources' licenses.

---

**🤖 Ready to enhance your AI workflows with comprehensive documentation access!**

For support, check the debug tools first, then create an issue with relevant logs and configuration details.
