# Documentation MCP Server (RAG-enabled)
An MCP server that provides AI assistants with **semantic search** access to any documentation. Uses RAG (Retrieval-Augmented Generation) with vector embeddings to find relevant content based on meaning, not just keywords.
## Features
- **Semantic Search** - Find relevant content by meaning, not just keyword matching
- **Recursive Chunking** - Documents split by headers (H1 → H2 → H3) for precise retrieval
- **Low Token Usage** - Returns ~500-1000 tokens per query instead of entire 15k+ token documents
- **Vector Embeddings** - Generated with `text-embedding-3-small` via OpenRouter (proxied through Apify)
- **PostgreSQL + pgvector** - Supabase for scalable vector storage
## Architecture
```
Ingest: Scrape → Parse HTML → Recursive Chunk → Generate Embeddings → Store in Supabase
Query:  User Query → Embed Query → Vector Similarity Search → Return Relevant Chunks
```
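In code, the ingest stage reduces to three steps per page: chunk, embed, insert. A minimal TypeScript sketch of that flow (the `chunkByHeadings` and `embed` helpers are illustrative stand-ins for the Actor's internals; the table and column names match the schema in Setup below):
```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Stand-ins for the Actor's internals: the recursive chunker described in
// "How Chunking Works" below, and the embedding call (1536-dim vectors).
declare function chunkByHeadings(html: string): {
  heading: string; sectionPath: string[]; content: string; tokenCount: number;
}[];
declare function embed(texts: string[]): Promise<number[][]>;

async function ingestPage(docId: string, title: string, url: string, html: string) {
  const chunks = chunkByHeadings(html);
  const embeddings = await embed(chunks.map((c) => c.content));

  // Parent row first, so the doc_chunks foreign key is satisfied.
  await supabase.from("doc_metadata").upsert({ id: docId, title, url, total_chunks: chunks.length });
  await supabase.from("doc_chunks").insert(
    chunks.map((c, i) => ({
      doc_id: docId,
      doc_title: title,
      doc_url: url,
      section_path: c.sectionPath,
      heading: c.heading,
      content: c.content,
      token_count: c.tokenCount,
      chunk_index: i,
      embedding: embeddings[i],
    })),
  );
}
```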
## Available Tools
| Tool | Description | Token Usage |
|------|-------------|-------------|
| `list_docs` | List all docs with metadata (id, title, summary, sections) | ~200 |
| `search` | Semantic search across all docs | ~500-1000 |
| `get_doc_overview` | Get document summary and section structure | ~200 |
| `get_section` | Get content of a specific section | ~500-1000 |
| `get_chunks` | Paginated access to all chunks in a doc | ~500-1000 |
| `search_in_doc` | Semantic search within a specific document | ~500-1000 |
## Setup
### 1. Create Supabase Project
1. Go to [supabase.com](https://supabase.com) and create a project
2. Go to **SQL Editor** and run the schema below
3. Copy your **Project URL** and **anon key** from Settings → API
### 2. Supabase SQL Schema
Run this in your Supabase SQL Editor:
```sql
-- Enable pgvector extension
create extension if not exists vector;
-- Document metadata table
create table doc_metadata (
id text primary key,
title text not null,
url text not null,
summary text,
sections text[],
total_chunks integer,
type text,
created_at timestamp default now()
);
-- Chunks table with vector embeddings
create table doc_chunks (
id uuid primary key default gen_random_uuid(),
doc_id text references doc_metadata(id) on delete cascade,
doc_title text,
doc_url text,
section_path text[],
heading text,
content text not null,
token_count integer,
chunk_index integer,
embedding vector(1536),
created_at timestamp default now()
);
-- Function for similarity search
create or replace function search_chunks(
query_embedding vector(1536),
match_count int default 5,
filter_doc_id text default null
)
returns table (
id uuid,
doc_id text,
doc_title text,
section_path text[],
heading text,
content text,
similarity float
)
language plpgsql
as $$
begin
return query
select
dc.id,
dc.doc_id,
dc.doc_title,
dc.section_path,
dc.heading,
dc.content,
1 - (dc.embedding <=> query_embedding) as similarity
from doc_chunks dc
where (filter_doc_id is null or dc.doc_id = filter_doc_id)
order by dc.embedding <=> query_embedding
limit match_count;
end;
$$;
-- Optional: Add index for large datasets (1000+ chunks)
-- create index doc_chunks_embedding_idx
-- on doc_chunks using ivfflat (embedding vector_cosine_ops)
-- with (lists = 100);
```
### 3. Configure Environment Variables
Set these on your Apify Actor, storing the Supabase key as an Apify secret:
```bash
# Add secrets (recommended)
apify secrets add supabaseKey "your-supabase-anon-key"
```
In `.actor/actor.json`:
```json
{
"environmentVariables": {
"SUPABASE_URL": "https://your-project.supabase.co",
"SUPABASE_KEY": "@supabaseKey"
}
}
```
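With the schema and credentials in place, the query path is a thin wrapper around the `search_chunks` function defined above. A sketch using `@supabase/supabase-js` (`embed` is again a stand-in for the embedding call and must return a 1536-dim vector from the same model used at ingest time):
```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Stand-in for the embedding call; the query and the stored chunks must be
// embedded with the same model for the similarity scores to be meaningful.
declare function embed(text: string): Promise<number[]>;

async function searchDocs(query: string, docId?: string) {
  const queryEmbedding = await embed(query);
  // Invokes the search_chunks() SQL function from the schema above.
  const { data, error } = await supabase.rpc("search_chunks", {
    query_embedding: queryEmbedding,
    match_count: 5,
    filter_doc_id: docId ?? null,
  });
  if (error) throw error;
  return data; // rows: { id, doc_id, doc_title, section_path, heading, content, similarity }
}
```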
### 4. Configure Documentation Sources
| Input | Description | Default |
|-------|-------------|---------|
| **Start URLs** | Documentation pages to scrape | Apify SDK docs |
| **Max Pages** | Maximum pages to scrape (1-1000) | 100 |
| **Force Refresh** | Re-scrape even if data exists | false |
Example input:
```json
{
"startUrls": [
{ "url": "https://docs.example.com/getting-started" },
{ "url": "https://docs.example.com/api-reference" }
],
"maxPages": 200,
"forceRefresh": false
}
```
### 5. Deploy and Connect
```bash
# Push to Apify
apify push
# Add to Claude Code (get URL from Actor output)
claude mcp add api-docs https://<YOUR_ACTOR_URL>/mcp \
--transport http \
--header "Authorization: Bearer <YOUR_APIFY_TOKEN>"
```
## Usage Examples
Once connected, your AI assistant can:
```
"Search the docs for authentication best practices"
→ Returns relevant chunks from multiple documents
"Show me the overview of the API reference doc"
→ Returns summary and section list
"Get the 'Getting Started' section from doc-1"
→ Returns specific section content
"What documentation is available?"
→ Returns list of all docs with summaries
```
## API Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/` | GET | Server status and setup instructions |
| `/mcp` | POST | MCP protocol endpoint |
| `/refresh-docs` | POST | Re-scrape and update all documentation |
| `/stats` | GET | Get document and chunk statistics |
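The non-MCP endpoints can be called directly; a sketch with `fetch` (whether `/stats` and `/refresh-docs` require the same bearer token as `/mcp` is an assumption here):
```typescript
// Replace with your Actor's URL from the `apify push` output.
const BASE = "https://<YOUR_ACTOR_URL>";
const headers = { Authorization: `Bearer ${process.env.APIFY_TOKEN}` };

// Check how many documents and chunks are currently indexed.
const stats = await fetch(`${BASE}/stats`, { headers }).then((r) => r.json());
console.log(stats);

// Trigger a full re-scrape and re-embed of all configured sources.
await fetch(`${BASE}/refresh-docs`, { method: "POST", headers });
```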
## How Chunking Works
Documents are split recursively by headers:
```
Document (15k tokens)
├── H1: Introduction (chunk 1, ~600 tokens)
├── H2: Getting Started
│ ├── H3: Installation (chunk 2, ~400 tokens)
│ └── H3: Configuration (chunk 3, ~500 tokens)
├── H2: API Reference
│ ├── H3: Methods (chunk 4, ~700 tokens)
│ └── H3: Examples (chunk 5, ~600 tokens)
└── ...
```
- **Target chunk size**: 500-800 tokens
- **Max chunk size**: 1000 tokens
- **Min chunk size**: 100 tokens (smaller sections merged with parent)
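A simplified TypeScript version of this splitter, working on markdown-style headings (the real Actor parses HTML and uses a proper tokenizer; `approxTokens` and the merge strategy here are illustrative):
```typescript
interface Chunk {
  sectionPath: string[]; // e.g. ["Getting Started", "Installation"]
  heading: string;
  content: string;
  tokenCount: number;
}

// Rough estimate: ~4 characters per token.
const approxTokens = (text: string): number => Math.ceil(text.length / 4);
const MAX_TOKENS = 1000;
const MIN_TOKENS = 100;

// Split by heading level, recursing H1 → H2 → H3 only while a section is
// still larger than MAX_TOKENS. Sections still oversized after H3 are kept
// whole in this sketch.
function chunkByHeadings(text: string, level = 1, path: string[] = []): Chunk[] {
  if (approxTokens(text) <= MAX_TOKENS || level > 3) {
    const heading = path[path.length - 1] ?? "";
    return [{ sectionPath: path, heading, content: text.trim(), tokenCount: approxTokens(text) }];
  }
  // Splitting on a capture group keeps the heading text in the result:
  // [preamble, heading1, body1, heading2, body2, ...]
  const parts = text.split(new RegExp(`^#{${level}} (.+)$`, "gm"));
  const chunks: Chunk[] = [];
  if (parts[0].trim()) chunks.push(...chunkByHeadings(parts[0], level + 1, path));
  for (let i = 1; i < parts.length; i += 2) {
    chunks.push(...chunkByHeadings(parts[i + 1] ?? "", level + 1, [...path, parts[i].trim()]));
  }
  // Merge undersized sections into their predecessor so nothing falls below MIN_TOKENS.
  return chunks.reduce<Chunk[]>((acc, c) => {
    const prev = acc[acc.length - 1];
    if (prev && c.tokenCount < MIN_TOKENS) {
      prev.content += `\n\n${c.heading}\n${c.content}`;
      prev.tokenCount = approxTokens(prev.content);
    } else {
      acc.push(c);
    }
    return acc;
  }, []);
}
```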
## Pricing
This Actor uses pay-per-event pricing through Apify. Costs include:
- Scraping: Initial crawl of documentation
- Embeddings: Generated via OpenRouter (charged to your Apify account)
- Tool calls: Each MCP tool invocation
## Development
```bash
# Install dependencies
npm install
# Run locally
npm run start:dev
# Build
npm run build
# Push to Apify
apify push
```
## License
ISC