# Architecture Plan
Research and planning document for docmole architecture evolution.
## Project Goal
Create MCP servers for any documentation site, publishable to npm, with the potential for submodule integration.
## Current Architecture
```
User → bunx docmole → TypeScript MCP → Python RAG Server → Knowledge Base
```
### Current Modes
1. **Mintlify API Mode** (`-p <project-id>`)
- Proxies to Mintlify's AI Assistant API
- Works for sites with built-in assistant
- Zero setup, fast response
2. **Local RAG Mode** (`setup` + `serve`)
- Requires Python server (Agno)
- Self-hosted embeddings and vector store
- Complex setup
### Problems
| Issue | Impact |
|-------|--------|
| Python dependency | Not npm-friendly, complex install |
| Cold start (~10s) | Poor UX on first request |
| Multiple servers | Resource heavy for multiple projects |
| Setup complexity | Barrier to adoption |
---
## Proposed Architecture
```
docmole (npm)
│
├── 🚀 Remote Mode (default)
│   └── Mintlify API (sites with AI Assistant)
│       - Zero setup
│       - Response ~200ms
│       - Already implemented
│
├── 📦 Embedded Mode (new)
│   └── JavaScript vector store (vectra/hnswlib-node)
│       - NO Python required
│       - Uses OpenAI Embeddings API
│       - Vectors stored in ~/.docmole/projects/<project>/vectors/
│       - Single process, npm-ready
│
└── 🔧 External Mode (advanced/submodule)
    └── Connects to external RAG server
        - For custom integrations
        - Submodule in monorepos
        - Self-hosted deployments
```
---
## Mode Comparison
| Aspect | Remote | Embedded | External |
|--------|--------|----------|----------|
| Setup | Zero | `setup --url` | Manual |
| Dependencies | None | OpenAI API key | Python/custom server |
| npm-friendly | ✅ | ✅ | ❌ |
| Cold start | ~200ms | ~500ms | ~10s |
| Offline capable | ❌ | ❌ (needs API) | ✅ |
| Submodule ready | ❌ | Partial | ✅ |
| Resource usage | Zero | Low | High |
---
## Embedded Mode Design
### Technology Options
| Library | Pros | Cons |
|---------|------|------|
| **vectra** | Simple API, TypeScript native | Newer, less battle-tested |
| **hnswlib-node** | Fast, proven algorithm | Native bindings, build issues |
| **faiss-node** | Very fast, Facebook-backed | Complex setup, native deps |
| **chromadb** | Full-featured | Requires server |
**Recommendation**: `vectra` for simplicity and TypeScript compatibility.
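For orientation, vectra's `LocalIndex` stores items in a local folder and exposes insert and nearest-neighbour query calls roughly as sketched below. The folder path, metadata fields, and placeholder vectors are illustrative, not docmole's final schema.

```typescript
import { LocalIndex } from 'vectra';

// Create (or reopen) an index folder on disk; no server, no native bindings.
const index = new LocalIndex('./demo-index');
if (!(await index.isIndexCreated())) {
  await index.createIndex();
}

// Items are plain { vector, metadata } records; docmole would store the chunk text and source URL.
await index.insertItem({
  vector: [0.12, 0.34, 0.56], // placeholder; real vectors come from an embedding model
  metadata: { text: 'Authentication is done via API keys.', url: 'https://docs.example.com/auth' }
});

// Nearest-neighbour query over the stored vectors.
const results = await index.queryItems([0.1, 0.3, 0.5], 3);
for (const r of results) {
  console.log(r.score, r.item.metadata.text);
}
```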
### Data Flow
```
setup --url <docs-url> --id <project>
│
├── 1. Discover pages (sitemap/mint.json)
├── 2. Fetch markdown content
├── 3. Chunk documents
├── 4. Generate embeddings (OpenAI API)
└── 5. Store vectors locally (~/.docmole/projects/<project>/vectors/)

serve --project <project>
│
├── 1. Load vectors from disk
├── 2. Start MCP server
└── 3. On query:
    ├── Generate query embedding
    ├── Search vectors (similarity)
    ├── Retrieve top-k chunks
    └── Generate response (LLM)
```
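A sketch of what the `setup` ingestion loop could look like in TypeScript. The helpers `discoverPages`, `fetchMarkdown`, and `chunkDocument` are hypothetical placeholders for steps 1-3; the embedding call batches all chunks of a page into one request, which the OpenAI API supports via an array `input`.

```typescript
import OpenAI from 'openai';
import { LocalIndex } from 'vectra';

// Hypothetical helpers, implemented elsewhere in docmole:
declare function discoverPages(docsUrl: string): Promise<{ url: string }[]>;
declare function fetchMarkdown(url: string): Promise<string>;
declare function chunkDocument(markdown: string): string[];

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function ingest(docsUrl: string, vectorsDir: string): Promise<void> {
  const index = new LocalIndex(vectorsDir);
  if (!(await index.isIndexCreated())) await index.createIndex();

  const pages = await discoverPages(docsUrl);         // 1. sitemap / mint.json
  for (const page of pages) {
    const markdown = await fetchMarkdown(page.url);   // 2. raw content
    const chunks = chunkDocument(markdown);           // 3. split into chunks

    // 4. Embed every chunk of the page in a single batched request
    const res = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: chunks
    });

    // 5. Persist each chunk with its vector and source URL
    for (let i = 0; i < chunks.length; i++) {
      await index.insertItem({
        vector: res.data[i].embedding,
        metadata: { text: chunks[i], url: page.url }
      });
    }
  }
}
```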
### File Structure
```
~/.docmole/
├── projects/
│   └── <project-id>/
│       ├── config.yaml
│       └── vectors/           # New: embedded vector store
│           ├── index.json
│           └── vectors.bin
└── global.yaml
```
### API Design
```typescript
import OpenAI from 'openai';
import { LocalIndex } from 'vectra'; // vectra's on-disk vector index

// Backend interface (unchanged)
interface Backend {
  ask(question: string): Promise<AskResult>;
  clearHistory(): void;
  isAvailable(): Promise<boolean>;
}

// New embedded backend
class EmbeddedBackend implements Backend {
  constructor(
    private vectorStore: LocalIndex, // index under ~/.docmole/projects/<project>/vectors/
    private openai: OpenAI
  ) {}

  async ask(question: string): Promise<AskResult> {
    // 1. Embed query
    const queryVector = await this.embed(question);
    // 2. Search similar chunks (top-5 nearest neighbours)
    const results = await this.vectorStore.queryItems(queryVector, 5);
    // 3. Build context from the chunk text stored in item metadata
    const context = results.map(r => String(r.item.metadata.text)).join('\n\n');
    // 4. Generate response grounded in the retrieved context
    const response = await this.openai.chat.completions.create({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: `Answer based on this documentation:\n${context}` },
        { role: 'user', content: question }
      ]
    });
    return { answer: response.choices[0].message.content ?? '' };
  }

  clearHistory(): void {
    // Stateless per question for now; nothing to clear.
  }

  async isAvailable(): Promise<boolean> {
    return this.vectorStore.isIndexCreated();
  }

  // Embed a single string with the OpenAI Embeddings API
  private async embed(text: string): Promise<number[]> {
    const res = await this.openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: text
    });
    return res.data[0].embedding;
  }
}
```
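The backend selection logic further down calls a `createEmbeddedBackend(projectId)` factory. A minimal sketch of how it could wire the pieces together, assuming the on-disk layout from the File Structure section; the error message and other details are illustrative.

```typescript
import os from 'node:os';
import path from 'node:path';
import OpenAI from 'openai';
import { LocalIndex } from 'vectra';

async function createEmbeddedBackend(projectId: string): Promise<EmbeddedBackend> {
  // Vectors live under ~/.docmole/projects/<project-id>/vectors/
  const vectorsDir = path.join(os.homedir(), '.docmole', 'projects', projectId, 'vectors');

  const index = new LocalIndex(vectorsDir);
  if (!(await index.isIndexCreated())) {
    throw new Error(`No embedded index for "${projectId}". Run \`docmole setup\` first.`);
  }

  // Embedded mode requires OPENAI_API_KEY (read from the environment by default).
  return new EmbeddedBackend(index, new OpenAI());
}
```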
---
## Implementation Phases
### Phase 1: npm Release (Embedded Mode)
**Goal**: Self-contained npm package, no Python dependency.
**Tasks**:
- [ ] Add `vectra` dependency
- [ ] Implement `EmbeddedBackend`
- [ ] Implement chunking strategy (see the sketch at the end of this phase)
- [ ] Update `setup` command to use embedded mode by default
- [ ] Update `serve` command to auto-detect mode
- [ ] Add `OPENAI_API_KEY` requirement for embedded mode
- [ ] Test with multiple documentation sites
- [ ] Publish to npm
**CLI Changes**:
```bash
# Embedded mode (new default for local)
bunx docmole setup --url https://docs.example.com --id my-docs
# Explicit mode selection
bunx docmole setup --url https://docs.example.com --id my-docs --mode embedded
bunx docmole setup --url https://docs.example.com --id my-docs --mode external
```
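For the chunking task above, one candidate approach (not a committed design) is to split on markdown headings and then cap chunk size with a sliding-window overlap. The size and overlap values below are placeholders to be tuned in Phase 2.

```typescript
// Hypothetical chunker: split on markdown headings, then enforce a max size with overlap.
function chunkDocument(markdown: string, maxChars = 1500, overlap = 200): string[] {
  // First pass: split before #, ## and ### headings so chunks follow the document structure.
  const sections = markdown.split(/\n(?=#{1,3} )/);

  const chunks: string[] = [];
  for (const section of sections) {
    if (section.length <= maxChars) {
      chunks.push(section);
      continue;
    }
    // Second pass: sliding window with overlap for oversized sections.
    for (let start = 0; start < section.length; start += maxChars - overlap) {
      chunks.push(section.slice(start, start + maxChars));
    }
  }
  return chunks.filter(c => c.trim().length > 0);
}
```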
### Phase 2: Optimization
**Goal**: Improve performance and reduce API costs.
**Tasks**:
- [ ] Cache embeddings to avoid re-computation
- [ ] Implement incremental updates (only new/changed pages)
- [ ] Add local embedding model option via transformers.js (sketch after this list)
- [ ] Optimize chunk size and overlap
- [ ] Add compression for vector storage
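For the local embedding option above, a sketch using transformers.js via the `@xenova/transformers` package; the model choice and pooling options are illustrative, and throughput will be slower than the hosted API.

```typescript
import { pipeline } from '@xenova/transformers';

// Load a small sentence-embedding model once; weights are downloaded and cached locally.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

// Embed a chunk entirely offline (no OpenAI key needed).
async function embedLocally(text: string): Promise<number[]> {
  const output = await extractor(text, { pooling: 'mean', normalize: true });
  return Array.from(output.data as Float32Array);
}
```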
### Phase 3: Submodule Support
**Goal**: Enable integration as submodule in larger projects.
**Tasks**:
- [ ] Expose programmatic API
- [ ] Support custom vector stores
- [ ] Support custom LLM providers
- [ ] Add webhook/event system
- [ ] Document integration patterns
---
## Backend Selection Logic
```typescript
async function createBackend(projectId: string): Promise<Backend> {
const config = await loadProjectConfig(projectId);
// 1. Check for Mintlify API (fastest)
if (config.backend === 'mintlify' && config.mintlify) {
return createMintlifyBackend(config.mintlify.project_id, config.mintlify.domain);
}
// 2. Check for embedded vectors (npm-friendly)
if (config.backend === 'embedded' || await hasLocalVectors(projectId)) {
return createEmbeddedBackend(projectId);
}
// 3. Check for external server (advanced)
if (config.backend === 'external' && config.external) {
return createExternalBackend(config.external.url);
}
// 4. Auto-detect Mintlify API availability
if (await hasMintlifyAssistant(config.source.url)) {
return createMintlifyBackend(/* auto-discover */);
}
throw new Error('No backend available. Run setup first.');
}
```
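The per-project config this logic assumes might look roughly like the interface below. Only the fields referenced above are grounded in the plan; the rest is a guess at a schema, not a decision.

```typescript
// Assumed shape of ~/.docmole/projects/<project-id>/config.yaml once parsed.
interface ProjectConfig {
  backend: 'mintlify' | 'embedded' | 'external';
  source: { url: string };                            // docs site passed to setup
  mintlify?: { project_id: string; domain: string };  // Remote mode
  external?: { url: string };                         // External RAG server
  embedded?: { embedding_model?: string };            // Embedded mode options (hypothetical)
}
```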
---
## Cost Considerations
### Embedded Mode Costs (OpenAI API)
| Operation | Model | Cost |
|-----------|-------|------|
| Embedding | text-embedding-3-small | $0.02 / 1M tokens |
| Query | gpt-4o-mini | $0.15 / 1M input, $0.60 / 1M output |
**Example**: 100 doc pages (~500 tokens each)
- Initial embedding: 50K tokens ≈ $0.001
- Per query: ~2K input tokens ≈ $0.0003 (output tokens billed separately at $0.60 / 1M)
### Optimization Strategies
1. **Cache embeddings**: Only re-embed changed pages
2. **Batch requests**: Embed multiple chunks per API call
3. **Local embeddings**: Use transformers.js for free embeddings (slower)
4. **Hybrid**: Use local for embedding, API for generation
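A sketch of strategies 1 and 2 combined: hash each chunk, reuse cached vectors for unchanged chunks, and send only the new ones in a single batched request. The in-memory `Map` stands in for whatever persistent cache docmole ends up using.

```typescript
import { createHash } from 'node:crypto';
import OpenAI from 'openai';

const openai = new OpenAI();

// Illustrative cache: chunk hash -> embedding (would be persisted alongside the vector store).
const cache = new Map<string, number[]>();

function sha256(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

async function embedWithCache(chunks: string[]): Promise<number[][]> {
  const missing = chunks.filter(c => !cache.has(sha256(c)));

  if (missing.length > 0) {
    // One batched request for everything not seen before (strategy 2).
    const res = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: missing
    });
    missing.forEach((chunk, i) => cache.set(sha256(chunk), res.data[i].embedding));
  }

  // Unchanged chunks are served straight from the cache (strategy 1).
  return chunks.map(c => cache.get(sha256(c))!);
}
```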
---
## Open Questions
1. **Chunking strategy**: How to split documents effectively?
- By headers? Fixed size? Semantic?
2. **Embedding model**:
- `text-embedding-3-small` (cheap, good) vs `text-embedding-3-large` (better, costly)
3. **Local LLM option**:
- Should we support Ollama/local models?
4. **Multi-project optimization**:
- Share embedding model across projects?
- Global vector store with project namespacing?
---
## Enterprise Deployment
For production deployments in companies with private networks, see:
@docs/enterprise-architecture.md
---
## References
- [Vectra](https://github.com/Stevenic/vectra) - Local vector database for TypeScript
- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)
- [RAG Best Practices](https://www.anthropic.com/research/rag)