# MCP Jina Supabase RAG
A lean, focused MCP server for crawling documentation websites and indexing them to Supabase for RAG (Retrieval-Augmented Generation).
## Features
- **Smart URL Discovery**: Tries sitemap.xml first, falls back to Crawl4AI recursive discovery
- **Hybrid Content Extraction**: Uses Jina AI for fast content extraction, Crawl4AI as fallback
- **Multi-Project Support**: Index multiple documentation sites to separate Supabase projects
- **Efficient Chunking**: Intelligent text chunking with configurable size and overlap
- **Vector Embeddings**: OpenAI embeddings stored in Supabase pgvector
## Architecture
```
┌────────────────────────────────────────────────────────────┐
│                      MCP Server Tools                      │
├────────────────────────────────────────────────────────────┤
│ 1. crawl_and_index(url_pattern, project_name)              │
│ 2. list_projects()                                         │
│ 3. search_documents(query, project_name, limit)            │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      Discovery Layer                       │
├────────────────────────────────────────────────────────────┤
│ • Try sitemap.xml (fast)                                   │
│ • Try common doc patterns                                  │
│ • Crawl4AI recursive discovery (fallback)                  │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      Extraction Layer                      │
├────────────────────────────────────────────────────────────┤
│ • Jina AI Reader API (primary, fast)                       │
│ • Crawl4AI (fallback for complex pages)                    │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                 Chunking & Embedding Layer                 │
├────────────────────────────────────────────────────────────┤
│ • Smart text chunking                                      │
│ • OpenAI embeddings (text-embedding-3-small)               │
└────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌────────────────────────────────────────────────────────────┐
│                      Supabase Storage                      │
├────────────────────────────────────────────────────────────┤
│ • pgvector for similarity search                           │
│ • Project isolation via source column                      │
└────────────────────────────────────────────────────────────┘
```
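The last two layers can be pictured with a minimal sketch of the chunk → embed → store flow. This is illustrative only: the table and column names (`documents`, `source`, `url`, `content`, `embedding`), the environment variable names, and the helper functions are assumptions, not the server's actual code; the real schema lives in `supabase_schema.sql`.
```python
# Sketch of the chunk -> embed -> store flow (illustrative only).
# Table/column and env var names are assumptions; the real schema is in supabase_schema.sql.
import os
from openai import OpenAI
from supabase import create_client

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_SERVICE_KEY"])

def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of roughly `size` characters with `overlap` carried over."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def index_page(url: str, markdown: str, project_name: str) -> None:
    chunks = chunk_text(markdown)
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    rows = [
        {
            "source": project_name,   # project isolation via the source column
            "url": url,
            "content": chunk,
            "embedding": item.embedding,
        }
        for chunk, item in zip(chunks, response.data)
    ]
    supabase.table("documents").insert(rows).execute()
```
Overlapping chunks keep context that would otherwise be cut off at chunk boundaries, at the cost of a little extra storage.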
## Installation
### Prerequisites
- Python 3.12+
- [Supabase account](https://supabase.com)
- [OpenAI API key](https://platform.openai.com)
- [Jina AI API key](https://jina.ai) (optional, recommended)
### Setup
1. Clone the repository:
```bash
git clone https://github.com/yourusername/mcp-jina-supabase-rag.git
cd mcp-jina-supabase-rag
```
2. Install dependencies:
```bash
# Using uv (recommended)
uv venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv pip install -e .
# Or using pip
pip install -e .
```
3. Set up Supabase database:
```bash
# Run the SQL in supabase_schema.sql in your Supabase SQL Editor
```
4. Configure environment:
```bash
cp .env.example .env
# Edit .env with your credentials
```
## Usage
### Running the MCP Server
```bash
# SSE transport (recommended for remote connections)
python src/main.py
# The server will start on http://localhost:8052/sse
```
### Configure MCP Client
#### Claude Code
```bash
claude mcp add --transport sse jina-supabase http://localhost:8052/sse
```
#### Cursor / Claude Desktop
```json
{
  "mcpServers": {
    "jina-supabase": {
      "transport": "sse",
      "url": "http://localhost:8052/sse"
    }
  }
}
```
### Slash Command
Create `~/.claude/commands/jina.md`:
```markdown
---
allowed-tools: mcp__jina-supabase
argument-hint: <url_pattern> <project_name>
description: Crawl documentation and index to Supabase RAG
---
# Index Documentation to Supabase
Use the jina-supabase MCP server to crawl and index documentation.
Arguments:
- $1: URL pattern (e.g., https://docs.example.com/*)
- $2: Project name for isolation
Example:
/jina https://docs.anthropic.com/claude/* anthropic-docs
```
## Tools
### `crawl_and_index`
Crawl a documentation site and index it to Supabase.
**Parameters:**
- `url_pattern` (string): URL or pattern to crawl
- `project_name` (string): Project identifier for isolation
- `discovery_method` (string, optional): `auto`, `sitemap`, or `crawl`
- `extraction_method` (string, optional): `auto`, `jina`, or `crawl4ai`
**Example:**
```python
await crawl_and_index(
    url_pattern="https://docs.supabase.com/docs/*",
    project_name="supabase-docs",
    discovery_method="auto",
    extraction_method="jina"
)
```
### `list_projects`
List all indexed projects.
**Returns:** List of project names with document counts
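**Example** (the return shape shown here is an assumption, for illustration only):
```python
projects = await list_projects()
# Hypothetical shape:
# [{"project_name": "supabase-docs", "document_count": 412},
#  {"project_name": "anthropic-docs", "document_count": 187}]
```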
### `search_documents`
Search indexed documents using vector similarity.
**Parameters:**
- `query` (string): Search query
- `project_name` (string, optional): Filter by project
- `limit` (int, optional): Max results (default: 5)
**Example:**
```python
results = await search_documents(
    query="How do I set up authentication?",
    project_name="supabase-docs",
    limit=10
)
```
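Each result is a matched chunk. The snippet below shows one way the matches might be consumed; the field names (`similarity`, `url`, `content`) are assumptions and may differ from the actual response shape:
```python
for doc in results:
    # Field names are assumptions; adjust to the real response shape.
    print(f"{doc['similarity']:.2f}  {doc['url']}")
    print(doc["content"][:200])
```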
## Configuration
See `.env.example` for all configuration options.
### Discovery Methods
- `auto`: Try sitemap first, fallback to crawl
- `sitemap`: Only use sitemap.xml (fast, fails if no sitemap)
- `crawl`: Only use Crawl4AI recursive discovery (slow, comprehensive)
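As a rough illustration of `auto` (a sketch, not the server's actual implementation): fetch `/sitemap.xml`, parse its `<loc>` entries, and fall back to recursive crawling when nothing usable is found. The fallback helper below is hypothetical.
```python
# Sketch of "auto" discovery; crawl_recursively is a hypothetical Crawl4AI-based fallback.
import httpx
from urllib.parse import urljoin
from xml.etree import ElementTree

def discover_urls(base_url: str) -> list[str]:
    sitemap_url = urljoin(base_url, "/sitemap.xml")
    try:
        resp = httpx.get(sitemap_url, timeout=10, follow_redirects=True)
        resp.raise_for_status()
        tree = ElementTree.fromstring(resp.content)
        # Sitemap URLs live in (namespace-qualified) <loc> elements.
        urls = [el.text for el in tree.iter() if el.tag.endswith("loc") and el.text]
        if urls:
            return urls
    except (httpx.HTTPError, ElementTree.ParseError):
        pass
    return crawl_recursively(base_url)  # hypothetical recursive fallback
```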
### Extraction Methods
- `auto`: Use Jina for bulk extraction (>10 URLs), Crawl4AI otherwise
- `jina`: Use Jina AI Reader API (fast, requires API key)
- `crawl4ai`: Use Crawl4AI browser automation (slow, no API key needed)
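For reference, the Jina Reader API is used by prefixing the target URL with `https://r.jina.ai/`, which returns the page as LLM-friendly markdown. The sketch below assumes a `JINA_API_KEY` environment variable and a hypothetical Crawl4AI fallback helper:
```python
# Sketch of Jina-first extraction with a Crawl4AI fallback (illustrative only).
import os
import httpx

def extract_markdown(url: str) -> str:
    headers = {}
    if os.getenv("JINA_API_KEY"):  # optional key raises Jina's rate limits
        headers["Authorization"] = f"Bearer {os.environ['JINA_API_KEY']}"
    try:
        resp = httpx.get(f"https://r.jina.ai/{url}", headers=headers, timeout=30)
        resp.raise_for_status()
        return resp.text
    except httpx.HTTPError:
        return extract_with_crawl4ai(url)  # hypothetical browser-based fallback
```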
## Development
```bash
# Install dev dependencies
uv pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/
# Lint
ruff check src/
```
## Differences from mcp-crawl4ai-rag
| Feature | mcp-crawl4ai-rag | mcp-jina-supabase-rag |
|---------|------------------|------------------------|
| **Focus** | Full-featured RAG with knowledge graphs | Lean documentation indexer |
| **Discovery** | Recursive only | Sitemap first, crawl fallback |
| **Extraction** | Crawl4AI only | Jina primary, Crawl4AI fallback |
| **Dependencies** | Heavy (Neo4j, etc.) | Light (core only) |
| **Use Case** | Advanced RAG with hallucination detection | Fast doc indexing |
## License
MIT
## Contributing
Contributions welcome! Please open an issue first to discuss changes.