Generates text embeddings using OpenAI's text-embedding-3-small model for vector-based document search and retrieval.
Crawls and indexes documentation websites into Supabase using pgvector for vector similarity search, enabling RAG (Retrieval-Augmented Generation) with multi-project support and intelligent text chunking.
MCP Jina Supabase RAG
A lean, focused MCP server for crawling documentation websites and indexing them into Supabase for RAG (Retrieval-Augmented Generation).
Features
Smart URL Discovery: Tries sitemap.xml first, falls back to Crawl4AI recursive discovery
Hybrid Content Extraction: Uses Jina AI for fast content extraction, Crawl4AI as fallback
Multi-Project Support: Index multiple documentation sites to separate Supabase projects
Efficient Chunking: Intelligent text chunking with configurable size and overlap
Vector Embeddings: OpenAI embeddings stored in Supabase pgvector
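The chunking and embedding features can be pictured as a two-step pipeline: split each page into overlapping windows, then embed every window with text-embedding-3-small. The sketch below is a minimal illustration under stated assumptions, not the server's actual code. It uses plain character windows, the default sizes (1000 characters, 200 overlap) are hypothetical, and `chunk_text` / `embed_chunks` are illustrative names. The embedding call uses the official `openai` Python client; a client object can be passed in so the function can be exercised without network access.

```python
from typing import Any, Optional

EMBEDDING_MODEL = "text-embedding-3-small"


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical defaults; the real server reads its chunk settings from its config.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def embed_chunks(chunks: list[str], client: Optional[Any] = None) -> list[list[float]]:
    """Embed all chunks with one batched OpenAI embeddings request."""
    if client is None:
        # Requires `pip install openai` and OPENAI_API_KEY in the environment.
        from openai import OpenAI
        client = OpenAI()
    response = client.embeddings.create(model=EMBEDDING_MODEL, input=chunks)
    # The API returns one embedding per input, in input order.
    return [item.embedding for item in response.data]
```

The overlap means a sentence cut at a window boundary still appears whole in the neighbouring chunk, which tends to help retrieval.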
Architecture
Installation
Prerequisites
Python 3.12+
Jina AI API key (optional, recommended)
Setup
Clone the repository:
Install dependencies:
Set up Supabase database:
Configure environment:
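As a rough illustration of the environment step, the sketch below loads and validates settings from environment variables. The variable names (`SUPABASE_URL`, `SUPABASE_SERVICE_KEY`, `OPENAI_API_KEY`, `JINA_API_KEY`) and the `Settings` / `load_settings` names are hypothetical placeholders; .env.example lists the names the server actually reads. The Jina key is treated as optional, matching the prerequisites above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_key: str
    openai_api_key: str
    jina_api_key: Optional[str]  # optional: Crawl4AI extraction works without it


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Hypothetical variable names; .env.example is the source of truth.
    required = ("SUPABASE_URL", "SUPABASE_SERVICE_KEY", "OPENAI_API_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_key=env["SUPABASE_SERVICE_KEY"],
        openai_api_key=env["OPENAI_API_KEY"],
        jina_api_key=env.get("JINA_API_KEY") or None,
    )
```

Failing fast on missing required keys surfaces configuration mistakes at startup rather than on the first crawl.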
Usage
Running the MCP Server
Configure MCP Client
Claude Code
Cursor / Claude Desktop
Slash Command
Create ~/.claude/commands/jina.md:
Tools
crawl_and_index
Crawl a documentation site and index to Supabase.
Parameters:
url_pattern (string): URL or pattern to crawl
project_name (string): Project identifier for isolation
discovery_method (string, optional): auto, sitemap, or crawl
extraction_method (string, optional): auto, jina, or crawl4ai
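The parameter list above maps naturally onto a small validated argument object. This sketch only checks the allowed values listed for each option; treating "auto" as the default for both optional methods is an assumption, and `CrawlAndIndexArgs` is a hypothetical name, not the server's own type.

```python
from dataclasses import dataclass

DISCOVERY_METHODS = {"auto", "sitemap", "crawl"}
EXTRACTION_METHODS = {"auto", "jina", "crawl4ai"}


@dataclass(frozen=True)
class CrawlAndIndexArgs:
    url_pattern: str
    project_name: str
    discovery_method: str = "auto"   # assumed default
    extraction_method: str = "auto"  # assumed default

    def __post_init__(self) -> None:
        # Reject empty required fields and values outside the documented options.
        if not self.url_pattern:
            raise ValueError("url_pattern is required")
        if not self.project_name:
            raise ValueError("project_name is required")
        if self.discovery_method not in DISCOVERY_METHODS:
            raise ValueError(f"discovery_method must be one of {sorted(DISCOVERY_METHODS)}")
        if self.extraction_method not in EXTRACTION_METHODS:
            raise ValueError(f"extraction_method must be one of {sorted(EXTRACTION_METHODS)}")
```

For instance, `CrawlAndIndexArgs("https://docs.example.com", "example-docs")` (placeholder values) would accept both defaults, while `discovery_method="bfs"` would raise a `ValueError`.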
Example:
list_projects
List all indexed projects.
Returns: List of project names with document counts
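One way to derive "project names with document counts" from stored chunks is to count distinct source URLs per project. The sketch assumes each stored row is a chunk carrying hypothetical `project_name` and `url` fields and that a document means one source URL; the real table schema and counting rule may differ.

```python
from collections import defaultdict
from typing import Iterable, Mapping


def summarize_projects(rows: Iterable[Mapping[str, str]]) -> list[tuple[str, int]]:
    """Return (project_name, document_count) pairs sorted by project name.

    Assumes each row is one chunk; several chunks from the same URL count as one document.
    """
    urls_by_project: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        urls_by_project[row["project_name"]].add(row["url"])
    return sorted((project, len(urls)) for project, urls in urls_by_project.items())
```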
search_documents
Search indexed documents using vector similarity.
Parameters:
query (string): Search query
project_name (string, optional): Filter by project
limit (int, optional): Max results (default: 5)
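Vector similarity search ranks stored chunks by how close their embeddings are to the query's embedding. The in-memory sketch below mirrors the parameters above (optional project filter, `limit` defaulting to 5) using cosine similarity; in Supabase the same ranking would normally run inside Postgres with pgvector's `<=>` cosine-distance operator. The row fields `project_name`, `content`, and `embedding` are assumptions about the stored schema.

```python
import math
from typing import Any, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_documents(
    query_embedding: Sequence[float],
    rows: list[dict[str, Any]],
    project_name: Optional[str] = None,
    limit: int = 5,
) -> list[dict[str, Any]]:
    # Optional project filter first, then rank the remaining chunks by similarity.
    candidates = [r for r in rows if project_name is None or r["project_name"] == project_name]
    scored = [(cosine_similarity(query_embedding, r["embedding"]), r) for r in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Return copies of the top rows annotated with their similarity score.
    return [dict(r, similarity=score) for score, r in scored[:limit]]
```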
Example:
Configuration
See .env.example for all configuration options.
Discovery Methods
auto: Try the sitemap first, fall back to crawling
sitemap: Only use sitemap.xml (fast; fails if the site has no sitemap)
crawl: Only use Crawl4AI recursive discovery (slow, comprehensive)
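The three discovery modes can be sketched as a sitemap parser plus a dispatcher. The fetch and crawl steps are passed in as callables (hypothetical stand-ins for an HTTP fetch and for Crawl4AI's recursive discovery) so the control flow is visible on its own. Assuming the sitemap lives at `<base>/sitemap.xml` is a simplification, and sitemap index files are not expanded here.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[str]:
    """Return the <loc> URLs listed in a sitemap.xml document."""
    root = ET.fromstring(xml_text)
    # Note: in a sitemap index these would be child sitemap URLs, not pages.
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def discover_urls(
    base_url: str,
    method: str,
    fetch_sitemap: Callable[[str], Optional[str]],  # returns XML text or None
    crawl: Callable[[str], list[str]],              # stand-in for recursive crawling
) -> list[str]:
    if method not in ("auto", "sitemap", "crawl"):
        raise ValueError(f"unknown discovery method: {method}")
    if method in ("auto", "sitemap"):
        xml_text = fetch_sitemap(base_url.rstrip("/") + "/sitemap.xml")
        urls = parse_sitemap(xml_text) if xml_text else []
        if urls:
            return urls
        if method == "sitemap":
            # sitemap-only mode fails instead of falling back.
            raise RuntimeError("no usable sitemap.xml found")
    # "crawl" mode, or "auto" when the sitemap was missing or empty.
    return crawl(base_url)
```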
Extraction Methods
auto: Use Jina for bulk extraction (more than 10 URLs), Crawl4AI otherwise
jina: Use the Jina AI Reader API (fast, requires an API key)
crawl4ai: Use Crawl4AI browser automation (slow, no API key needed)
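The extraction rules above reduce to a small decision function. In this sketch, `auto` picks Jina only for more than 10 URLs; falling back to Crawl4AI when no Jina key is configured is an assumption based on the key being optional. The helper also shows the shape of a Jina Reader request, which prefixes the page URL with `https://r.jina.ai/` and sends the key as a bearer token. Both function names are hypothetical.

```python
from typing import Optional

BULK_THRESHOLD = 10  # "auto" uses Jina when there are more than 10 URLs


def choose_extractor(method: str, url_count: int, jina_api_key: Optional[str]) -> str:
    if method == "jina":
        if not jina_api_key:
            raise ValueError("extraction_method='jina' requires a Jina API key")
        return "jina"
    if method == "crawl4ai":
        return "crawl4ai"
    if method == "auto":
        # Assumption: without a Jina key, auto always uses Crawl4AI.
        if url_count > BULK_THRESHOLD and jina_api_key:
            return "jina"
        return "crawl4ai"
    raise ValueError(f"unknown extraction method: {method}")


def jina_reader_request(url: str, api_key: str) -> tuple[str, dict[str, str]]:
    """Build the Jina Reader endpoint URL and auth header for one page."""
    return f"https://r.jina.ai/{url}", {"Authorization": f"Bearer {api_key}"}
```

The threshold reflects the trade-off described above: Jina's API is fast for large batches, while launching a browser through Crawl4AI is slower but needs no key.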
Development
Differences from mcp-crawl4ai-rag
| Feature | mcp-crawl4ai-rag | mcp-jina-supabase-rag |
|---|---|---|
| Focus | Full-featured RAG with knowledge graphs | Lean documentation indexer |
| Discovery | Recursive only | Sitemap first, crawl fallback |
| Extraction | Crawl4AI only | Jina primary, Crawl4AI fallback |
| Dependencies | Heavy (Neo4j, etc.) | Light (core only) |
| Use Case | Advanced RAG with hallucination detection | Fast doc indexing |
License
MIT
Contributing
Contributions welcome! Please open an issue first to discuss changes.