Generates text embeddings using OpenAI's text-embedding-3-small model for vector-based document search and retrieval.
Crawls and indexes documentation websites into Supabase using pgvector for vector similarity search, enabling RAG (Retrieval-Augmented Generation) with multi-project support and intelligent text chunking.
MCP Jina Supabase RAG
A lean, focused MCP server for crawling documentation websites and indexing them to Supabase for RAG (Retrieval-Augmented Generation).
Features
Smart URL Discovery: Tries sitemap.xml first, falls back to Crawl4AI recursive discovery
Hybrid Content Extraction: Uses Jina AI for fast content extraction, Crawl4AI as fallback
Multi-Project Support: Index multiple documentation sites to separate Supabase projects
Efficient Chunking: Intelligent text chunking with configurable size and overlap
Vector Embeddings: OpenAI embeddings stored in Supabase pgvector
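The chunking feature above can be illustrated with a minimal sketch. This is not the server's actual implementation: the function name `chunk_text`, the defaults, and the rule for preferring paragraph or sentence boundaries are assumptions chosen for illustration.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks (hypothetical helper, not the server's code)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer to cut at a paragraph break, then a sentence end.
            # Only accept a cut past both the window midpoint and the
            # overlap width, so the next start always moves forward.
            window = text[start:end]
            for sep in ("\n\n", ". "):
                cut = window.rfind(sep)
                if cut > max(chunk_size // 2, overlap):
                    end = start + cut + len(sep)
                    break
        chunk = text[start:end].strip()
        if chunk:
            chunks.append(chunk)
        if end >= len(text):
            break
        # Step back by `overlap` characters so neighbouring chunks share context.
        start = end - overlap
    return chunks
```

With `chunk_size=100` and `overlap=20`, each new chunk begins 80 characters after the previous one unless a paragraph or sentence boundary pulls the cut earlier.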
Architecture
┌─────────────────────────────────────────────────────────────┐
│ MCP Server Tools │
├─────────────────────────────────────────────────────────────┤
│ 1. crawl_and_index(url_pattern, project_name) │
│ 2. list_projects() │
│ 3. search_documents(query, project_name, limit) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Discovery Layer │
├─────────────────────────────────────────────────────────────┤
│ • Try sitemap.xml (fast) │
│ • Try common doc patterns │
│ • Crawl4AI recursive discovery (fallback) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Extraction Layer │
├─────────────────────────────────────────────────────────────┤
│ • Jina AI Reader API (primary, fast) │
│ • Crawl4AI (fallback for complex pages) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Chunking & Embedding Layer │
├─────────────────────────────────────────────────────────────┤
│ • Smart text chunking │
│ • OpenAI embeddings (text-embedding-3-small) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Supabase Storage │
├─────────────────────────────────────────────────────────────┤
│ • pgvector for similarity search │
│ • Project isolation via source column │
└─────────────────────────────────────────────────────────────┘
Installation
Prerequisites
Python 3.12+
Jina AI API key (optional, recommended)
Setup
Clone the repository:
git clone https://github.com/yourusername/mcp-jina-supabase-rag.git
cd mcp-jina-supabase-rag
Install dependencies:
# Using uv (recommended)
uv venv
source .venv/bin/activate # or .venv\Scripts\activate on Windows
uv pip install -e .
# Or using pip
pip install -e .
Set up Supabase database:
# Run the SQL in supabase_schema.sql in your Supabase SQL Editor
Configure environment:
cp .env.example .env
# Edit .env with your credentials
Usage
Running the MCP Server
# SSE transport (recommended for remote connections)
python src/main.py
# The server will start on http://localhost:8052/sse
Configure MCP Client
Claude Code
claude mcp add --transport sse jina-supabase http://localhost:8052/sse
Cursor / Claude Desktop
{
  "mcpServers": {
    "jina-supabase": {
      "transport": "sse",
      "url": "http://localhost:8052/sse"
    }
  }
}
Slash Command
Create ~/.claude/commands/jina.md (the commands directory inside your own home directory):
---
allowed-tools: mcp__jina-supabase
argument-hint: <url_pattern> <project_name>
description: Crawl documentation and index to Supabase RAG
---
# Index Documentation to Supabase
Use the jina-supabase MCP server to crawl and index documentation.
Arguments:
- $1: URL pattern (e.g., https://docs.example.com/*)
- $2: Project name for isolation
Example:
/jina https://docs.anthropic.com/claude/* anthropic-docs
Tools
crawl_and_index
Crawl a documentation site and index to Supabase.
Parameters:
url_pattern (string): URL or pattern to crawl
project_name (string): Project identifier for isolation
discovery_method (string, optional): auto, sitemap, or crawl
extraction_method (string, optional): auto, jina, or crawl4ai
Example:
await crawl_and_index(
    url_pattern="https://docs.supabase.com/docs/*",
    project_name="supabase-docs",
    discovery_method="auto",
    extraction_method="jina"
)
list_projects
List all indexed projects.
Returns: List of project names with document counts
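As a rough illustration of that return shape, the sketch below groups stored rows by their source column and counts them. The row layout and the names `summarize_projects`, `project`, and `documents` are hypothetical; the real tool queries Supabase and may return a different structure.

```python
from collections import Counter


def summarize_projects(rows: list[dict]) -> list[dict]:
    """Count stored rows per project (hypothetical helper).

    Assumes each row carries its project name in a "source" field,
    mirroring the source column used for project isolation.
    """
    counts = Counter(row["source"] for row in rows)
    return [
        {"project": name, "documents": count}
        for name, count in sorted(counts.items())
    ]
```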
search_documents
Search indexed documents using vector similarity.
Parameters:
query (string): Search query
project_name (string, optional): Filter by project
limit (int, optional): Max results (default: 5)
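To make the semantics of these parameters concrete, here is an in-memory stand-in for the ranking step. In the server itself the similarity search runs in Supabase via pgvector; this sketch only mimics the behaviour with plain Python, and the row fields (`source`, `embedding`) and function names are assumptions.

```python
import math
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_rows(
    query_embedding: list[float],
    rows: list[dict],
    project_name: Optional[str] = None,
    limit: int = 5,
) -> list[dict]:
    """Filter rows by project, then return the `limit` most similar ones."""
    candidates = [
        row for row in rows
        if project_name is None or row["source"] == project_name
    ]
    candidates.sort(
        key=lambda row: cosine_similarity(query_embedding, row["embedding"]),
        reverse=True,
    )
    return candidates[:limit]
```

Leaving `project_name` unset searches across every indexed project, which matches the parameter being optional.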
Example:
results = await search_documents(
    query="How do I set up authentication?",
    project_name="supabase-docs",
    limit=10
)
Configuration
See .env.example for all configuration options.
Discovery Methods
auto: Try sitemap first, fall back to crawl
sitemap: Only use sitemap.xml (fast, fails if no sitemap)
crawl: Only use Crawl4AI recursive discovery (slow, comprehensive)
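The three modes can be sketched as follows. This is a simplified illustration rather than the server's code: `fetch_sitemap` and `crawl` are hypothetical callables standing in for the HTTP fetch and the Crawl4AI crawl, and matching the URL pattern with shell-style wildcards is an assumption.

```python
import fnmatch
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml: str, url_pattern: str) -> list[str]:
    """Return the <loc> URLs in a sitemap that match a wildcard pattern."""
    try:
        root = ET.fromstring(sitemap_xml)
    except ET.ParseError:
        return []  # Treat an unparsable sitemap like a missing one.
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return [url for url in locs if fnmatch.fnmatchcase(url, url_pattern)]


def discover(
    url_pattern: str,
    method: str,
    fetch_sitemap: Callable[[], Optional[str]],
    crawl: Callable[[], list[str]],
) -> list[str]:
    """Pick URLs according to the discovery method: auto, sitemap, or crawl."""
    if method not in ("auto", "sitemap", "crawl"):
        raise ValueError(f"unknown discovery method: {method}")
    if method in ("auto", "sitemap"):
        xml_text = fetch_sitemap()
        urls = urls_from_sitemap(xml_text, url_pattern) if xml_text else []
        if urls:
            return urls
        if method == "sitemap":
            raise LookupError("no matching URLs found in sitemap")
    # "crawl", or "auto" when the sitemap yielded nothing.
    return crawl()
```

Passing the fetch and crawl steps in as callables keeps the selection logic testable without network access.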
Extraction Methods
auto: Use Jina for bulk extraction (>10 URLs), Crawl4AI otherwise
jina: Use Jina AI Reader API (fast, requires API key)
crawl4ai: Use Crawl4AI browser automation (slow, no API key needed)
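A minimal sketch of the selection rule, assuming the threshold of 10 URLs stated above. The function names are hypothetical. The second helper only builds the request for the public Jina Reader endpoint (the target URL appended to https://r.jina.ai/, with an optional bearer token); how the server actually issues the request is not shown in this README.

```python
from typing import Optional


def choose_extractor(method: str, url_count: int, bulk_threshold: int = 10) -> str:
    """Resolve the extraction method to either "jina" or "crawl4ai"."""
    if method in ("jina", "crawl4ai"):
        return method
    if method != "auto":
        raise ValueError(f"unknown extraction method: {method}")
    # auto: Jina for bulk jobs (more than the threshold), Crawl4AI otherwise.
    return "jina" if url_count > bulk_threshold else "crawl4ai"


def jina_reader_request(url: str, api_key: Optional[str] = None) -> tuple[str, dict[str, str]]:
    """Build the Reader URL and headers for one page (no network call is made)."""
    headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}
    return f"https://r.jina.ai/{url}", headers
```

For example, `choose_extractor("auto", url_count=25)` resolves to `"jina"`, while a 3-page job resolves to `"crawl4ai"`.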
Development
# Install dev dependencies
uv pip install -e ".[dev]"
# Run tests
pytest
# Format code
black src/
# Lint
ruff check src/
Differences from mcp-crawl4ai-rag
| Feature | mcp-crawl4ai-rag | mcp-jina-supabase-rag |
| --- | --- | --- |
| Focus | Full-featured RAG with knowledge graphs | Lean documentation indexer |
| Discovery | Recursive only | Sitemap first, crawl fallback |
| Extraction | Crawl4AI only | Jina primary, Crawl4AI fallback |
| Dependencies | Heavy (Neo4j, etc.) | Light (core only) |
| Use Case | Advanced RAG with hallucination detection | Fast doc indexing |
License
MIT
Contributing
Contributions welcome! Please open an issue first to discuss changes.