Marcus Local MCP Server
A Model Context Protocol (MCP) server that indexes documentation sites and local code repositories for semantic search by AI assistants.

What Is This?
This is a local MCP server that lets AI assistants (Cursor, Claude Desktop, ChatGPT) semantically search indexed documentation sites and local code repositories.
It uses OpenAI embeddings to create a vector database (ChromaDB) that AI assistants can query through the Model Context Protocol.
Think of it as giving your AI assistant instant access to searchable documentation and your entire codebase.
How It Works
┌─────────────────┐
│  AI Assistant   │  (Cursor, Claude, ChatGPT, etc.)
│   (via MCP)     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   MCP Server    │  (Python - stdio)
│    main.py      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│    ChromaDB     │◄─────┤    OpenAI    │
│ (Vector Store)  │      │  Embeddings  │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌──────────────────────┐
│   Indexed Sources    │
│  • Documentation     │
│    - Moca Network    │
│    - Your Docs       │
│  • Repositories      │
│    - Your Codebase   │
│    - Local Projects  │
└──────────────────────┘
The Flow:
1. Index - Crawl docs or read local repo files
2. Chunk - Split content into 800-token chunks
3. Embed - Create OpenAI embeddings (batched for speed)
4. Store - Save in the ChromaDB vector database
5. Search - AI assistant queries via MCP
6. Retrieve - Return relevant chunks from docs/code
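The Chunk and Embed steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual indexer: it approximates tokens with whitespace-separated words (a real implementation would likely use a proper tokenizer), and the function names `chunk_text` and `batched` and the default batch size of 100 are assumptions made for this example.

```python
from typing import Iterator, List


def chunk_text(text: str, max_tokens: int = 800) -> List[str]:
    """Split text into chunks of at most max_tokens whitespace-separated words.

    Assumption: words stand in for tokens to keep the sketch dependency-free.
    """
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


def batched(items: List[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive batches so one embedding request can carry many chunks."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


if __name__ == "__main__":
    document = "word " * 2000          # 2000 pseudo-tokens
    chunks = chunk_text(document)      # chunk sizes: 800, 800, 400
    for batch in batched(chunks, batch_size=2):
        # Each batch would be sent to the embeddings API in a single request.
        print(len(batch), [len(chunk.split()) for chunk in batch])
```

Sending a list of chunks per request instead of one chunk per request is what "batched for speed" refers to; the batch size that works best depends on the embedding provider's request limits.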
How to Run It
1. Setup
# Clone repository
git clone <your-repo>
cd crawl4ai_test
# Install Node.js dependencies
npm install
# Setup Python virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install Python dependencies
pip install -r mcp-docs-server/requirements.txt
# Install Crawl4AI
pip install -U crawl4ai
crawl4ai-setup
2. Configure
Create a .env file in mcp-docs-server/:
OPENAI_API_KEY=your_openai_api_key_here
EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_RESULTS=5
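As a rough sketch of how these three values might be consumed on the Python side, the snippet below reads them from the process environment with the defaults shown above. The `Settings` class and `load_settings` function are hypothetical names for illustration; the actual server may read its configuration differently, and the .env file still has to be loaded into the environment first (for example with a library such as python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    embedding_model: str
    default_results: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from variables named as in the .env file above."""
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early: embeddings cannot be created without a key.
        raise RuntimeError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        default_results=int(env.get("DEFAULT_RESULTS", "5")),
    )
```

Passing the environment in as a mapping keeps the function easy to test with a plain dictionary.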
3. Run the Web UI
# Start Next.js server
npm run dev
# Open in browser (macOS; use xdg-open on Linux)
open http://localhost:3030
4. Connect to Cursor/Claude
Add to your AI assistant config:
For Cursor (~/.cursor/mcp.json or project config):
{
"mcpServers": {
"marcus-mcp-server": {
"command": "/path/to/your/venv/bin/python3",
"args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
}
}
}
For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"marcus-docs": {
"command": "/path/to/your/venv/bin/python",
"args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
}
}
}
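Both configs need absolute paths to the virtual environment's interpreter and to main.py. If it helps, an entry of the same shape can be generated rather than typed by hand. The helper below is a hypothetical convenience script, not part of the repository; it only assumes the directory layout described in this README.

```python
import json
from pathlib import Path


def mcp_config(python_path: str, repo_root: str,
               name: str = "marcus-mcp-server") -> dict:
    """Return an mcpServers entry pointing at mcp-docs-server/server/main.py."""
    main_py = Path(repo_root) / "mcp-docs-server" / "server" / "main.py"
    return {
        "mcpServers": {
            name: {
                "command": str(python_path),
                "args": [str(main_py)],
            }
        }
    }


if __name__ == "__main__":
    # Placeholder paths: replace with the real locations on your machine.
    config = mcp_config("/path/to/your/venv/bin/python3", "/path/to/crawl4ai_test")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the Cursor or Claude Desktop config file listed above.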
How to Use It
Adding Documentation
Via Web UI:
Go to http://localhost:3030
Click "Add New Docs"
Enter:
URL: https://docs.example.com
Source Name: Example Docs
Max Pages: 50 (or unlimited)
Click "Start Indexing"
Wait for completion
Via Command Line:
cd mcp-docs-server
source ../venv/bin/activate
python scripts/crawler.py https://docs.example.com "Example Docs" 50
python scripts/indexer_multi.py "Example Docs"
Adding Repositories
Via Web UI:
Go to http://localhost:3030
Click "Add Repository"
Enter the local repository path and a source name
Click "Start Indexing"
Watch live progress
What gets indexed:
All text files (.js, .py, .md, .tsx, .json, .css, etc.)
Auto-skips: node_modules, .git, venv, build, .next, etc.
Batched embeddings (50-100x faster)
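The file selection rules above could be implemented along these lines. This is a sketch under stated assumptions, not the code in repo_indexer.py: the extension and skip lists contain only the entries named above (the "etc." items are unknown), and `iter_indexable_files` is a hypothetical name.

```python
import os
from pathlib import Path
from typing import Iterator

# Only the entries named in this README; the real lists are likely longer.
SKIP_DIRS = {"node_modules", ".git", "venv", "build", ".next"}
TEXT_EXTENSIONS = {".js", ".py", ".md", ".tsx", ".json", ".css"}


def iter_indexable_files(repo_root: str) -> Iterator[Path]:
    """Yield text files under repo_root, skipping dependency and build dirs."""
    for dirpath, dirnames, filenames in os.walk(repo_root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for filename in sorted(filenames):
            path = Path(dirpath) / filename
            if path.suffix.lower() in TEXT_EXTENSIONS:
                yield path
```

Pruning `dirnames` in place matters for speed: directories such as node_modules are never traversed at all, rather than being walked and filtered afterwards.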
Via Command Line:
cd mcp-docs-server
source ../venv/bin/activate
python scripts/repo_indexer.py "/path/to/your/repo" "My Project"
Searching
From Web UI:
Enter query: "How do I initialize the SDK?"
Select source (Docs, Repos, or All)
Click "Search Documentation"
View results
From AI Assistant:
Search all sources:
@marcus-mcp-server search for "authentication flow"
Filter by specific source:
@marcus-mcp-server search for "BorrowInterface component"
with source="Credo Protocol"
Example usage in Cursor:
User: Using my marcus-mcp-server, show me how authentication
is implemented in the Credo Protocol repository
AI: [Searches indexed repository and returns relevant code chunks]
Pro Tip: Always filter by source name to get focused results and save context tokens.
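Source filtering of this kind maps naturally onto a metadata filter in the vector store. The sketch below builds the keyword arguments for a ChromaDB `collection.query(...)` call. It assumes each chunk was stored with a `"source"` metadata field, which this README does not confirm, and `build_query_kwargs` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Optional


def build_query_kwargs(
    query_embedding: List[float],
    source: Optional[str] = None,
    n_results: int = 5,
) -> Dict[str, Any]:
    """Build keyword arguments for a ChromaDB collection.query(...) call.

    Assumption: chunks carry a "source" metadata field set at index time.
    """
    kwargs: Dict[str, Any] = {
        "query_embeddings": [query_embedding],
        "n_results": n_results,
    }
    if source is not None:
        # Restrict the search to one indexed source, e.g. "Credo Protocol".
        kwargs["where"] = {"source": source}
    return kwargs
```

With a collection object in hand, usage would look like `collection.query(**build_query_kwargs(embedding, source="Credo Protocol"))`. Omitting `source` searches every indexed source.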
Managing Sources
View Sources:
See all indexed docs and repos on the main page
Filter by "Docs" or "Repos" tabs
Expand to see individual pages/files
Delete Sources:
Project Structure
crawl4ai_test/
├── pages/                       # Next.js UI
│   ├── index.js                 # Main page (search + sources)
│   ├── add.js                   # Add documentation
│   ├── add-repo.js              # Add repository
│   └── api/                     # API routes
│       ├── mcp-search.js        # Search endpoint
│       ├── mcp-info.js          # Get index info
│       ├── add-docs-crawl.js    # Crawl docs
│       ├── add-docs-index.js    # Index docs
│       ├── add-repo-index.js    # Index repository
│       └── mcp-delete-source.js # Delete source
├── components/
│   ├── ui/                      # shadcn/ui components
│   └── home/                    # Page components
├── mcp-docs-server/             # MCP Server
│   ├── server/
│   │   └── main.py              # MCP server (stdio)
│   ├── scripts/
│   │   ├── crawler.py           # Crawl docs with Crawl4AI
│   │   ├── indexer_multi.py     # Index docs
│   │   ├── repo_indexer.py      # Index repositories
│   │   ├── get_source_pages.py  # Get pages/files
│   │   ├── search.py            # Search
│   │   └── delete_source.py     # Delete sources
│   ├── data/
│   │   ├── chroma_db/           # Vector database
│   │   ├── chunks/              # Metadata
│   │   └── raw/                 # Crawled JSON
│   └── requirements.txt
└── venv/                        # Python environment
Built With
Frontend: Next.js 15 + shadcn/ui + Tailwind CSS
Backend: Python 3.13 + MCP Protocol
Crawler: Crawl4AI
Vector DB: ChromaDB
Embeddings: OpenAI (text-embedding-3-small)
Status: Fully Operational | MCP Ready | Search Enabled