# Marcus Local MCP Server
A Model Context Protocol (MCP) server that indexes documentation sites and local code repositories for semantic search by AI assistants.
## What Is This?
This is a **local MCP server** that enables AI assistants (Cursor, Claude Desktop, ChatGPT) to semantically search through:
- **Documentation websites** - Crawled and indexed from any docs site
- **Local code repositories** - All text files from your projects
It embeds the indexed content with OpenAI and stores the vectors in a ChromaDB database that AI assistants can query through the Model Context Protocol.
**Think of it as:** Giving your AI assistant instant access to searchable documentation and your entire codebase.
## How It Works
```
┌─────────────────┐
│  AI Assistant   │  (Cursor, Claude, ChatGPT, etc.)
│   (via MCP)     │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   MCP Server    │  (Python - stdio)
│    main.py      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│    ChromaDB     │◄─────┤    OpenAI    │
│ (Vector Store)  │      │  Embeddings  │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌──────────────────────┐
│   Indexed Sources    │
│  • Documentation     │
│    - Moca Network    │
│    - Your Docs       │
│  • Repositories      │
│    - Your Codebase   │
│    - Local Projects  │
└──────────────────────┘
```
**The Flow:**
1. **Index** - Crawl docs OR read local repo files
2. **Chunk** - Split content into 800-token chunks
3. **Embed** - Create OpenAI embeddings (batched for speed)
4. **Store** - Save in ChromaDB vector database
5. **Search** - AI assistant queries via MCP protocol
6. **Retrieve** - Return relevant chunks from docs/code
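Steps 2-4 can be pictured as the following Python sketch, assuming the standard `openai` and `chromadb` clients. The collection name, ID scheme, and batch size are illustrative assumptions, not the project's exact code (the real logic lives in `scripts/indexer_multi.py` and `scripts/repo_indexer.py`):
```python
# Sketch of chunk -> embed -> store (illustrative names, not the project's code)
import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="mcp-docs-server/data/chroma_db")
collection = chroma.get_or_create_collection("docs")

def index_chunks(chunks: list[str], source: str, batch_size: int = 100) -> None:
    """Embed ~800-token chunks in batches and store them tagged with their source name."""
    for start in range(0, len(chunks), batch_size):
        batch = chunks[start:start + batch_size]
        # One API call per batch instead of one per chunk -- this is the batching speed-up.
        response = openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=batch,
        )
        collection.add(
            ids=[f"{source}-{start + i}" for i in range(len(batch))],
            documents=batch,
            embeddings=[item.embedding for item in response.data],
            metadatas=[{"source": source} for _ in batch],
        )
```
At query time the same embedding model is applied to the search string and ChromaDB returns the nearest chunks, which the MCP server hands back to the assistant.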
## How to Run It
### 1. Setup
```bash
# Clone repository
git clone <your-repo>
cd crawl4ai_test
# Install Node.js dependencies
npm install
# Setup Python virtual environment
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install Python dependencies
pip install -r mcp-docs-server/requirements.txt
# Install Crawl4AI
pip install -U crawl4ai
crawl4ai-setup
```
### 2. Configure
Create `.env` file in `mcp-docs-server/`:
```bash
OPENAI_API_KEY=your_openai_api_key_here
EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_RESULTS=5
```
### 3. Run the Web UI
```bash
# Start Next.js server
npm run dev
# Open browser
open http://localhost:3030
```
### 4. Connect to Cursor/Claude
Add to your AI assistant config:
**For Cursor** (`~/.cursor/mcp.json` or project config):
```json
{
  "mcpServers": {
    "marcus-mcp-server": {
      "command": "/path/to/your/venv/bin/python3",
      "args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
    }
  }
}
```
**For Claude Desktop** (`~/Library/Application Support/Claude/claude_desktop_config.json`):
```json
{
  "mcpServers": {
    "marcus-docs": {
      "command": "/path/to/your/venv/bin/python",
      "args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
    }
  }
}
```
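For reference, a stdio MCP server built on the official `mcp` Python SDK's `FastMCP` helper looks roughly like the sketch below. The tool name and signature are illustrative assumptions, not necessarily what `server/main.py` exposes:
```python
# Minimal stdio MCP server exposing a single search tool (illustrative, not the actual main.py)
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("marcus-docs")

@mcp.tool()
def search_docs(query: str, source: str | None = None, n_results: int = 5) -> str:
    """Semantically search the indexed docs and repos, optionally filtered by source."""
    # The real server would embed `query` and run a ChromaDB similarity search here.
    return f"Results for {query!r} (source={source}, n={n_results})"

if __name__ == "__main__":
    mcp.run(transport="stdio")  # speak MCP over stdin/stdout, as the configs above expect
```
Because the transport is stdio, the assistant launches the script itself, which is why the configs above point at the venv's Python interpreter and the absolute path to `main.py`.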
## How to Use It
### Adding Documentation
**Via Web UI:**
1. Go to http://localhost:3030
2. Click **"Add New Docs"**
3. Enter:
- **URL**: `https://docs.example.com`
- **Source Name**: `Example Docs`
- **Max Pages**: `50` (or unlimited)
4. Click **"Start Indexing"**
5. Wait for completion
**Via Command Line:**
```bash
cd mcp-docs-server
source ../venv/bin/activate
python scripts/crawler.py https://docs.example.com "Example Docs" 50
python scripts/indexer_multi.py "Example Docs"
```
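Under the hood, the crawl step boils down to Crawl4AI's `AsyncWebCrawler`. The sketch below fetches a single page as markdown; the real `scripts/crawler.py` presumably also follows links up to the max-page limit and saves raw output under `data/raw/`:
```python
# What the crawl step does at its core (illustrative; see scripts/crawler.py)
import asyncio
from crawl4ai import AsyncWebCrawler

async def crawl_page(url: str):
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown  # cleaned page content as markdown, ready to chunk and embed

if __name__ == "__main__":
    print(asyncio.run(crawl_page("https://docs.example.com")))
```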
### Adding Repositories
**Via Web UI:**
1. Go to http://localhost:3030
2. Click **"Add Repository"**
3. Enter:
- **Repository Path**: Drag-and-drop folder OR paste path
- **Source Name**: Auto-generated from folder name
4. Click **"Start Indexing"**
5. Watch live progress
**What gets indexed:**
- ✅ All text files (`.js`, `.py`, `.md`, `.tsx`, `.json`, `.css`, etc.)
- ✅ Auto-skips: `node_modules`, `.git`, `venv`, `build`, `.next`, etc.
- ✅ Batched embeddings (50-100x faster)
**Via Command Line:**
```bash
cd mcp-docs-server
source ../venv/bin/activate
python scripts/repo_indexer.py "/path/to/your/repo" "My Project"
```
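The repo indexer's file selection can be pictured as a directory walk that prunes generated folders and keeps text files. The extension and skip lists below are examples, not necessarily the exact sets used by `scripts/repo_indexer.py`:
```python
# Illustrative file walk for the repo indexer: keep text files, skip generated directories
import os

SKIP_DIRS = {"node_modules", ".git", "venv", "build", ".next", "dist", "__pycache__"}
TEXT_EXTENSIONS = {".js", ".jsx", ".ts", ".tsx", ".py", ".md", ".json", ".css", ".html", ".txt"}

def collect_files(repo_path: str) -> list[str]:
    """Return paths of indexable text files, pruning skipped directories before descending."""
    files = []
    for root, dirs, names in os.walk(repo_path):
        dirs[:] = [d for d in dirs if d not in SKIP_DIRS]  # prune in place so os.walk skips them
        for name in names:
            if os.path.splitext(name)[1].lower() in TEXT_EXTENSIONS:
                files.append(os.path.join(root, name))
    return files
```
The collected files are then chunked and embedded in batches, exactly as in the indexing sketch under "How It Works".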
### Searching
**From Web UI:**
1. Enter query: `"How do I initialize the SDK?"`
2. Select source (Docs, Repos, or All)
3. Click **"Search Documentation"**
4. View results
**From AI Assistant:**
Search all sources:
```
@marcus-mcp-server search for "authentication flow"
```
Filter by specific source:
```
@marcus-mcp-server search for "BorrowInterface component"
with source="Credo Protocol"
```
Example usage in Cursor:
```
User: Using my marcus-mcp-server, show me how authentication
is implemented in the Credo Protocol repository
AI: [Searches indexed repository and returns relevant code chunks]
```
**Pro Tip:** Always filter by source name to get focused results and save context tokens.
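In ChromaDB terms, that source filter most likely maps to a metadata `where` clause on the query. The sketch below reuses the (assumed) collection name and path from the indexing sketch above:
```python
# Illustrative source-filtered query against the vector store
import chromadb
from openai import OpenAI

openai_client = OpenAI()
collection = chromadb.PersistentClient(
    path="mcp-docs-server/data/chroma_db"
).get_or_create_collection("docs")

query_embedding = openai_client.embeddings.create(
    model="text-embedding-3-small",
    input=["authentication flow"],
).data[0].embedding

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"source": "Credo Protocol"},  # drop this to search across all sources
)
print(results["documents"][0])
```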
### Managing Sources
**View Sources:**
- See all indexed docs and repos on the main page
- Filter by "Docs" or "Repos" tabs
- Expand to see individual pages/files
**Delete Sources:**
- Click trash icon next to any source
- Confirm deletion
- Source and all chunks are removed
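Deleting a source presumably amounts to removing every chunk whose metadata carries that source name, which ChromaDB supports with a `where`-filtered delete (collection name and path are again assumptions, mirroring `scripts/delete_source.py`):
```python
# Illustrative deletion of one source's chunks
import chromadb

collection = chromadb.PersistentClient(
    path="mcp-docs-server/data/chroma_db"
).get_or_create_collection("docs")

collection.delete(where={"source": "Example Docs"})  # removes every chunk tagged with that source
```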
## Project Structure
```
crawl4ai_test/
├── pages/                     # Next.js UI
│   ├── index.js               # Main page (search + sources)
│   ├── add.js                 # Add documentation
│   ├── add-repo.js            # Add repository
│   └── api/                   # API routes
│       ├── mcp-search.js      # Search endpoint
│       ├── mcp-info.js        # Get index info
│       ├── add-docs-crawl.js  # Crawl docs
│       ├── add-docs-index.js  # Index docs
│       ├── add-repo-index.js  # Index repository
│       └── mcp-delete-source.js  # Delete source
├── components/
│   ├── ui/                    # shadcn/ui components
│   └── home/                  # Page components
├── mcp-docs-server/           # MCP Server
│   ├── server/
│   │   └── main.py            # MCP server (stdio)
│   ├── scripts/
│   │   ├── crawler.py         # Crawl docs with Crawl4AI
│   │   ├── indexer_multi.py   # Index docs
│   │   ├── repo_indexer.py    # Index repositories
│   │   ├── get_source_pages.py  # Get pages/files
│   │   ├── search.py          # Search
│   │   └── delete_source.py   # Delete sources
│   ├── data/
│   │   ├── chroma_db/         # Vector database
│   │   ├── chunks/            # Metadata
│   │   └── raw/               # Crawled JSON
│   └── requirements.txt
└── venv/                      # Python environment
```
## Built With
- **Frontend**: Next.js 15 + shadcn/ui + Tailwind CSS
- **Backend**: Python 3.13 + MCP Protocol
- **Crawler**: Crawl4AI
- **Vector DB**: ChromaDB
- **Embeddings**: OpenAI (text-embedding-3-small)
---
**Status**: ✅ Fully Operational | MCP Ready | Search Enabled