Marcus Local MCP Server

by Marcussy34

📚 Marcus Local MCP Server

A Model Context Protocol (MCP) server that indexes documentation sites and local code repositories for semantic search by AI assistants.

Next.js · Python · MCP · ChromaDB

🎯 What Is This?

This is a local MCP server that enables AI assistants (Cursor, Claude Desktop, ChatGPT) to semantically search through:

  • Documentation websites - Crawled and indexed from any docs site

  • Local code repositories - All text files from your projects

It embeds the content with OpenAI embeddings and stores the vectors in ChromaDB, which AI assistants can then query through the Model Context Protocol.

Think of it as giving your AI assistant instant access to searchable documentation and your entire codebase.

📋 How It Works

┌─────────────────┐
│  AI Assistant   │ (Cursor, Claude, ChatGPT, etc.)
│  (via MCP)      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   MCP Server    │ (Python - stdio)
│   main.py       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│   ChromaDB      │◄─────│   OpenAI     │
│  (Vector Store) │      │  Embeddings  │
└────────┬────────┘      └──────────────┘
         │
         ▼
┌──────────────────────┐
│   Indexed Sources    │
│  • Documentation     │
│    - Moca Network    │
│    - Your Docs       │
│  • Repositories      │
│    - Your Codebase   │
│    - Local Projects  │
└──────────────────────┘

The Flow:

  1. Index - Crawl docs OR read local repo files

  2. Chunk - Split content into 800-token chunks

  3. Embed - Create OpenAI embeddings (batched for speed)

  4. Store - Save in ChromaDB vector database

  5. Search - AI assistant queries the index via MCP

  6. Retrieve - Return relevant chunks from docs/code
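
Steps 2 and 3 of the flow can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual indexer: it approximates tokens by splitting on whitespace (the real scripts may use a proper tokenizer), and `BATCH_SIZE`, `chunk_tokens`, and `batched` are hypothetical names chosen for this example.

```python
from typing import Iterator, List

CHUNK_SIZE = 800  # tokens per chunk, as stated in step 2
BATCH_SIZE = 100  # hypothetical value; the README does not state a batch size


def chunk_tokens(text: str, size: int = CHUNK_SIZE) -> List[str]:
    """Split text into chunks of at most `size` whitespace-delimited tokens.

    Assumption: whitespace splitting stands in for a real tokenizer.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), size)]


def batched(items: List[str], n: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive batches of at most `n` items, one per embedding request."""
    for i in range(0, len(items), n):
        yield items[i:i + n]


if __name__ == "__main__":
    chunks = chunk_tokens("word " * 2000)
    # 2000 tokens split into chunks of 800, 800, and 400 tokens
    print(len(chunks), [len(c.split()) for c in chunks])
    for batch in batched(chunks, 2):
        print("would embed", len(batch), "chunks in one request")
```

Sending each batch in a single embedding request, rather than one request per chunk, is what the "batched for speed" note in step 3 refers to.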

🚀 How to Run It

1. Setup

# Clone repository
git clone <your-repo>
cd crawl4ai_test

# Install Node.js dependencies
npm install

# Setup Python virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r mcp-docs-server/requirements.txt

# Install Crawl4AI
pip install -U crawl4ai
crawl4ai-setup

2. Configure

Create .env file in mcp-docs-server/:

OPENAI_API_KEY=your_openai_api_key_here
EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_RESULTS=5
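
For reference, a `.env` file in this format can be read with the standard library alone. The sketch below is an assumption about how such a file might be loaded, not the server's actual code; `parse_env` and `load_settings` are hypothetical helpers, and the fallback values simply mirror the ones shown above.

```python
from pathlib import Path
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def load_settings(env_path: Path) -> Dict[str, str]:
    """Read the .env file and fill in the values shown in this README."""
    values = parse_env(env_path.read_text(encoding="utf-8"))
    if not values.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is missing from " + str(env_path))
    # Assumption: these fallbacks match the example values, not verified defaults.
    values.setdefault("EMBEDDING_MODEL", "text-embedding-3-small")
    values.setdefault("DEFAULT_RESULTS", "5")
    return values
```

Calling `load_settings(Path("mcp-docs-server/.env"))` fails early with a clear message if the API key is missing, which is easier to diagnose than an authentication error during indexing.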

3. Run the Web UI

# Start Next.js server
npm run dev

# Open browser (macOS; on Linux use xdg-open)
open http://localhost:3030

4. Connect to Cursor/Claude

Add to your AI assistant config:

For Cursor (~/.cursor/mcp.json or project config):

{
  "mcpServers": {
    "marcus-mcp-server": {
      "command": "/path/to/your/venv/bin/python3",
      "args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
    }
  }
}

For Claude Desktop (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "marcus-docs": {
      "command": "/path/to/your/venv/bin/python",
      "args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"]
    }
  }
}
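
If you prefer to generate the entry rather than edit the JSON by hand, the following sketch builds the same structure and merges it into an existing config without dropping other servers. `mcp_entry` and `merge_server` are hypothetical helpers written for this example; the paths are the same placeholders used above.

```python
import json
from pathlib import Path


def mcp_entry(python_bin: str, project_root: str) -> dict:
    """Build one server entry pointing at mcp-docs-server/server/main.py."""
    main_py = Path(project_root) / "mcp-docs-server" / "server" / "main.py"
    return {"command": python_bin, "args": [str(main_py)]}


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of `config` with `entry` registered under `name`."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    entry = mcp_entry("/path/to/your/venv/bin/python3", "/path/to/crawl4ai_test")
    config = merge_server({}, "marcus-mcp-server", entry)
    print(json.dumps(config, indent=2))
```

Pointing `command` at the virtual environment's interpreter, as both examples above do, means the server runs with the dependencies installed during setup.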

📖 How to Use It

Adding Documentation

Via Web UI:

  1. Go to http://localhost:3030

  2. Click "Add New Docs"

  3. Enter:

    • URL: https://docs.example.com

    • Source Name: Example Docs

    • Max Pages: 50 (or unlimited)

  4. Click "Start Indexing"

  5. Wait for completion

Via Command Line:

cd mcp-docs-server
source ../venv/bin/activate

python scripts/crawler.py https://docs.example.com "Example Docs" 50
python scripts/indexer_multi.py "Example Docs"

Adding Repositories

Via Web UI:

  1. Go to http://localhost:3030

  2. Click "Add Repository"

  3. Enter:

    • Repository Path: Drag-and-drop folder OR paste path

    • Source Name: Auto-generated from folder name

  4. Click "Start Indexing"

  5. Watch live progress

What gets indexed:

  • ✅ All text files (.js, .py, .md, .tsx, .json, .css, etc.)

  • ✅ Auto-skips: node_modules, .git, venv, build, .next, etc.

  • ✅ Batched embeddings (50-100x faster)
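
The file selection described above can be sketched with `os.walk`. This is an illustration under stated assumptions, not the code in `repo_indexer.py`: `SKIP_DIRS` and `TEXT_EXTENSIONS` contain only the names listed in this README (both lists end in "etc.", so the real sets are likely larger), and `iter_text_files` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import Iterator

# Only the entries named in this README; the actual lists may be longer.
SKIP_DIRS = {"node_modules", ".git", "venv", "build", ".next"}
TEXT_EXTENSIONS = {".js", ".py", ".md", ".tsx", ".json", ".css"}


def iter_text_files(root: str) -> Iterator[Path]:
    """Yield text files under `root`, never descending into skipped directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter those directories.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in TEXT_EXTENSIONS:
                yield path


if __name__ == "__main__":
    for path in iter_text_files("."):
        print(path)
```

Pruning directories during the walk avoids reading large dependency folders such as node_modules at all, instead of filtering their files out afterwards.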

Via Command Line:

cd mcp-docs-server
source ../venv/bin/activate

python scripts/repo_indexer.py "/path/to/your/repo" "My Project"

Searching

From Web UI:

  1. Enter query: "How do I initialize the SDK?"

  2. Select source (Docs, Repos, or All)

  3. Click "Search Documentation"

  4. View results

From AI Assistant:

Search all sources:

@marcus-mcp-server search for "authentication flow"

Filter by specific source:

@marcus-mcp-server search for "BorrowInterface component" 
with source="Credo Protocol"

Example usage in Cursor:

User: Using my marcus-mcp-server, show me how authentication 
      is implemented in the Credo Protocol repository

AI: [Searches indexed repository and returns relevant code chunks]

Pro Tip: Always filter by source name to get focused results and save context tokens.
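
To show why filtering by source narrows results, here is a small in-memory stand-in for the vector search. It is not the server's implementation and does not use ChromaDB: the chunk dictionaries, their keys, and the `search` function are hypothetical, and the two-dimensional vectors are toy values standing in for real embeddings.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(chunks: List[Dict], query_vec: Sequence[float],
           source: Optional[str] = None, n_results: int = 5) -> List[Dict]:
    """Rank chunks by similarity, optionally restricted to one source name."""
    candidates = [c for c in chunks if source is None or c["source"] == source]
    ranked = sorted(candidates,
                    key=lambda c: cosine(c["embedding"], query_vec),
                    reverse=True)
    return ranked[:n_results]


if __name__ == "__main__":
    chunks = [
        {"source": "Credo Protocol", "text": "BorrowInterface component", "embedding": [1.0, 0.0]},
        {"source": "Example Docs", "text": "SDK initialization", "embedding": [0.9, 0.1]},
        {"source": "Credo Protocol", "text": "authentication flow", "embedding": [0.0, 1.0]},
    ]
    for hit in search(chunks, [1.0, 0.0], source="Credo Protocol", n_results=2):
        print(hit["source"], "-", hit["text"])
```

Without the `source` argument, the "Example Docs" chunk would take one of the two result slots; with it, every returned chunk comes from the repository you asked about.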

Managing Sources

View Sources:

  • See all indexed docs and repos on the main page

  • Filter by "Docs" or "Repos" tabs

  • Expand to see individual pages/files

Delete Sources:

  • Click trash icon next to any source

  • Confirm deletion

  • Source and all chunks are removed

📁 Project Structure

crawl4ai_test/
├── pages/                       # Next.js UI
│   ├── index.js                 # Main page (search + sources)
│   ├── add.js                   # Add documentation
│   ├── add-repo.js              # Add repository
│   └── api/                     # API routes
│       ├── mcp-search.js        # Search endpoint
│       ├── mcp-info.js          # Get index info
│       ├── add-docs-crawl.js    # Crawl docs
│       ├── add-docs-index.js    # Index docs
│       ├── add-repo-index.js    # Index repository
│       └── mcp-delete-source.js # Delete source
├── components/
│   ├── ui/                      # shadcn/ui components
│   └── home/                    # Page components
├── mcp-docs-server/             # MCP Server
│   ├── server/
│   │   └── main.py              # MCP server (stdio)
│   ├── scripts/
│   │   ├── crawler.py           # Crawl docs with Crawl4AI
│   │   ├── indexer_multi.py     # Index docs
│   │   ├── repo_indexer.py      # Index repositories
│   │   ├── get_source_pages.py  # Get pages/files
│   │   ├── search.py            # Search
│   │   └── delete_source.py     # Delete sources
│   ├── data/
│   │   ├── chroma_db/           # Vector database
│   │   ├── chunks/              # Metadata
│   │   └── raw/                 # Crawled JSON
│   └── requirements.txt
└── venv/                        # Python environment

🎨 Built With

  • Frontend: Next.js 15 + shadcn/ui + Tailwind CSS

  • Backend: Python 3.13 + MCP

  • Crawler: Crawl4AI

  • Vector DB: ChromaDB

  • Embeddings: OpenAI (text-embedding-3-small)


Status: ✅ Fully Operational | 🤖 MCP Ready | 🔍 Search Enabled
