šŸ“š Marcus Local MCP Server

A Model Context Protocol (MCP) server that indexes documentation sites and local code repositories for semantic search by AI assistants.


šŸŽÆ What Is This?

This is a local MCP server that enables AI assistants (Cursor, Claude Desktop, ChatGPT) to semantically search through:

  • Documentation websites - Crawled and indexed from any docs site

  • Local code repositories - All text files from your projects

It uses OpenAI embeddings to create a vector database (ChromaDB) that AI assistants can query through the Model Context Protocol.

Think of it as: Giving your AI assistant instant access to searchable documentation and your entire codebase.
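
Under the hood, a search is a pure embedding lookup: embed the question, then find the nearest chunks. A minimal sketch of that path in Python (the collection name "docs" and the exact call shapes are illustrative assumptions, not this server's actual code):

import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
chroma = chromadb.PersistentClient(path="mcp-docs-server/data/chroma_db")
collection = chroma.get_or_create_collection("docs")  # collection name is a guess

def search(query: str, source: str | None = None, n_results: int = 5):
    # Embed the query with the same model used at index time
    query_embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=query
    ).data[0].embedding
    # Nearest-neighbour lookup, optionally restricted to a single source
    where = {"source": source} if source else None
    return collection.query(
        query_embeddings=[query_embedding], n_results=n_results, where=where
    )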

šŸ“‹ How It Works

ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│  AI Assistant   │  (Cursor, Claude, ChatGPT, etc.)
│   (via MCP)     │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
         │
         ā–¼
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│   MCP Server    │  (Python - stdio)
│    main.py      │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
         │
         ā–¼
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”      ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│    ChromaDB     │◄─────┤    OpenAI    │
│  (Vector Store) │      │  Embeddings  │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”¬ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜      ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
         │
         ā–¼
ā”Œā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”
│   Indexed Sources    │
│   • Documentation    │
│     - Moca Network   │
│     - Your Docs      │
│   • Repositories     │
│     - Your Codebase  │
│     - Local Projects │
ā””ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”€ā”˜
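
The MCP layer on top of this is thin: main.py speaks MCP over stdio and exposes search as a tool. A sketch of that wiring using the official Python SDK's FastMCP helper, reusing the search() function sketched above (the tool name and signature are assumptions, not necessarily what main.py defines):

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("marcus-mcp-server")

@mcp.tool()
def search_docs(query: str, source: str | None = None) -> str:
    """Semantic search over the indexed docs and repos."""
    results = search(query, source)  # the ChromaDB lookup sketched above
    return "\n\n".join(results["documents"][0])

if __name__ == "__main__":
    mcp.run()  # stdio transport by default - what Cursor/Claude Desktop expect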

The Flow:

  1. Index - Crawl docs OR read local repo files

  2. Chunk - Split content into 800-token chunks

  3. Embed - Create OpenAI embeddings (batched for speed)

  4. Store - Save in ChromaDB vector database

  5. Search - AI assistant queries via MCP protocol

  6. Retrieve - Return relevant chunks from docs/code
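
Steps 2-4 in miniature, assuming the same "docs" collection as above. The real indexer counts tokens; this sketch approximates 800 tokens as ~3,200 characters of English text:

import chromadb
from openai import OpenAI

openai_client = OpenAI()
collection = chromadb.PersistentClient(
    path="mcp-docs-server/data/chroma_db"
).get_or_create_collection("docs")

def index_document(doc_id: str, text: str, source: str, chunk_chars: int = 3200):
    # 2. Chunk: split into fixed-size pieces (~800 tokens each)
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    # 3. Embed: one batched API call for all chunks instead of one per chunk
    response = openai_client.embeddings.create(
        model="text-embedding-3-small", input=chunks
    )
    # 4. Store: ids, vectors, raw text, and source metadata land in ChromaDB
    collection.add(
        ids=[f"{doc_id}-{i}" for i in range(len(chunks))],
        embeddings=[item.embedding for item in response.data],
        documents=chunks,
        metadatas=[{"source": source}] * len(chunks),
    )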

šŸš€ How to Run It

1. Setup

# Clone repository
git clone <your-repo>
cd crawl4ai_test

# Install Node.js dependencies
npm install

# Set up Python virtual environment
python3 -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install Python dependencies
pip install -r mcp-docs-server/requirements.txt

# Install Crawl4AI
pip install -U crawl4ai
crawl4ai-setup

2. Configure

Create .env file in mcp-docs-server/:

OPENAI_API_KEY=your_openai_api_key_here
EMBEDDING_MODEL=text-embedding-3-small
DEFAULT_RESULTS=5
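
To sanity-check that the server will see these values, you can load them the way the Python scripts plausibly do (python-dotenv is an assumption here, not a confirmed dependency):

import os
from dotenv import load_dotenv  # assumption: python-dotenv is installed

load_dotenv("mcp-docs-server/.env")
assert os.getenv("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"
print(os.getenv("EMBEDDING_MODEL", "text-embedding-3-small"))
print(os.getenv("DEFAULT_RESULTS", "5"))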

3. Run the Web UI

# Start Next.js server
npm run dev

# Open browser
open http://localhost:3030

4. Connect to Cursor/Claude

Add to your AI assistant config:

For Cursor (~/.cursor/mcp.json or project config):

{ "mcpServers": { "marcus-mcp-server": { "command": "/path/to/your/venv/bin/python3", "args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"] } } }

For Claude Desktop (~/Library/Application Support/Claude/claude_desktop_config.json):

{ "mcpServers": { "marcus-docs": { "command": "/path/to/your/venv/bin/python", "args": ["/path/to/crawl4ai_test/mcp-docs-server/server/main.py"] } } }

šŸ“– How to Use It

Adding Documentation

Via Web UI:

  1. Go to http://localhost:3030

  2. Click "Add New Docs"

  3. Enter:

    • URL: https://docs.example.com

    • Source Name: Example Docs

    • Max Pages: 50 (or unlimited)

  4. Click "Start Indexing"

  5. Wait for completion

Via Command Line:

cd mcp-docs-server
source ../venv/bin/activate
python scripts/crawler.py https://docs.example.com "Example Docs" 50
python scripts/indexer_multi.py "Example Docs"
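
Internally, crawler.py drives Crawl4AI. Its core fetch looks roughly like Crawl4AI's standard usage (a single-page sketch; the real script also follows links up to Max Pages):

import asyncio
from crawl4ai import AsyncWebCrawler

async def crawl(url: str) -> str:
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown  # page content as markdown, ready for chunking

print(asyncio.run(crawl("https://docs.example.com")))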

Adding Repositories

Via Web UI:

  1. Go to http://localhost:3030

  2. Click "Add Repository"

  3. Enter:

    • Repository Path: Drag-and-drop folder OR paste path

    • Source Name: Auto-generated from folder name

  4. Click "Start Indexing"

  5. Watch live progress

What gets indexed:

  • āœ… All text files (.js, .py, .md, .tsx, .json, .css, etc.)

  • āœ… Auto-skips: node_modules, .git, venv, build, .next, etc.

  • āœ… Batched embeddings (one API call per batch of chunks, roughly 50-100x faster than embedding chunk-by-chunk)
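
The skip logic boils down to a directory walk with a denylist. A sketch of the idea (the skip set and the treat-as-text test are illustrative, not the exact rules in repo_indexer.py):

import os

SKIP_DIRS = {"node_modules", ".git", "venv", "build", ".next", "dist", "__pycache__"}

def iter_text_files(root: str):
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk never descends into them
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as fh:
                    yield path, fh.read()  # decodes as UTF-8 => treat as text
            except (UnicodeDecodeError, OSError):
                continue  # binary or unreadable file: skip it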

Via Command Line:

cd mcp-docs-server
source ../venv/bin/activate
python scripts/repo_indexer.py "/path/to/your/repo" "My Project"

Searching

From Web UI:

  1. Enter query: "How do I initialize the SDK?"

  2. Select source (Docs, Repos, or All)

  3. Click "Search Documentation"

  4. View results

From AI Assistant:

Search all sources:

@marcus-mcp-server search for "authentication flow"

Filter by specific source:

@marcus-mcp-server search for "BorrowInterface component" with source="Credo Protocol"

Example usage in Cursor:

User: Using my marcus-mcp-server, show me how authentication is
      implemented in the Credo Protocol repository

AI:   [Searches indexed repository and returns relevant code chunks]

Pro Tip: Always filter by source name to get focused results and save context tokens.

Managing Sources

View Sources:

  • See all indexed docs and repos on the main page

  • Filter by "Docs" or "Repos" tabs

  • Expand to see individual pages/files

Delete Sources:

  • Click trash icon next to any source

  • Confirm deletion

  • Source and all chunks are removed
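
In ChromaDB terms, deleting a source is a metadata-filtered delete, roughly like this (assuming chunks carry a source metadata field, as in the sketches above):

import chromadb

collection = chromadb.PersistentClient(
    path="mcp-docs-server/data/chroma_db"
).get_or_create_collection("docs")

# Drop every chunk whose metadata marks it as belonging to "Example Docs"
collection.delete(where={"source": "Example Docs"})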

šŸ“ Project Structure

crawl4ai_test/
ā”œā”€ā”€ pages/                       # Next.js UI
│   ā”œā”€ā”€ index.js                 # Main page (search + sources)
│   ā”œā”€ā”€ add.js                   # Add documentation
│   ā”œā”€ā”€ add-repo.js              # Add repository
│   └── api/                     # API routes
│       ā”œā”€ā”€ mcp-search.js        # Search endpoint
│       ā”œā”€ā”€ mcp-info.js          # Get index info
│       ā”œā”€ā”€ add-docs-crawl.js    # Crawl docs
│       ā”œā”€ā”€ add-docs-index.js    # Index docs
│       ā”œā”€ā”€ add-repo-index.js    # Index repository
│       └── mcp-delete-source.js # Delete source
ā”œā”€ā”€ components/
│   ā”œā”€ā”€ ui/                      # shadcn/ui components
│   └── home/                    # Page components
ā”œā”€ā”€ mcp-docs-server/             # MCP Server
│   ā”œā”€ā”€ server/
│   │   └── main.py              # MCP server (stdio)
│   ā”œā”€ā”€ scripts/
│   │   ā”œā”€ā”€ crawler.py           # Crawl docs with Crawl4AI
│   │   ā”œā”€ā”€ indexer_multi.py     # Index docs
│   │   ā”œā”€ā”€ repo_indexer.py      # Index repositories
│   │   ā”œā”€ā”€ get_source_pages.py  # Get pages/files
│   │   ā”œā”€ā”€ search.py            # Search
│   │   └── delete_source.py     # Delete sources
│   ā”œā”€ā”€ data/
│   │   ā”œā”€ā”€ chroma_db/           # Vector database
│   │   ā”œā”€ā”€ chunks/              # Metadata
│   │   └── raw/                 # Crawled JSON
│   └── requirements.txt
└── venv/                        # Python environment

šŸŽØ Built With

  • Frontend: Next.js 15 + shadcn/ui + Tailwind CSS

  • Backend: Python 3.13 + Model Context Protocol (MCP)

  • Crawler: Crawl4AI

  • Vector DB: ChromaDB

  • Embeddings: OpenAI (text-embedding-3-small)


Status: āœ… Fully Operational | šŸ¤– MCP Ready | šŸ” Search Enabled
