How do I use MCP Vector Proxy?

1. Click on "Install Server". 2. Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state. 3. In the chat, type @ followed by the MCP server name and your instructions, e.g., "@MCP Vector Proxy find a tool that can help me manage my GitHub issues" That's it! The server will respond to your query, and you can continue using it as needed. Here is a step-by-step guide with screenshots.

MCP Vector Proxy

by alexiokay

Overview Schema Related Servers Score Discussions

TypeScript

Local

MCP Vector Proxy

A semantic MCP proxy that sits between AI agents and MCP Router, exposing only 4 tools instead of hundreds. Uses local vector embeddings to find the right tool on demand — no OpenAI key required.

Why

When you have 150+ MCP tools, passing all of them to an AI agent costs ~30,000 tokens per request. This proxy exposes just 4 tools (discover_tools, execute_tool, batch_execute, refresh_tools). The agent searches semantically for what it needs, then calls it — reducing token usage by ~93%.

Without proxy:  151 tools × ~200 tokens = 30,860 tokens per request
With proxy:     4 tool definitions + search results = ~500 tokens

Architecture

MCP Router (all your servers)
        │ stdio
        ▼
mcp-vector-proxy  (tray-managed background process, port 3456)
  - Local embeddings: EmbeddingGemma-300M q8 (~150MB, runs offline)
  - LanceDB vector store (persistent, handles 1M+ tools, no server)
  - Hybrid search: dense vector + BM25 keyword + RRF fusion
  - Auto-syncs when tools change (MCP notifications + polling)
  - HTTP: Streamable HTTP + SSE legacy
        │
        ├── Claude Code / other agents  (HTTP → :3456/mcp)
        │
        └── Claude Desktop              (stdio-bridge → HTTP)

System tray  (node dist/tray.js, auto-starts on login)
  - Green  = connected, N tools indexed
  - Yellow = MCP Router reconnecting
  - Red    = proxy down / crashed (auto-restarts)
  - Right-click → Restart Proxy / Open Health URL / Exit

Requirements

All platforms: Node.js 18+, MCP Router installed and running
Windows: Windows 10/11
macOS: macOS 10.15+
Linux: Any desktop with a system tray (GNOME, KDE, etc.)

First run: EmbeddingGemma-300M (~150MB) downloads automatically on first startup and is cached to .model-cache/. Subsequent starts are instant.

Setup

1. Configure your token

cp .env.example .env
# Edit .env and replace "your-mcp-router-token-here" with your real MCPR_TOKEN

The .env file is gitignored. Alternatively, set MCPR_TOKEN as a system environment variable — it takes precedence over the .env file.

2. Install dependencies and build

npm install
npm run build

3. Register auto-start and launch the tray

Windows:

npm run setup
# or: powershell -ExecutionPolicy Bypass -File setup.ps1

macOS / Linux:

npm run setup
# or: bash setup.sh

This registers the tray to start on every login and launches it immediately.

4. Connect your AI clients

Claude Code (~/.claude.json):

{
  "mcpServers": {
    "mcp-vector-proxy": {
      "type": "http",
      "url": "http://127.0.0.1:3456/mcp"
    }
  }
}

Claude Desktop (%APPDATA%\Claude\claude_desktop_config.json on Windows, ~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

{
  "mcpServers": {
    "mcp-vector-proxy": {
      "command": "node",
      "args": ["/absolute/path/to/mcp-proxy/dist/stdio-bridge.js"],
      "env": { "PROXY_URL": "http://127.0.0.1:3456/mcp" }
    }
  }
}

Any other agent — point it at http://127.0.0.1:3456/mcp (Streamable HTTP) or http://127.0.0.1:3456/sse (SSE legacy).

Environment Variables

Variable	Default	Description
`MCPR_TOKEN`	(from .env)	MCP Router auth token — required
`HTTP_PORT`	(none = stdio mode)	Port for HTTP server
`HTTP_HOST`	`127.0.0.1`	Bind address
`POLL_INTERVAL_MS`	`15000`	Tool change polling interval
`DISCOVER_LIMIT`	`10`	Default max results from `discover_tools`

Tools Exposed to Agents

Tool	Description
`discover_tools`	Hybrid semantic + keyword search — find relevant tools by natural language query
`execute_tool`	Execute any MCP tool by exact name with arguments
`batch_execute`	Execute multiple MCP tools in parallel in a single call
`refresh_tools`	Force re-index all tools from MCP Router immediately

npm Scripts

npm run build          # Compile TypeScript → dist/
npm run setup          # Register auto-start + launch tray (platform-detected)
npm run update         # Build + restart tray (platform-detected)

Updating

After changing source code:

npm run update

This rebuilds everything and restarts the tray (which restarts the proxy).

Health Check

GET http://127.0.0.1:3456/health

{
  "status": "ok",
  "routerConnected": true,
  "tools": 151,
  "indexedAt": "2026-02-17T15:51:21.620Z",
  "sessions": { "streamable": 1, "sse": 0 }
}

Status is "ok" when MCP Router is connected and tools are indexed. "disconnected" means the proxy is up but MCP Router is unreachable (it will auto-reconnect).

File Reference

src/
  index.ts          — Main proxy server (HTTP + stdio modes, hybrid vector search)
  stdio-bridge.ts   — Thin stdio→HTTP forwarder for Claude Desktop
  launch-router.ts  — Spawns MCP Router CLI with windowsHide:true
  tray.ts           — Cross-platform system tray (systray2)

dist/               — Compiled output (generated by npm run build)

.env.example        — Template for .env (copy and fill in MCPR_TOKEN)
.env                — Your config (gitignored, never commit this)

setup.ps1           — Windows: register auto-start + launch tray
setup.sh            — macOS/Linux: register auto-start + launch tray
restart-tray.ps1    — Windows: kill + restart tray
restart-tray.sh     — macOS/Linux: kill + restart tray

.lancedb/           — LanceDB vector store (auto-generated, gitignored)
.tool-meta.json     — Tool fingerprint cache (auto-generated, gitignored)
.model-cache/       — Downloaded embedding model (~150MB, gitignored)

How Tool Sync Works

On startup, tools from MCP Router are embedded using EmbeddingGemma-300M and stored in LanceDB
MCP Router sends a tools/list_changed notification when servers change → immediate re-index
A polling fallback runs every 15s to catch any missed notifications
Re-indexing is incremental — only new or changed tools get re-embedded, cached embeddings are reused
Tool schema changes (new parameters) are detected via fingerprint and trigger re-indexing

How Search Works

discover_tools uses hybrid search for best accuracy:

Dense vector search — LanceDB finds semantically similar tools using EmbeddingGemma-300M embeddings (handles paraphrasing, synonyms, conceptual matches)
BM25 keyword search — in-memory scoring finds exact tool name / keyword matches that semantic search can miss
RRF fusion — Reciprocal Rank Fusion merges both ranked lists into a single optimal ranking

This combination handles both vague queries ("something to do with files") and precise queries ("browser_screenshot") accurately at any scale.

This server cannot be installed

license - not found

quality - not tested

maintenance

How are these scores calculated?

Resources

GitHub Repository

Need Help?

Related Servers

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

Lightport: Open-Sourcing Glama's AI Gateway
By punkpeye on April 27, 2026.
open source
OpenAI
Tool Definition Quality Score (TDQS)
By punkpeye on April 3, 2026.
mcp
The Hackers Who Tracked My Sleep Cycle
By punkpeye on March 26, 2026.
security

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/alexiokay/Simple-MCP-Proxy'

If you have feedback or need assistance with the MCP directory API, please join our Discord server