# MCP Context Hub
Local MCP server (Node.js + TypeScript) providing context optimization, RAG memory, a semantic cache, and a sub-MCP proxy. Designed to run on a machine with a GPU (RTX 3060 Ti) and Ollama, acting as a single MCP endpoint for Claude.
## Architecture
```
Claude (Remote)
|
HTTP POST/GET/DELETE + Bearer Token
|
+-----------v-----------+
| Express (:3100) |
| Auth + IP Allowlist |
+-----------+-----------+
|
+-----------v-----------+
| McpServer (SDK v1) |
| |
| Tools: |
| context_pack |
| memory_search |
| memory_upsert |
| context_compress |
| proxy_call |
+-+------+------+-----+-+
| | | |
v v v v
Ollama SQLite Cache ProxyMgr
Client Vector LRU (stdio
(chat Store +TTL sub-MCP)
+embed +FTS5
+fallback)
```
## Features
- **context_pack** — Combines semantic + text search, deduplication, and LLM synthesis into a structured context bundle (summary, facts, next actions)
- **memory_search** — Semantic similarity search over stored documents using vector embeddings
- **memory_upsert** — Store documents with automatic chunking, embedding, and indexing
- **context_compress** — Compress text into bullets, JSON, steps, or summary format to reduce token usage
- **proxy_call** — Call tools on sub-MCP servers (e.g., filesystem) with optional post-processing (summarize, compress)
## Requirements
- Node.js >= 20
- [Ollama](https://ollama.ai) with the following models:
- `llama3.1:8b-instruct-q4_K_M` (primary chat)
- `qwen2.5:7b-instruct-q4_K_M` (fallback chat)
- `nomic-embed-text:v1.5` (embeddings, 768 dims)
## Quick Start
```bash
# 1. Clone and install
git clone https://github.com/DiegoNogueiraDev/mcp-context-hub.git
cd mcp-context-hub
npm install
# 2. Pull Ollama models
ollama pull llama3.1:8b-instruct-q4_K_M
ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull nomic-embed-text:v1.5
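# Verify the models are available
ollama list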
# 3. Configure environment
cp .env.example .env
# Edit .env and set MCP_AUTH_TOKEN to a secure random value
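# (generate one with e.g.: openssl rand -hex 32)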
# 4. Start the server
npm run dev
```
Or use the setup script:
```bash
chmod +x scripts/setup.sh
./scripts/setup.sh
npm run dev
```
## Usage
### Health Check
```bash
curl http://localhost:3100/health
# {"status":"healthy","timestamp":"..."}
```
### MCP Protocol
The server uses Streamable HTTP transport at `/mcp`. Initialize a session first:
```bash
# Initialize session
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-d '{
"jsonrpc": "2.0",
"method": "initialize",
"params": {
"protocolVersion": "2025-03-26",
"capabilities": {},
"clientInfo": { "name": "my-client", "version": "1.0.0" }
},
"id": 1
}'
```
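The session ID comes back in the `mcp-session-id` response header. A shell sketch for capturing it (assumes the header name above and a POSIX shell):
```bash
SESSION_ID=$(curl -sS -D - -o /dev/null http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-d '{"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2025-03-26","capabilities":{},"clientInfo":{"name":"my-client","version":"1.0.0"}},"id":1}' \
| grep -i '^mcp-session-id' | tr -d '\r' | awk '{print $2}')
echo "$SESSION_ID"
```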
Then call tools, passing the session ID in the `mcp-session-id` header (the bearer token is still required on every request):
```bash
# Store a document
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-H "mcp-session-id: <session-id>" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "memory_upsert",
"arguments": {
"document_id": "my-doc",
"content": "Your document text here...",
"scope": "project",
"tags": ["example"]
}
},
"id": 2
}'
# Search memories
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-H "mcp-session-id: <session-id>" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "memory_search",
"arguments": {
"query": "your search query",
"top_k": 5
}
},
"id": 3
}'
```
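`context_pack` and `context_compress` follow the same `tools/call` pattern. The argument names below are illustrative guesses; the authoritative shapes live in `src/tools/schemas.ts`:
```bash
# Build a context bundle (argument names are assumptions)
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-H "mcp-session-id: <session-id>" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "context_pack",
"arguments": { "query": "deployment checklist", "top_k": 8 }
},
"id": 4
}'
# Compress text into bullets (argument names are assumptions)
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-H "mcp-session-id: <session-id>" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "context_compress",
"arguments": { "text": "Long text to reduce...", "format": "bullets" }
},
"id": 5
}'
```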
### Sub-MCP Proxy
Configure sub-MCP servers via the `PROXY_SERVERS` environment variable:
```bash
PROXY_SERVERS='{"filesystem":{"command":"node","args":["node_modules/@modelcontextprotocol/server-filesystem/dist/index.js","/tmp"]}}' npm run dev
```
Then call tools on them via `proxy_call`, sending the following as the `params` of a `tools/call` request (same pattern as the examples above):
```json
{
"name": "proxy_call",
"arguments": {
"server": "filesystem",
"tool": "read_file",
"arguments": { "path": "/tmp/example.txt" },
"post_process": "none"
}
}
```
## Configuration
All settings are configured via environment variables (see `.env.example`):
| Variable | Default | Description |
|----------|---------|-------------|
| `MCP_AUTH_TOKEN` | | Bearer token for authentication (required) |
| `MCP_ALLOWED_IPS` | `127.0.0.1,::1` | Comma-separated allowed IPs |
| `OLLAMA_BASE_URL` | `http://localhost:11434` | Ollama API URL |
| `PRIMARY_MODEL` | `llama3.1:8b-instruct-q4_K_M` | Primary chat model |
| `FALLBACK_MODEL` | `qwen2.5:7b-instruct-q4_K_M` | Fallback chat model |
| `EMBEDDING_MODEL` | `nomic-embed-text:v1.5` | Embedding model |
| `PORT` | `3100` | Server port |
| `HOST` | `0.0.0.0` | Server host |
| `DB_PATH` | `./data/context-hub.db` | SQLite database path |
| `CACHE_TTL_MS` | `300000` | Cache TTL in milliseconds (5 minutes) |
| `CACHE_MAX_ENTRIES` | `100` | Max cache entries |
| `LOG_LEVEL` | `info` | Log level (debug, info, warn, error) |
| `PROXY_SERVERS` | `{}` | Sub-MCP server configs (JSON) |
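A minimal `.env` sketch for local use (the token value is a placeholder):
```bash
MCP_AUTH_TOKEN=replace-with-a-long-random-string
MCP_ALLOWED_IPS=127.0.0.1,::1
OLLAMA_BASE_URL=http://localhost:11434
PORT=3100
LOG_LEVEL=info
```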
## Commands
```bash
npm run dev # Start dev server (HTTP on :3100)
npm run dev:stdio # Start in stdio mode (for local MCP testing)
npm run build # Compile TypeScript
npm start # Run compiled output
npm test # Run tests (31 tests, 6 files)
npm run typecheck # Type-check without emitting
npm run health # Run health check script
```
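For interactive testing, the MCP Inspector can connect to the Streamable HTTP endpoint (assuming you use its Streamable HTTP transport option and set the `Authorization` header in its UI):
```bash
npx @modelcontextprotocol/inspector
# then connect to http://localhost:3100/mcp
```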
## Project Structure
```
src/
config.ts # Environment configuration
index.ts # Entry point + graceful shutdown
db/
connection.ts # SQLite singleton (WAL mode)
migrations.ts # Table definitions (documents, chunks, FTS5, audit)
cosine.ts # Cosine similarity + embedding serialization
server/
mcp-server.ts # McpServer setup + tool registration
transport.ts # Express + Streamable HTTP transport
session.ts # Session management
middleware/
auth.ts # Bearer token validation
ip-allowlist.ts # IP restriction
audit.ts # Tool call logging
tools/
schemas.ts # Zod schemas for all tools
context-pack.ts # context_pack implementation
memory-search.ts # memory_search implementation
memory-upsert.ts # memory_upsert implementation
context-compress.ts # context_compress implementation
proxy-call.ts # proxy_call implementation
services/
ollama-client.ts # Ollama API (chat + embed + fallback)
sqlite-vector-store.ts # Vector store (SQLite + brute-force cosine)
text-search.ts # FTS5 full-text search
chunker.ts # Recursive text splitter
dedup.ts # Content hashing + Jaccard dedup
semantic-cache.ts # LRU + TTL in-memory cache
proxy-manager.ts # Sub-MCP stdio connections
utils/
logger.ts # Pino structured logging
metrics.ts # In-memory call metrics
retry.ts # Exponential backoff retry
tokens.ts # Token estimation
types/
index.ts # Type re-exports
ollama.ts # Ollama API types
vector-store.ts # VectorStore interface
tests/
unit/ # cosine, chunker, dedup, cache
integration/ # sqlite vector store
e2e/ # Express server
```
## Tech Stack
- **Runtime**: Node.js 20, TypeScript
- **MCP SDK**: `@modelcontextprotocol/sdk` v1.26
- **HTTP**: Express v5 + Streamable HTTP transport
- **Database**: SQLite (`better-sqlite3`) with WAL mode, FTS5
- **Embeddings**: Ollama `nomic-embed-text:v1.5` (768 dimensions)
- **Chat**: Ollama with automatic model fallback
- **Validation**: Zod v4
- **Logging**: Pino
- **Testing**: Vitest
## License
MIT