MCP Context Hub
Local MCP server (Node.js + TypeScript) for context optimization, RAG memory, semantic caching, and sub-MCP proxying. Designed to run on a machine with a GPU (RTX 3060 Ti) and Ollama, acting as a single MCP endpoint for Claude.
Architecture
Claude (Remote)
|
HTTP POST/GET/DELETE + Bearer Token
|
+-----------v-----------+
| Express (:3100) |
| Auth + IP Allowlist |
+-----------+-----------+
|
+-----------v-----------+
| McpServer (SDK v1) |
| |
| Tools: |
| context_pack |
| memory_search |
| memory_upsert |
| context_compress |
| proxy_call |
+-+------+------+-----+-+
| | | |
v v v v
Ollama SQLite Cache ProxyMgr
Client Vector LRU (stdio
(chat Store +TTL sub-MCP)
+embed +FTS5
 +fallback)

Features
context_pack — Combines semantic + text search, deduplication, and LLM synthesis into a structured context bundle (summary, facts, next actions)
memory_search — Semantic similarity search over stored documents using vector embeddings
memory_upsert — Store documents with automatic chunking, embedding, and indexing
context_compress — Compress text into bullets, JSON, steps, or summary format to reduce token usage
proxy_call — Call tools on sub-MCP servers (e.g., filesystem) with optional post-processing (summarize, compress)
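memory_search ranks stored chunks by cosine similarity between the query embedding and each chunk embedding, scanned brute-force over the SQLite vector store. A standalone sketch of that ranking (names are hypothetical; the real helpers live in src/db/cosine.ts and src/services/sqlite-vector-store.ts):

```typescript
// Cosine similarity between two embedding vectors
// (e.g. the 768-dim outputs of nomic-embed-text:v1.5).
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-k over stored chunk embeddings, as the
// SQLite vector store does conceptually.
function topK(
  query: number[],
  chunks: { id: string; emb: number[] }[],
  k: number,
) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(query, c.emb) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Brute-force scanning is a reasonable choice at this scale; an ANN index only starts paying off with much larger corpora.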
Requirements
Node.js >= 20
Ollama with the following models:
llama3.1:8b-instruct-q4_K_M (primary chat)
qwen2.5:7b-instruct-q4_K_M (fallback chat)
nomic-embed-text:v1.5 (embeddings, 768 dims)
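When the primary chat model fails, the server retries with exponential backoff and then falls back to the secondary model. A minimal sketch of that pattern (the real logic is in src/utils/retry.ts and the Ollama client; attempt counts and delays here are illustrative, not the project's actual values):

```typescript
// Retry an async operation with exponential backoff: 200ms, 400ms, 800ms, ...
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}

// Try the primary chat model first; if all retries fail, fall back.
async function chatWithFallback(
  chat: (model: string, prompt: string) => Promise<string>,
  prompt: string,
): Promise<string> {
  try {
    return await withRetry(() => chat("llama3.1:8b-instruct-q4_K_M", prompt));
  } catch {
    return await withRetry(() => chat("qwen2.5:7b-instruct-q4_K_M", prompt));
  }
}
```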
Quick Start
# 1. Clone and install
git clone https://github.com/DiegoNogueiraDev/mcp-context-hub.git
cd mcp-context-hub
npm install
# 2. Pull Ollama models
ollama pull llama3.1:8b-instruct-q4_K_M
ollama pull qwen2.5:7b-instruct-q4_K_M
ollama pull nomic-embed-text:v1.5
# 3. Configure environment
cp .env.example .env
# Edit .env and set MCP_AUTH_TOKEN to a secure random value
# 4. Start the server
npm run dev

Or use the setup script:
chmod +x scripts/setup.sh
./scripts/setup.sh
npm run dev

Usage
Health Check
curl http://localhost:3100/health
# {"status":"healthy","timestamp":"..."}

MCP Protocol
The server uses Streamable HTTP transport at /mcp. Initialize a session first:
# Initialize session
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-d '{
"jsonrpc": "2.0",
"method": "initialize",
"params": {
"protocolVersion": "2025-03-26",
"capabilities": {},
"clientInfo": { "name": "my-client", "version": "1.0.0" }
},
"id": 1
}'

Then call tools using the mcp-session-id header from the response:
# Store a document
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-H "mcp-session-id: <session-id>" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "memory_upsert",
"arguments": {
"document_id": "my-doc",
"content": "Your document text here...",
"scope": "project",
"tags": ["example"]
}
},
"id": 2
}'
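Before embedding, memory_upsert splits the document into chunks with a recursive text splitter. A simplified sketch (the actual implementation is src/services/chunker.ts; the separators and the 512-character limit are illustrative, not the project's real settings):

```typescript
// Recursively split text: try coarse separators first (paragraphs),
// then finer ones (lines, sentences, words), and only hard-split
// when no separator brings a piece under the size limit.
function chunkText(
  text: string,
  maxLen = 512,
  separators = ["\n\n", "\n", ". ", " "],
): string[] {
  if (text.length <= maxLen) return text.trim() ? [text.trim()] : [];
  const [sep, ...rest] = separators;
  if (sep === undefined) {
    // No separator left: hard split into fixed-size windows.
    const out: string[] = [];
    for (let i = 0; i < text.length; i += maxLen) {
      out.push(text.slice(i, i + maxLen));
    }
    return out;
  }
  // Split on the coarsest separator, recursing into oversized pieces.
  const chunks: string[] = [];
  for (const piece of text.split(sep)) {
    chunks.push(...chunkText(piece, maxLen, rest));
  }
  return chunks.filter((c) => c.length > 0);
}
```

Each resulting chunk is then embedded and indexed individually (vector store plus FTS5).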
# Search memories
curl -X POST http://localhost:3100/mcp \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "Authorization: Bearer <your-token>" \
-H "mcp-session-id: <session-id>" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "memory_search",
"arguments": {
"query": "your search query",
"top_k": 5
}
},
"id": 3
}'

Sub-MCP Proxy
Configure sub-MCP servers via the PROXY_SERVERS environment variable:
PROXY_SERVERS='{"filesystem":{"command":"node","args":["node_modules/@modelcontextprotocol/server-filesystem/dist/index.js","/tmp"]}}' npm run dev

Then call tools on them via proxy_call:
{
"name": "proxy_call",
"arguments": {
"server": "filesystem",
"tool": "read_file",
"arguments": { "path": "/tmp/example.txt" },
"post_process": "none"
}
}

Configuration
All settings via environment variables (see .env.example):
Bearer token for authentication (MCP_AUTH_TOKEN)
Comma-separated allowed IPs
Ollama API URL
Primary chat model
Fallback chat model
Embedding model
Server port (default 3100)
Server host
SQLite database path
Cache TTL (default 5 minutes)
Max cache entries
Log level (debug, info, warn, error)
Sub-MCP server configs as JSON (PROXY_SERVERS)
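Tool responses are additionally served from an in-memory semantic cache with LRU eviction and TTL expiry (src/services/semantic-cache.ts). A minimal sketch of that cache policy, with illustrative capacity and TTL values:

```typescript
// LRU + TTL cache: entries expire after ttlMs, and once maxEntries
// is exceeded the least-recently-used entry is evicted. Relies on
// Map preserving insertion order.
class LruTtlCache<V> {
  private map = new Map<string, { value: V; expires: number }>();
  constructor(private maxEntries: number, private ttlMs: number) {}

  get(key: string): V | undefined {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expires) {
      // Expired: evict lazily on read.
      this.map.delete(key);
      return undefined;
    }
    // Re-insert to mark as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, { value, expires: Date.now() + this.ttlMs });
    if (this.map.size > this.maxEntries) {
      // First key in insertion order is the least recently used.
      const oldest = this.map.keys().next().value as string;
      this.map.delete(oldest);
    }
  }
}
```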
Commands
npm run dev # Start dev server (HTTP on :3100)
npm run dev:stdio # Start in stdio mode (for local MCP testing)
npm run build # Compile TypeScript
npm start # Run compiled output
npm test # Run tests (31 tests, 6 files)
npm run typecheck # Type-check without emitting
npm run health # Run health check script

Project Structure
src/
config.ts # Environment configuration
index.ts # Entry point + graceful shutdown
db/
connection.ts # SQLite singleton (WAL mode)
migrations.ts # Table definitions (documents, chunks, FTS5, audit)
cosine.ts # Cosine similarity + embedding serialization
server/
mcp-server.ts # McpServer setup + tool registration
transport.ts # Express + Streamable HTTP transport
session.ts # Session management
middleware/
auth.ts # Bearer token validation
ip-allowlist.ts # IP restriction
audit.ts # Tool call logging
tools/
schemas.ts # Zod schemas for all tools
context-pack.ts # context_pack implementation
memory-search.ts # memory_search implementation
memory-upsert.ts # memory_upsert implementation
context-compress.ts # context_compress implementation
proxy-call.ts # proxy_call implementation
services/
ollama-client.ts # Ollama API (chat + embed + fallback)
sqlite-vector-store.ts # Vector store (SQLite + brute-force cosine)
text-search.ts # FTS5 full-text search
chunker.ts # Recursive text splitter
dedup.ts # Content hashing + Jaccard dedup
semantic-cache.ts # LRU + TTL in-memory cache
proxy-manager.ts # Sub-MCP stdio connections
utils/
logger.ts # Pino structured logging
metrics.ts # In-memory call metrics
retry.ts # Exponential backoff retry
tokens.ts # Token estimation
types/
index.ts # Type re-exports
ollama.ts # Ollama API types
vector-store.ts # VectorStore interface
tests/
unit/ # cosine, chunker, dedup, cache
integration/ # sqlite vector store
e2e/ # Express server

Tech Stack
Runtime: Node.js 20, TypeScript
MCP SDK: @modelcontextprotocol/sdk v1.26
HTTP: Express v5 + Streamable HTTP transport
Database: SQLite (better-sqlite3) with WAL mode and FTS5
Embeddings: Ollama nomic-embed-text:v1.5 (768 dimensions)
Chat: Ollama with automatic model fallback
Validation: Zod v4
Logging: Pino
Testing: Vitest
License
MIT