Which integrations are available for this server?

Allows the Hermes agent to use FastContext for codebase exploration.

fastcontext-hybrid-mcp

by LyuboslavLyubenov

Python

Local

FastContext Hybrid MCP Server

An MCP (Model Context Protocol) server that gathers context from codebases using FastContext-1.0-4B-RL, a 4B-parameter model trained by Microsoft for repository exploration.

The server combines LLM-guided code exploration with fuzzy matching to find relevant code snippets for any question about a codebase.

How It Works

User question
    ↓
1. DECOMPOSE — break into sub-questions (code-focused + doc-focused)
    ↓
2. EXPLORE — FastContext 4B model searches the codebase via Grep/Glob/Read
    ↓
3. EXTRACT — fuzzy matching extracts only relevant lines from found files
    ↓
4. GAP-FILL — ripgrep + Levenshtein distance catches what the model missed
    ↓
Snippets (~5K tokens) → fed to larger LLM for synthesis
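
The four stages can be read as a simple function pipeline. The sketch below is a hypothetical Python skeleton, not the project's code: the stage callables (`decompose`, `explore`, `extract`, `gap_fill`) are placeholders you would back with the 4B model and ripgrep, and the final packing step assumes a rough 4-characters-per-token estimate to approximate the ~5K-token target.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Snippet:
    path: str
    start_line: int
    text: str


# Hypothetical stage signatures; the real server's internals may differ.
Decompose = Callable[[str], list[str]]                 # question -> sub-questions
Explore = Callable[[str], list[str]]                   # sub-question -> file paths
Extract = Callable[[str, str], list[Snippet]]          # (path, sub-question) -> snippets
GapFill = Callable[[str, set[str]], list[Snippet]]     # (question, seen paths) -> extra snippets


def approx_tokens(text: str) -> int:
    # Assumption: ~4 characters per token; a real tokenizer would be more accurate.
    return max(1, len(text) // 4)


def run_pipeline(question: str, decompose: Decompose, explore: Explore,
                 extract: Extract, gap_fill: GapFill,
                 token_budget: int = 5000) -> list[Snippet]:
    snippets: list[Snippet] = []
    seen_paths: set[str] = set()
    # 1. DECOMPOSE -> 2. EXPLORE -> 3. EXTRACT
    for sub_q in decompose(question):
        for path in explore(sub_q):
            seen_paths.add(path)
            snippets.extend(extract(path, sub_q))
    # 4. GAP-FILL: a deterministic search adds what exploration missed
    snippets.extend(gap_fill(question, seen_paths))

    # Deduplicate, then greedily keep snippets in order, skipping any
    # snippet that would push the total past the token budget.
    packed: list[Snippet] = []
    used = 0
    seen_keys: set[tuple[str, int]] = set()
    for s in snippets:
        key = (s.path, s.start_line)
        if key in seen_keys:
            continue
        seen_keys.add(key)
        cost = approx_tokens(s.text)
        if used + cost > token_budget:
            continue
        packed.append(s)
        used += cost
    return packed
```

Keeping the stages as plain callables makes each one testable on its own and lets a deterministic gap-fill run even when the model's exploration comes back empty.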

Performance Gains

Why use this pipeline instead of just asking the model directly?

APPROACH COMPARISON (tested on business-auditor, 1170 files)
═══════════════════════════════════════════════════════════════════════════

Method                          Concept     Answerable   Context/Question
                                Coverage
───────────────────────────────────────────────────────────────────────────
Raw FastContext (no pipeline)   50%         3/6          N/A (model output)
+ Path resolution fix           67%         4/6          N/A
+ Hybrid pipeline (unlimited)   97%         6/6          308K tokens
+ Hybrid pipeline (optimized)   92%         6/6           5K tokens  ← this
───────────────────────────────────────────────────────────────────────────

What each layer adds:

Layer                   What it does                            Gain
──────────────────────────────────────────────────────────────────────
FastContext 4B          Finds relevant files via tool calls     Baseline
Query decomposition     Breaks Q into doc + code sub-questions  +17%
Fuzzy snippet extract   camelCase split + Levenshtein matching  +15%
Gap-fill (ripgrep)      Catches what model missed               +25%
──────────────────────────────────────────────────────────────────────
Total: 50% → 92% concept coverage (+42 percentage points, an 84% relative improvement)

Context efficiency:

Without optimization:  308K tokens/question  (loads full files)
With optimization:       5K tokens/question  (extracts relevant lines only)
Reduction:               62x smaller context
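
The headline numbers follow directly from the figures above. A quick check, with values copied from the comparison table and the token counts:

```python
baseline_coverage = 0.50     # Raw FastContext (no pipeline)
final_coverage = 0.92        # Hybrid pipeline (optimized)
unoptimized_tokens = 308_000
optimized_tokens = 5_000

# Absolute gain in percentage points vs. relative gain over the baseline
point_gain = round((final_coverage - baseline_coverage) * 100)
relative_gain = round((final_coverage / baseline_coverage - 1) * 100)
# Context reduction factor (308K / 5K = 61.6, rounded)
reduction = round(unoptimized_tokens / optimized_tokens)

print(f"+{point_gain} points, +{relative_gain}% relative, {reduction}x smaller context")
```

This prints `+42 points, +84% relative, 62x smaller context`, which is why the +84% figure is a relative improvement rather than a gain in percentage points.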

What this means for the larger LLM:

Without pipeline: feed 308K tokens of raw files → exceeds most context windows, expensive
With pipeline: feed 5K tokens of targeted snippets → fits easily, cheap, higher quality

The 4B model does the expensive exploration work (searching, reading, filtering), so the larger LLM sees only the distilled evidence rather than whole files full of irrelevant code.

Key Features

Smart search: 4B model decides WHERE to look (not just keyword matching)
Fuzzy matching: camelCase splitting, separator normalization, Levenshtein distance
Minimal context: extracts only relevant lines, not full files (~5K tokens vs ~300K)
Gap-fill: ripgrep safety net catches what the model misses
Q4 quantization: runs on 6GB+ VRAM, ~67 tok/s generation
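
The fuzzy-matching ideas listed above can be illustrated with a small, self-contained sketch. It is not the server's implementation: the tokenizer rules, the `max_distance=1` threshold, the exact-match rule for short words, and the `line_matches` helper are all assumptions chosen for illustration.

```python
import re


def split_identifier(text: str) -> list[str]:
    """Split on any non-alphanumeric separator and on camelCase/acronym boundaries."""
    # Separator normalization: _, -, ., /, punctuation and whitespace become spaces
    text = re.sub(r"[^A-Za-z0-9]+", " ", text)
    # camelCase boundary: "registerUser" -> "register User"
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    # Acronym boundary: "HTTPRequest" -> "HTTP Request"
    text = re.sub(r"([A-Z]+)([A-Z][a-z])", r"\1 \2", text)
    return [w.lower() for w in text.split()]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def word_matches(word: str, concept_word: str, max_distance: int = 1) -> bool:
    # Short words must match exactly; longer ones tolerate small typos
    if len(concept_word) <= 3:
        return word == concept_word
    return levenshtein(word, concept_word) <= max_distance


def line_matches(line: str, concept: str) -> bool:
    """True if every word of the concept fuzzily matches some word in the line."""
    line_words = split_identifier(line)
    return all(any(word_matches(w, c) for w in line_words)
               for c in split_identifier(concept))
```

For example, `line_matches("const userRegistry = new Map();", "user registery")` is true despite the typo, because camelCase splitting exposes `registry` and it is one edit away from `registery`. Requiring every concept word to match favours precision; counting partial matches instead would trade some precision for recall.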

Quick Start (macOS — Recommended)

For Metal GPU acceleration on Apple Silicon (M1–M4). No Docker needed.

git clone https://github.com/LyuboslavLyubenov/fastcontext-hybrid-mcp
cd fastcontext-hybrid-mcp
chmod +x setup-mac.sh start.sh

# One-command setup (installs dependencies, builds llama.cpp with Metal, downloads model)
./setup-mac.sh

# Start with your project
./start.sh /path/to/your/project

This uses the Metal GPU backend for ~67 tok/s generation.

Prerequisites: macOS on Apple Silicon and Homebrew. The setup script detects what is already installed and installs whatever is missing.

Quick Start (Linux)

Linux with Vulkan GPU

git clone https://github.com/LyuboslavLyubenov/fastcontext-hybrid-mcp
cd fastcontext-hybrid-mcp
chmod +x setup.sh start.sh

./setup.sh

# Start with your project
./start.sh /path/to/your/project

Linux CPU-only (or Docker)

WORK_DIR=/path/to/your/project docker compose up fastcontext-cpu

Quick Start (Docker)

Docker handles all dependencies but runs CPU-only on macOS (no GPU passthrough).

macOS / Linux CPU

git clone https://github.com/LyuboslavLyubenov/fastcontext-hybrid-mcp
cd fastcontext-hybrid-mcp

WORK_DIR=/path/to/your/project docker compose up fastcontext-cpu

The MCP server exposes a streamable-http endpoint on port 8090 for MCP clients to connect to.

Linux with Vulkan GPU

WORK_DIR=/path/to/your/project docker compose up fastcontext-vulkan

Using with MCP Clients

Always-on streamable-http server (recommended)

Start a single persistent server that accepts requests for any project:

./start.sh /path/to/default/project

The server listens on http://127.0.0.1:8090/mcp using the streamable-http transport. Configure your MCP client:

{
  "mcp": {
    "fastcontext": {
      "type": "remote",
      "url": "http://127.0.0.1:8090/mcp",
      "enabled": true
    }
  }
}

The distill tool accepts a work_dir parameter per request, so the same server can search any project without restarting:

distill(question="...", work_dir="/path/to/project-a")
distill(question="...", work_dir="/path/to/project-b")

If work_dir is omitted, the default from startup is used.
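
From Python, the per-request `work_dir` can be passed through the MCP Python SDK (the `mcp` package). This is a sketch assuming the SDK's `streamablehttp_client` helper and `ClientSession.call_tool` (helper names may differ in newer SDK releases) and the URL shown above; `build_distill_args` is a hypothetical helper, and only `question` and `work_dir` are sent so the server's documented defaults apply to the other arguments.

```python
import asyncio
from typing import Any, Optional

MCP_URL = "http://127.0.0.1:8090/mcp"  # endpoint from the client config above


def build_distill_args(question: str, work_dir: Optional[str] = None) -> dict[str, Any]:
    """Hypothetical helper: omit work_dir so the server falls back to its startup default."""
    args: dict[str, Any] = {"question": question}
    if work_dir is not None:
        args["work_dir"] = work_dir
    return args


async def distill(question: str, work_dir: Optional[str] = None) -> list[str]:
    # Imported lazily so build_distill_args works without the SDK installed
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(MCP_URL) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.call_tool("distill", build_distill_args(question, work_dir))
            # Assumption: the JSON payload described under Tools arrives in text blocks
            return [block.text for block in result.content
                    if getattr(block, "type", None) == "text"]


# Usage (requires the server to be running):
# for text in asyncio.run(distill("How is user registration implemented?", "/path/to/project-a")):
#     print(text)
```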

Stdio (one project per process)

mcp_servers:
  fastcontext:
    command: "python3"
    args: ["/path/to/fastcontext-hybrid-mcp/mcp_server.py"]
    env:
      FASTCONTEXT_WORK_DIR: "/path/to/your/project"
      FASTCONTEXT_SERVER: "http://127.0.0.1:8080"
    timeout: 120

Make sure llama-server is running first (via ./start.sh or manually).
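
A client can check that requirement before launching the stdio server. The sketch below assumes llama.cpp's `/health` endpoint, which in current llama-server builds answers HTTP 200 with `{"status": "ok"}` once the model has loaded (and 503 while it is still loading); the `llama_server_ready` function name is hypothetical and unrelated to the server's own `health_check` tool.

```python
import json
import urllib.request


def llama_server_ready(base_url: str = "http://127.0.0.1:8080", timeout: float = 2.0) -> bool:
    """Return True if llama-server's /health endpoint reports the model as loaded."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = json.loads(resp.read().decode("utf-8") or "{}")
    except (OSError, ValueError):
        # Connection refused, timeout, HTTP 503 while loading, or a non-JSON body
        return False
    return isinstance(body, dict) and body.get("status") == "ok"
```

With the default from `FASTCONTEXT_SERVER`, `llama_server_ready()` returns `False` until `./start.sh` (or a manually started llama-server) has finished loading the model.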

Docker

WORK_DIR=/path/to/your/project docker compose up fastcontext-cpu

The MCP server listens on port 8090 using the streamable-http transport. Configure your MCP client to connect:

{
  "mcp": {
    "fastcontext": {
      "type": "remote",
      "url": "http://localhost:8090/mcp",
      "enabled": true
    }
  }
}

Tools

`distill`

Main tool. Retrieves a grounded answer package for a question: deterministic ripgrep retrieval finds anchor definitions, the model extracts only artifacts that appear verbatim in the retrieved regions, and every cited symbol or path is validated against the files.

Args:
  question: str        — The question (conceptual or code-specific)
  work_dir: str        — Path to codebase
  seed: int            — Random seed (default: 42)
  max_anchors: int     — Max anchor files to gather evidence from (default: 4)
  evidence_chars: int  — Char budget for gathered evidence (default: 8000)

Returns:
  JSON with:
    answer: str               — Grounded answer citing file:line (when model cooperates)
    artifacts: list           — Verified symbols/values with file + line ranges
    evidence: list            — File:line + actual code region for each anchor
    confidence: str           — high | low | none
    ungrounded_dropped: list  — Model claims that failed validation
    identifiers_used: list    — Identifiers resolved from the question
    identifier_source: str    — literal | concept
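
A client consuming `distill` output might gate on the documented `confidence`, `evidence` and `ungrounded_dropped` fields before handing the answer to a larger LLM. The sketch below assumes the result arrives as a JSON string with the fields listed above; the `triage_distill` helper and its policy (accept only `high` confidence with no dropped claims) are illustrative choices, not part of the server.

```python
import json
from typing import Any


def triage_distill(result_json: str) -> dict[str, Any]:
    """Hypothetical policy: decide whether a distill result can be used as-is."""
    data = json.loads(result_json)
    confidence = data.get("confidence", "none")
    dropped = data.get("ungrounded_dropped", [])
    evidence = data.get("evidence", [])
    if confidence == "none" or not evidence:
        verdict = "retry"    # nothing grounded, e.g. rephrase the question
    elif confidence == "high" and not dropped:
        verdict = "accept"
    else:
        verdict = "review"   # low confidence, or some claims failed validation
    return {
        "verdict": verdict,
        "answer": data.get("answer", ""),
        "dropped_count": len(dropped),
        "evidence_count": len(evidence),
    }
```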

`read_snippet`

Extract relevant lines from a single file using fuzzy matching.

Args:
  filepath: str          — Absolute path to file
  concepts: list[str]    — Concepts to search for
  context_lines: int     — Surrounding lines (default: 2)
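
The `context_lines` argument implies that each matching line is expanded into a small window. One plausible way to do that, sketched here as an assumption rather than the tool's actual code, is to widen each hit by `context_lines` on both sides and merge windows that overlap or touch so no line is emitted twice; `merge_windows` is a hypothetical name.

```python
def merge_windows(hit_lines: list[int], total_lines: int,
                  context_lines: int = 2) -> list[tuple[int, int]]:
    """Turn 1-based matching line numbers into merged, inclusive (start, end) ranges."""
    ranges: list[tuple[int, int]] = []
    for line in sorted(set(hit_lines)):
        start = max(1, line - context_lines)
        end = min(total_lines, line + context_lines)
        if ranges and start <= ranges[-1][1] + 1:
            # Overlapping or adjacent window: extend the previous range
            ranges[-1] = (ranges[-1][0], max(ranges[-1][1], end))
        else:
            ranges.append((start, end))
    return ranges
```

For hits on lines 10, 12 and 30 of a 100-line file, this yields `[(8, 14), (28, 32)]`: the first two windows collapse into one block, which keeps the returned snippet compact.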

`list_files`

List files matching a glob pattern.

`health_check`

Check if the inference server is running.

Environment Variables

Variable	Default	Description
`FASTCONTEXT_WORK_DIR`	`/home/llmbox/fastcontext-eval`	Project directory to search
`FASTCONTEXT_SERVER`	`http://127.0.0.1:8080`	llama-server URL
`FASTCONTEXT_MODEL`	`models/FastContext-1.0-4B-RL-Q4_K_M.gguf`	Model path
`FASTCONTEXT_LLAMA_CPP`	auto-detected	llama-server binary path
`FASTCONTEXT_TRANSPORT`	`stdio`	MCP transport: `stdio`, `streamable-http`, `http`, `sse` (`sse` is deprecated; use `streamable-http` for network access)
`FASTCONTEXT_MCP_HOST`	`0.0.0.0`	MCP server bind host (for SSE/HTTP)
`FASTCONTEXT_MCP_PORT`	`8090`	MCP server port (for SSE/HTTP)
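
The table maps naturally onto a small configuration loader. The sketch below is hypothetical (the server's own parsing may differ): it applies the documented defaults, treats a missing `FASTCONTEXT_LLAMA_CPP` as "auto-detect", rejects transports outside the documented set, and takes the environment as a parameter so it can be tested without touching `os.environ`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_TRANSPORTS = {"stdio", "streamable-http", "http", "sse"}


@dataclass(frozen=True)
class FastContextConfig:
    work_dir: str
    server: str
    model: str
    llama_cpp: Optional[str]   # None means "auto-detect the llama-server binary"
    transport: str
    mcp_host: str
    mcp_port: int


def load_config(env: Mapping[str, str] = os.environ) -> FastContextConfig:
    transport = env.get("FASTCONTEXT_TRANSPORT", "stdio")
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"unsupported FASTCONTEXT_TRANSPORT: {transport!r}")
    return FastContextConfig(
        work_dir=env.get("FASTCONTEXT_WORK_DIR", "/home/llmbox/fastcontext-eval"),
        server=env.get("FASTCONTEXT_SERVER", "http://127.0.0.1:8080"),
        model=env.get("FASTCONTEXT_MODEL", "models/FastContext-1.0-4B-RL-Q4_K_M.gguf"),
        llama_cpp=env.get("FASTCONTEXT_LLAMA_CPP"),
        transport=transport,
        mcp_host=env.get("FASTCONTEXT_MCP_HOST", "0.0.0.0"),
        mcp_port=int(env.get("FASTCONTEXT_MCP_PORT", "8090")),
    )
```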

Hardware Requirements

Backend	Min RAM	GPU	Platform	Notes
Metal	8 GB unified	Apple Silicon M1+	macOS native	Best for macOS — requires native install, not Docker
Vulkan	6 GB	AMD/Intel/NVIDIA	Linux	Mesa or proprietary drivers
CPU	8 GB RAM	None	Any	Works in Docker on any platform, ~10x slower

Performance

Metric	Value
Model size (Q4_K_M)	2.4 GB
VRAM usage	~6 GB (model + KV cache)
Prompt eval	~420 tokens/sec
Generation	~67 tokens/sec
Context per question	~5K tokens
Time per question	~20-40 seconds

Hosting

For team/production deployment, see HOSTING.md:

VPS with GPU (Lambda Labs, Vast.ai, RunPod, Hetzner)
Systemd services for auto-start
Reverse proxy (nginx, Caddy) for network access
Docker with persistent model volumes
Multi-project setup
Cost estimates

License

MIT
