fastcontext-mcp

by jmacd867

An MCP server that wraps FastContext-1.0 as a repo-exploration subagent for Claude Code.

Instead of letting Sonnet spend half its context budget grepping around a codebase, you offload that work to a dedicated 4B model trained specifically to explore repos. FastContext issues parallel READ/GLOB/GREP calls, then returns compact file paths + line ranges as grounded citations. Claude Code gets clean context; FastContext does the legwork.

Claude Code (Sonnet) ──explore_repo──▶  fastcontext-mcp  ──READ/GLOB/GREP──▶  your repo
        ▲                                       │
        └──── file:line citations ──────────────┘
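To make the READ/GLOB/GREP step in the diagram concrete, here is a minimal Python sketch of three such tools and a dispatcher that runs one batch of calls in parallel. The function names, argument names, default glob pattern, and output formats are assumptions for illustration, not the server's actual implementation; only the 300-line READ cap mirrors a documented default.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import re

MAX_FILE_LINES = 300  # mirrors the documented FASTCONTEXT_MAX_FILE_LINES default


def read_file(root: Path, rel_path: str, start: int = 1) -> str:
    """READ (hypothetical signature): up to MAX_FILE_LINES numbered lines from `start`."""
    lines = (root / rel_path).read_text(errors="replace").splitlines()
    chunk = lines[start - 1 : start - 1 + MAX_FILE_LINES]
    return "\n".join(f"{start + i}: {line}" for i, line in enumerate(chunk))


def glob_files(root: Path, pattern: str) -> list[str]:
    """GLOB (hypothetical signature): relative paths of files matching a glob pattern."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


def grep_files(root: Path, regex: str, pattern: str = "**/*.py") -> list[str]:
    """GREP (hypothetical signature): 'path:line: text' for every matching line."""
    rx = re.compile(regex)
    hits = []
    for rel in glob_files(root, pattern):
        text = (root / rel).read_text(errors="replace")
        for n, line in enumerate(text.splitlines(), 1):
            if rx.search(line):
                hits.append(f"{rel}:{n}: {line.strip()}")
    return hits


TOOLS = {"READ": read_file, "GLOB": glob_files, "GREP": grep_files}


def run_parallel(root: Path, calls: list[tuple[str, dict]]) -> list:
    """Run one turn's batch of tool calls concurrently; results keep the call order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(TOOLS[name], root, **args) for name, args in calls]
        return [f.result() for f in futures]
```

A batch such as `run_parallel(Path("."), [("GLOB", {"pattern": "**/*.py"}), ("GREP", {"regex": "rate_limit"})])` returns one result per call. Threads suit this workload because the tools are dominated by file I/O.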

This server is based on Microsoft's FastContext paper, which reports that integrating FastContext improves coding-agent accuracy by up to 5.5% while reducing main-agent token consumption by up to 60%.

Requirements

Python 3.10+
A running FastContext inference server (SGLang or vLLM, OpenAI-compatible)
Claude Code

Setup

1. Serve FastContext locally

You need a GPU with ~6GB VRAM for the 4B model. The 4B-RL variant slightly outperforms 4B-SFT on most benchmarks and is recommended for deployment.

# Quote the extra so shells such as zsh do not treat the brackets as a glob
pip install "sglang[all]"

# SFT variant (default)
./scripts/serve.sh microsoft/FastContext-1.0-4B-SFT

# RL variant (recommended)
./scripts/serve.sh microsoft/FastContext-1.0-4B-RL

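Or with vLLM. vLLM listens on port 8000 by default, so pass --port to match the URL used in the rest of this guide, and enable tool calling explicitly with --enable-auto-tool-choice:

pip install vllm
vllm serve microsoft/FastContext-1.0-4B-SFT --port 30000 --enable-auto-tool-choice --tool-call-parser hermes

The server will be available at http://localhost:30000. With vLLM, the model is registered under the full name passed to vllm serve unless --served-model-name is set, so FASTCONTEXT_MODEL must match that name.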
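Before wiring up the MCP server, it can help to confirm that the inference server is reachable and to see which model name it registered. The sketch below queries the standard OpenAI-compatible model listing endpoint; the helper names are illustrative and not part of this project.

```python
import json
import urllib.request


def models_url(base_url: str) -> str:
    """Build the OpenAI-compatible model-listing URL from a base URL ending in /v1."""
    return base_url.rstrip("/") + "/models"


def parse_model_ids(payload: dict) -> list[str]:
    """Extract the model IDs from a /v1/models response body."""
    return [m["id"] for m in payload.get("data", [])]


def list_models(base_url: str = "http://localhost:30000/v1", timeout: float = 5.0) -> list[str]:
    """Query the running server and return the model names it has registered."""
    with urllib.request.urlopen(models_url(base_url), timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))
```

Calling `list_models()` while the server is running should return a list containing the name to use for FASTCONTEXT_MODEL. A connection error means the server is not up yet or is listening on a different port.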

2. Install this MCP server

git clone https://github.com/YOUR_USERNAME/fastcontext-mcp
cd fastcontext-mcp
pip install -e .

3. Register with Claude Code

Add to your project-level .mcp.json (or, to make the server available in every project, your user-level ~/.claude.json):

{
  "mcpServers": {
    "fastcontext": {
      "command": "fastcontext-mcp",
      "env": {
        "FASTCONTEXT_BASE_URL": "http://localhost:30000/v1",
        "FASTCONTEXT_MODEL": "FastContext-1.0-4B-SFT"
      }
    }
  }
}

Restart Claude Code. You should see explore_repo in the available tools.
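If the config file already lists other MCP servers, the entry can be merged in programmatically rather than pasted by hand. This is a minimal sketch, assuming the file uses the top-level "mcpServers" key shown above; `register_fastcontext` is a hypothetical helper, not something this package ships.

```python
import json
from pathlib import Path


def register_fastcontext(
    config_path: Path,
    base_url: str = "http://localhost:30000/v1",
    model: str = "FastContext-1.0-4B-SFT",
) -> dict:
    """Add or update the "fastcontext" entry, leaving other servers untouched."""
    # Start from the existing config if there is one, else from an empty document.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["fastcontext"] = {
        "command": "fastcontext-mcp",
        "env": {"FASTCONTEXT_BASE_URL": base_url, "FASTCONTEXT_MODEL": model},
    }
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `register_fastcontext(Path(".mcp.json"))` writes the same entry as the JSON above into the current project.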

Usage

Once registered, Claude Code can call explore_repo automatically, or you can invoke it explicitly:

explore_repo("where is the rate limiting middleware defined")
explore_repo("find all places that call the payment API", repo_root="/path/to/repo")

FastContext will issue several parallel read/search calls internally and return something like:

<final_answer>
- src/middleware/ratelimit.py: lines 12-47
- src/middleware/__init__.py: line 8
- tests/test_ratelimit.py: lines 1-30
</final_answer>

Claude Code then uses those citations as focused context rather than reading the whole codebase.
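If you want to consume the citations yourself, for example in a script or a test, the answer block is easy to parse. The sketch below assumes the output follows exactly the "path: lines A-B" or "path: line N" shape shown in the example above; the real output may vary, so treat the regular expressions as a starting point.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    path: str
    start: int
    end: int


# Assumed format, taken from the sample output: "- path: lines 12-47" or "- path: line 8".
_BLOCK = re.compile(r"<final_answer>(.*?)</final_answer>", re.DOTALL)
_ENTRY = re.compile(r"^-\s*(?P<path>.+?):\s*lines?\s+(?P<start>\d+)(?:-(?P<end>\d+))?\s*$")


def parse_citations(text: str) -> list[Citation]:
    """Return the citations inside the first <final_answer> block, or [] if none."""
    block = _BLOCK.search(text)
    if block is None:
        return []
    citations = []
    for line in block.group(1).splitlines():
        m = _ENTRY.match(line.strip())
        if m:
            start = int(m["start"])
            # A single-line citation has no end; treat it as a one-line range.
            end = int(m["end"]) if m["end"] else start
            citations.append(Citation(m["path"], start, end))
    return citations
```

Lines that do not match the assumed shape are skipped rather than raising, so a partially malformed answer still yields the citations that can be read.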

Configuration

| Env var | Default | Description |
| --- | --- | --- |
| `FASTCONTEXT_BASE_URL` | `http://localhost:30000/v1` | SGLang/vLLM server URL |
| `FASTCONTEXT_MODEL` | `FastContext-1.0-4B-SFT` | Model name as registered in the server |
| `FASTCONTEXT_MAX_TURNS` | `8` | Max exploration turns before giving up |
| `FASTCONTEXT_MAX_FILE_LINES` | `300` | Max lines returned per READ call |
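As an illustration of how these variables and their defaults fit together, here is a small sketch that loads them into a typed settings object. The `Settings` class and `load_settings` function are hypothetical names for this example; the server's own configuration code may look different.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    # Defaults copied from the configuration table above.
    base_url: str = "http://localhost:30000/v1"
    model: str = "FastContext-1.0-4B-SFT"
    max_turns: int = 8
    max_file_lines: int = 300


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the FASTCONTEXT_* variables, falling back to the documented defaults."""
    d = Settings()
    return Settings(
        base_url=env.get("FASTCONTEXT_BASE_URL", d.base_url),
        model=env.get("FASTCONTEXT_MODEL", d.model),
        # Environment values are strings, so the numeric settings are converted.
        max_turns=int(env.get("FASTCONTEXT_MAX_TURNS", d.max_turns)),
        max_file_lines=int(env.get("FASTCONTEXT_MAX_FILE_LINES", d.max_file_lines)),
    )
```

Passing the environment in as a mapping keeps the function easy to test: `load_settings({"FASTCONTEXT_MAX_TURNS": "12"})` overrides one value and leaves the rest at their defaults.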

No GPU? Remote inference

If you don't have a local GPU, you can serve FastContext on a remote machine and point FASTCONTEXT_BASE_URL at it. The MCP server itself is CPU-only: it forwards model requests to that URL and runs the resulting READ/GLOB/GREP calls against the repo on your own machine.

Why not just use Claude Code directly?

You can. But FastContext is trained specifically for the locate-relevant-code task, and at 4B parameters it is faster and cheaper per exploration call than routing everything through Sonnet. On large codebases the token savings are significant (the paper reports up to a 60% reduction in main-agent tokens).

License

MIT. FastContext model weights are also MIT licensed by Microsoft.
