ppb-mcp

by paulplee

An MCP server that exposes Poor Paul's Benchmark GPU inference data — quantization × throughput × VRAM × concurrent users — as queryable tools to any LLM client.

PyPI License: MIT

Hosted instance: https://mcp.poorpaul.dev/ (streamable-http transport, no auth)

What it does

Connect any MCP-aware client (Claude Desktop, Cline, Continue, etc.) to ask questions like:

"What's the best quantization for a 32 GB GPU running Qwen3.5-9B with 8 concurrent users?"
"Show me every model tested at Q4_K_M on the RTX 5090."
"What GPU should I buy for running 27B models at 4 concurrent users on a $800 budget?"
"Why is my RTX 5090 result at Q4_K_M slower than I expected?"

It exposes fourteen tools backed by 39,000+ real benchmark rows:

Quantitative tools

Tool	What it does
`list_tested_configs`	Lists every tested GPU, model, and quantization (call this first)
`query_ppb_results`	Filters raw benchmark rows by GPU / VRAM / model / quant / users / backend / date
`recommend_quantization`	Three-tier empirical-first recommendation engine (high / medium / low confidence)
`recommend_hardware`	Budget-aware GPU recommendation ranked by speed, efficiency, or value-for-money
`explain_result`	Contextual explanation of a result: VRAM pressure, PCIe context, percentile rank
`get_gpu_headroom`	Sanity-checks a (gpu, model, quant, users) configuration for VRAM headroom
`compare_quants_quantitative`	Side-by-side throughput comparison across quantizations for a model + GPU
`get_combined_scores`	Quantitative + qualitative metrics in one call for a (gpu, model, quant) config
`rank_by_priority`	Rank quantizations by speed, efficiency (tok/W), or a balanced composite score

Qualitative tools

Tool	What it does
`get_qualitative_summary`	All available qualitative scores (context-rot, tool accuracy, quality, MT-Bench)
`query_qualitative_results`	Filter qualitative rows by phase, model, quant, GPU, or minimum score thresholds
`get_context_rot_breakdown`	Long-context recall scores by length, depth, and needle type
`get_tool_accuracy_breakdown`	Tool-call accuracy: selection, parameters, hallucination rate, parse success
`compare_quants_qualitative`	Side-by-side qualitative comparison across quantizations with deterministic insight

Data & caching

Benchmark rows are mirrored into a local SQLite cache (./ppb_cache.db by default; override with PPB_DB_PATH). On startup the server loads from SQLite and only contacts HuggingFace when the dataset's git commit SHA has changed — making subsequent restarts fast and offline-friendly.
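
The refresh-on-change behaviour described above can be sketched with the standard library alone. This is a minimal illustration, not the server's actual code: the table names (`meta`, `rows`), the three-column row shape, and the two injected callables (`fetch_remote_sha`, `fetch_rows`) are all hypothetical. In a real implementation the SHA would come from the HuggingFace Hub, for example via `huggingface_hub`.

```python
import sqlite3
from typing import Callable, Iterable, Tuple

Row = Tuple[str, str, float]  # hypothetical row shape: (gpu, model, tokens_per_second)


def refresh_if_changed(
    conn: sqlite3.Connection,
    fetch_remote_sha: Callable[[], str],
    fetch_rows: Callable[[], Iterable[Row]],
) -> bool:
    """Reload cached rows only when the remote commit SHA differs.

    Returns True if the cache was refreshed, False if it was already current.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS meta (key TEXT PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS rows (gpu TEXT, model TEXT, tok_s REAL)")

    cached = conn.execute("SELECT value FROM meta WHERE key = 'sha'").fetchone()
    remote_sha = fetch_remote_sha()
    if cached is not None and cached[0] == remote_sha:
        return False  # cache is current; skip the expensive download

    # Replace rows and record the new SHA in a single transaction.
    with conn:
        conn.execute("DELETE FROM rows")
        conn.executemany("INSERT INTO rows VALUES (?, ?, ?)", fetch_rows())
        conn.execute("INSERT OR REPLACE INTO meta VALUES ('sha', ?)", (remote_sha,))
    return True
```

Storing the SHA next to the rows means a restart can decide whether to download with one small lookup instead of re-fetching the dataset.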

Install

1) Use the hosted instance (zero setup)

Add to your MCP client config (Claude Desktop example, ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "ppb": {
      "transport": { "type": "http", "url": "https://mcp.poorpaul.dev/mcp" }
    }
  }
}

2) `pip install` and run locally (stdio)

pip install ppb-mcp
MCP_TRANSPORT=stdio ppb-mcp

Claude Desktop config:

{
  "mcpServers": {
    "ppb": {
      "command": "ppb-mcp",
      "env": { "MCP_TRANSPORT": "stdio" }
    }
  }
}

3) Docker

docker run --rm -p 9933:9933 \
  -e MCP_TRANSPORT=streamable-http \
  -v ppb-hf-cache:/data/huggingface \
  ghcr.io/paulplee/ppb-mcp:latest

4) From source

git clone https://github.com/paulplee/ppb-mcp
cd ppb-mcp
pip install -e ".[dev]"
ppb-mcp           # streamable-http on :9933

Connect Your LLM Client

All clients use the same hosted endpoint: https://mcp.poorpaul.dev/mcp

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ppb": {
      "transport": { "type": "http", "url": "https://mcp.poorpaul.dev/mcp" }
    }
  }
}

Restart Claude Desktop after saving.

Cursor

Edit ~/.cursor/mcp.json (create if it doesn't exist):

{
  "mcpServers": {
    "ppb": {
      "url": "https://mcp.poorpaul.dev/mcp",
      "type": "http"
    }
  }
}

Or via UI: Settings → Tools & Integrations → MCP → Add Server.

Windsurf

Edit ~/.codeium/windsurf/mcp_config.json:

{
  "mcpServers": {
    "ppb": {
      "serverUrl": "https://mcp.poorpaul.dev/mcp",
      "transport": "http"
    }
  }
}

VS Code (GitHub Copilot Agent Mode)

Add to your User settings.json (in a workspace .vscode/mcp.json, put the "servers" object at the top level, without the "mcp" wrapper):

{
  "mcp": {
    "servers": {
      "ppb": {
        "type": "http",
        "url": "https://mcp.poorpaul.dev/mcp"
      }
    }
  }
}

Zed

Add to ~/.config/zed/settings.json under "context_servers":

{
  "context_servers": {
    "ppb": {
      "command": {
        "path": "env",
        "args": ["MCP_TRANSPORT=stdio", "uvx", "ppb-mcp"]
      }
    }
  }
}

Cline (VS Code extension)

Open the Cline panel → MCP Servers tab → Add Server → select SSE/HTTP → paste https://mcp.poorpaul.dev/mcp.

Continue.dev

Add to ~/.continue/config.yaml:

mcpServers:
  - name: ppb
    transport:
      type: http
      url: https://mcp.poorpaul.dev/mcp

OpenCode

Add to ~/.config/opencode/config.json:

{
  "mcp": {
    "ppb": {
      "type": "remote",
      "url": "https://mcp.poorpaul.dev/mcp"
    }
  }
}

Goose (Block)

goose mcp add ppb --transport http --url https://mcp.poorpaul.dev/mcp

Any stdio-compatible client

# Zero-install (requires uv):
env MCP_TRANSPORT=stdio uvx ppb-mcp

# After pip install:
env MCP_TRANSPORT=stdio ppb-mcp

Note on transport key names: MCP clients are not yet fully standardised on JSON key names for the HTTP transport. If your client doesn't connect with "type": "http", try "transport": "http", "type": "sse", or "transport": "streamable-http". The endpoint URL is the same regardless.
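
If you need to try the variants from the note one by one, a short script can print each candidate entry to paste into your client config. This is only a convenience sketch: the `mcpServers` wrapper and the `url` key are assumptions borrowed from the Cursor example above, and some clients use other names (Windsurf, for instance, uses `serverUrl`).

```python
import json
from typing import Dict, List

URL = "https://mcp.poorpaul.dev/mcp"

# The four key/value spellings listed in the note, in the order given there.
KEY_VARIANTS = [
    ("type", "http"),
    ("transport", "http"),
    ("type", "sse"),
    ("transport", "streamable-http"),
]


def candidate_entries(url: str = URL) -> List[Dict[str, str]]:
    """Build one server entry per transport-key variant (hypothetical helper)."""
    return [{"url": url, key: value} for key, value in KEY_VARIANTS]


for entry in candidate_entries():
    print(json.dumps({"mcpServers": {"ppb": entry}}, indent=2))
```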

Example session

> list_tested_configs
{ "gpus": ["Apple M4 Pro", "NVIDIA GB10", "NVIDIA GeForce RTX 5090"],
  "models": ["Qwen3.5-9B", ...], "quantizations": ["Q4_K_M", ...] }

> recommend_quantization(gpu_vram_gb=32, concurrent_users=8, model="Qwen3.5-9B", priority="balance")
{ "recommended_quantization": "Q5_K_M",
  "estimated_vram_usage_gb": 27.8,
  "estimated_tokens_per_second": 142.0,
  "headroom_gb": 4.2,
  "confidence": "high",
  "reasoning": "Q5_K_M is recommended for your NVIDIA GeForce RTX 5090 (32 GB) ...",
  "alternatives": ["Q4_K_M", "Q8_0"] }

> recommend_hardware(target_model="Qwen3.5-27B", target_quantization="Q4_K_M",
                     concurrent_users=4, budget_usd=1200, priority="value")
{ "top_picks": [
    { "gpu": "NVIDIA GeForce RTX 5090", "msrp_usd": 1999, "throughput_tok_s": 94.3,
      "efficiency_tok_per_dollar": 0.047, "rank_reason": "best measured tok/$ in budget" },
    ...
  ],
  "budget_usd": 1200 }

> explain_result(gpu_name="NVIDIA GeForce RTX 5090", model="Qwen3.5-9B",
                 quantization="Q4_K_M", concurrent_users=8, n_ctx=32768)
{ "throughput_tok_s": 142.0,
  "vram_pressure": "medium",
  "pcie_context": "PCIe Gen 5 x16 — full bandwidth, no bottleneck expected",
  "percentile_rank": 0.91,
  "insight": "Top 9% throughput among all Qwen3.5-9B Q4_K_M configurations measured." }
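
The session above uses a shorthand notation. On the wire, an MCP client sends each tool invocation as a JSON-RPC 2.0 `tools/call` request. The sketch below builds that payload for the `recommend_quantization` call shown above. `build_tool_call` is a hypothetical helper, and a real client must complete the MCP `initialize` handshake before sending it.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(payload)


request = build_tool_call(
    1,
    "recommend_quantization",
    {
        "gpu_vram_gb": 32,
        "concurrent_users": 8,
        "model": "Qwen3.5-9B",
        "priority": "balance",
    },
)
print(request)
```

In practice an MCP SDK or the LLM client builds and sends this message for you. The sketch is only meant to show how the tool name and arguments map onto the protocol.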

Configuration

Env var	Default	Notes
`HF_DATASET`	`paulplee/ppb-results`	HuggingFace dataset ID
`REFRESH_INTERVAL_HOURS`	`1`	Background refresh cadence
`MCP_TRANSPORT`	`streamable-http`	`stdio` or `streamable-http`
`HOST`	`0.0.0.0`	HTTP bind host
`PORT`	`9933`	HTTP bind port
`LOG_LEVEL`	`INFO`	Python `logging` level
`PPB_DB_PATH`	`./ppb_cache.db`	Path of the local SQLite cache (see Data & caching)
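
As an illustration of how these variables and their defaults fit together, here is a minimal loader sketch. It is not the server's actual code: the `Settings` class and `load_settings` function are hypothetical, and the validation of `MCP_TRANSPORT` is an assumption based on the two values listed in the table. `PPB_DB_PATH` and its default come from the Data & caching section.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    hf_dataset: str
    refresh_interval_hours: float
    mcp_transport: str
    host: str
    port: int
    log_level: str
    db_path: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented environment variables, falling back to the documented defaults."""
    transport = env.get("MCP_TRANSPORT", "streamable-http")
    if transport not in ("stdio", "streamable-http"):
        raise ValueError(f"unsupported MCP_TRANSPORT: {transport!r}")
    return Settings(
        hf_dataset=env.get("HF_DATASET", "paulplee/ppb-results"),
        refresh_interval_hours=float(env.get("REFRESH_INTERVAL_HOURS", "1")),
        mcp_transport=transport,
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "9933")),
        log_level=env.get("LOG_LEVEL", "INFO"),
        db_path=env.get("PPB_DB_PATH", "./ppb_cache.db"),
    )
```

Passing a plain dict instead of `os.environ` makes the loader easy to exercise in tests, for example `load_settings({"MCP_TRANSPORT": "stdio"})`.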

Self-hosting (Lightsail / any Ubuntu VPS)

git clone https://github.com/paulplee/ppb-mcp /tmp/ppb-mcp
cd /tmp/ppb-mcp
DOMAIN=mcp.example.com EMAIL=you@example.com ./deploy/deploy.sh

This installs Docker, builds the image, registers a systemd unit, configures nginx, and runs certbot.

Updating a self-hosted instance

Deployed via `deploy.sh` (systemd manages the container)

cd /opt/ppb-mcp
sudo git pull --ff-only
sudo systemctl restart ppb-mcp

The systemd unit runs docker compose pull before starting, so the new image is fetched automatically. Check that it came up cleanly with:

sudo systemctl status ppb-mcp
# or follow live logs:
journalctl -u ppb-mcp -f

Running docker compose directly (no systemd unit)

cd ~/ppb-mcp          # or wherever you cloned the repo
git pull --ff-only
docker compose build
docker compose up -d
# verify:
docker compose logs -f ppb-mcp

Development

pip install -e ".[dev]"
ruff check src tests
pytest -v

Integration tests against the live HuggingFace dataset are gated behind PPB_RUN_INTEGRATION=1 to keep CI offline-clean.

How recommendations work

Tier 1 — empirical exact match (high confidence). ≥3 measured runs on a GPU at-or-below your VRAM budget at the requested concurrency.
Tier 2 — empirical-near (medium). Same (model, quant) benchmarked on a different GPU at the same concurrency; throughput borrowed, VRAM scaled to your card.
Tier 3 — formula extrapolation (low). vram_per_user ≈ (params_B × bits_per_weight / 8) × 1.15; viable iff total ≤ 90 % of your VRAM.
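
The Tier 3 formula can be worked through in a few lines. This is a sketch under stated assumptions rather than the server's implementation: `tier3_estimate` is a hypothetical name, and "total" is assumed here to mean `vram_per_user × concurrent_users`, which the text above does not spell out. The inputs in the usage lines (9 B parameters, 4.0 bits per weight) are illustrative values, not measured figures for any particular quantization.

```python
from typing import Tuple


def tier3_estimate(
    params_b: float,
    bits_per_weight: float,
    concurrent_users: int,
    gpu_vram_gb: float,
) -> Tuple[float, float, bool]:
    """Return (vram_per_user_gb, total_gb, viable) using the Tier 3 formula."""
    # vram_per_user ≈ (params_B × bits_per_weight / 8) × 1.15
    vram_per_user = (params_b * bits_per_weight / 8) * 1.15
    # Assumption: total VRAM scales linearly with the number of concurrent users.
    total = vram_per_user * concurrent_users
    # Viable iff the total fits within 90 % of the card's VRAM.
    viable = total <= 0.90 * gpu_vram_gb
    return vram_per_user, total, viable


per_user, total, viable = tier3_estimate(9, 4.0, 4, 32)
print(f"per user: {per_user:.3f} GB, total: {total:.1f} GB, viable: {viable}")
```

With these illustrative inputs the per-user estimate is about 5.2 GB. Four users fit within the 28.8 GB limit of a 32 GB card, while eight users would not.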

License

MIT — see LICENSE.
