sampling-mcp
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@sampling-mcpsummarize the article on quantum computing"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Sampling in MCP — Demo
A minimal FastMCP example that demonstrates sampling: the mechanism by which an MCP server asks the client to run an LLM completion on its behalf, instead of calling an LLM itself.
The server exposes a summarize_document tool. When called, the tool doesn't talk to any LLM
directly — it requests a completion from the client, which runs the model (GPT-4o via LiteLLM) and
returns the text.
Sampling
The request direction is inverted from a normal tool call:
The server holds no API keys and no model SDK. It just declares what it wants generated.
The client owns the credentials, the model choice, and the LLM SDK. It decides how the generation actually happens (and can apply its own policy, fallbacks, cost controls, etc.).
This keeps secrets on the client side and lets a single server work with whatever model the client is willing to provide.
Related MCP server: Basic MCP Server
Flow
sequenceDiagram
autonumber
participant Main as client.py (main)
participant Client as FastMCP Client
participant Handler as sampling_handler
participant LLM as GPT-4o (LiteLLM → OpenAI)
participant Server as server.py (subprocess)
Note over Main,Server: stdio transport — Client spawns server.py as a child process
Main->>Client: async with client (start + handshake)
Client->>Server: launch server.py, open stdio pipes
Main->>Client: call_tool("summarize_document", {document_text})
Client->>Server: tools/call request
Server->>Server: summarize_document() runs
Server-->>Client: ctx.sample(messages, system_prompt,<br/>temperature, max_tokens, model_preferences)
Note right of Server: Server requests generation —<br/>it does NOT call the LLM itself
Client->>Handler: invoke sampling_handler(messages, params, ctx)
Handler->>Handler: build chat messages,<br/>read OPENAI_API_KEY from .env
Handler->>LLM: acompletion(model, messages, temperature, max_tokens)
LLM-->>Handler: generated summary text
Handler-->>Server: return text (sampling result)
Server->>Server: format "Summary:\n..."
Server-->>Client: tool result
Client-->>Main: CallToolResult
Main->>Client: exit async with → connection closedComponents
flowchart LR
subgraph ClientProc["Client process (holds the secrets)"]
Main["main()<br/>reads sample.txt,<br/>calls the tool"]
Client["FastMCP Client<br/>stdio transport"]
Handler["sampling_handler<br/>OPENAI_API_KEY + LiteLLM"]
end
subgraph ServerProc["Server subprocess (no keys, no LLM SDK)"]
Tool["summarize_document tool<br/>ctx.sample(...)"]
end
LLM[("OpenAI GPT-4o")]
Main --> Client
Client -- "tools/call (stdio)" --> Tool
Tool -- "ctx.sample request (stdio)" --> Handler
Handler -- "HTTPS" --> LLM
LLM -- "completion" --> Handler
Handler -- "result" --> ToolImportant parts of the code
Where | What to notice |
| |
| |
| |
| |
|
Setup
This project uses uv.
# 1. Install dependencies (creates .venv)
uv sync
# 2. Add your key — copy the template and fill it in
cp .env.example .env
# then set OPENAI_API_KEY="sk-..." in .env
# 3. Run — the client launches the server automatically
uv run client.pyopen-me.ipynb contains a guided, step-by-step walkthrough of the same setup.
Expected output
[..] INFO Starting MCP server 'Document Assistant' with transport 'stdio'
gpt-4o # \
0.7 # } printed by sampling_handler (model / temperature / max_tokens)
300 # /
CallToolResult(content=[TextContent(... text="Summary:\n...")], is_error=False)
Connected?: False # connection closed when the `async with` block exits — expectedProject layout
server.py # FastMCP server — exposes summarize_document, uses ctx.sample()
client.py # FastMCP client — runs the sampling_handler (the actual LLM call)
sample.txt # input document fed to the tool
open-me.ipynb # guided walkthrough notebook
.env.example # template for OPENAI_API_KEY (copy to .env)
pyproject.toml # uv project + dependencies (fastmcp, litellm, python-dotenv)This server cannot be installed
Maintenance
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/srod0010/sampling-mcp'
If you have feedback or need assistance with the MCP directory API, please join our Discord server