
MCP 101

Calling a tool

  1. Make sure that nothing is listening on ports 8000 and 8080. Open 3 generously sized terminals on your screen.

  2. Download a sensible model. Qwen 3.5 4B is sensible.

  3. Compile fresh llama.cpp:

    git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
    cmake -B build && cmake --build build --config Release -j 6
  4. Launch the llama in terminal #1:

    ./llama-server -m ~/Downloads/Qwen3.5-4B-Q8_0.gguf --ctx-size 4096 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00 --verbose --webui-mcp-proxy
  5. Clone this repository:

    git clone https://github.com/behavioral-ds/mcp-example && cd mcp-example
  6. Install deps: poetry install && poetry shell

  7. Launch MCP in terminal #2: python mcp_serve.py

  8. Execute the Agentic Call™ in terminal #3: python call.py

  9. Observe the dance between LLM <-> Inference engine <-> MCP <-> Client.
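Under the hood, that dance is JSON-RPC 2.0 messages flowing over the MCP transport. A minimal sketch of what a single tool call looks like on the wire — the tool name `get_weather` and its arguments are invented for illustration (check `mcp_serve.py` for the tools this repo actually registers):

```python
import json

# Hypothetical MCP tool-call exchange, following the JSON-RPC 2.0 shape
# the MCP spec uses. The tool name and arguments are made up.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",
        "arguments": {"city": "Sydney"},
    },
}

# A typical server reply: a list of content blocks plus an error flag.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "18°C, partly cloudy"}],
        "isError": False,
    },
}

wire = json.dumps(request)   # what the client sends to the MCP server
parsed = json.loads(wire)    # what the server decodes on the other side
print(parsed["method"])      # → tools/call
```

The inference engine never sees this layer directly: the client turns the model's tool-call token output into a `tools/call` request, and feeds the `result.content` text back into the context.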

Using MCP prompts

  1. Open the llama.cpp web UI at http://localhost:8080/, go to Settings, and add a new MCP server.

  2. Select "MCP prompt" when drafting a new message.

  3. That's your @mcp.prompt() parsed into a UI element; click it.

  4. ...and supply some meaningful content.

  5. Then click "Use prompt" and rejoice.
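The form the UI renders comes from the server's prompt metadata; clicking "Use prompt" triggers a `prompts/get` request and pastes the returned messages into your draft. A sketch of that exchange — the prompt name `summarize` and its `topic` argument are hypothetical, not taken from this repo:

```python
import json

# Hypothetical prompts/get exchange (JSON-RPC 2.0). The prompt name
# "summarize" and the "topic" argument are invented for illustration;
# see mcp_serve.py for the prompt this repo actually registers.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "prompts/get",
    "params": {"name": "summarize", "arguments": {"topic": "MCP 101"}},
}

# The server renders its @mcp.prompt() function into chat messages:
response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "messages": [
            {
                "role": "user",
                "content": {"type": "text", "text": "Summarize: MCP 101"},
            }
        ]
    },
}

first = response["result"]["messages"][0]
print(first["role"], "-", first["content"]["text"])
```

In other words, prompts are just server-side templates: the UI collects the arguments, the server fills them in, and the expanded messages land in your chat box.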
