mcp-llm-behave
MCP server exposing llm-behave behavioral regression testing as callable tools inside Claude Desktop, Claude Code, and any MCP-compatible client.
Runs offline once the embedding model has been downloaded on first run: no API calls, no external services. Uses sentence-transformers for embedding-based similarity.
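To make "embedding-based similarity" concrete, here is a minimal sketch of scoring two texts by comparing their embedding vectors. It assumes cosine similarity as the metric and a `score >= threshold` pass rule; neither is stated in this README. The vectors are hand-written toy values standing in for real sentence-transformers embeddings, and the 0.45 threshold is borrowed from the `run_behavior_test` Returns example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vectors match nothing
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; a real model produces hundreds of dimensions.
expected_vec = [1.0, 0.0, 1.0]
output_vec = [1.0, 0.0, 0.0]

THRESHOLD = 0.45  # value taken from the README's example result; the pass rule is assumed

score = cosine_similarity(expected_vec, output_vec)
passed = score >= THRESHOLD
print(round(score, 3), passed)  # prints: 0.707 True
```

Cosine similarity ignores vector length and compares direction only, which is why it is a common choice for sentence embeddings.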
Tools
Tool | What it does |
run_behavior_test | Assert that a model output matches an expected behavior description |
compare_outputs | Detect semantic drift between a baseline and a new LLM output |
list_builtin_behaviors | Browse the built-in behavioral checks shipped with llm-behave |
Quickstart — Claude Desktop
Add to your claude_desktop_config.json (no install needed, uvx handles it):
{
"mcpServers": {
"mcp-llm-behave": {
"command": "uvx",
"args": ["mcp-llm-behave"]
}
}
}
Config file location:
macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json
Restart Claude Desktop after editing. The first run downloads the sentence-transformers model (~80 MB) once and caches it.
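If the config file already lists other servers, the entry has to be merged in rather than pasted over the whole file. The following is a rough sketch of that merge, not part of this project: `add_mcp_server` is a hypothetical helper name, and the demo writes to a temporary file instead of the real config path.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, command: str, args: list[str]) -> dict:
    """Add or overwrite one entry under "mcpServers", keeping all other keys."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)
    servers = config.setdefault("mcpServers", {})
    servers[name] = {"command": command, "args": args}
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Demo on a throwaway file that already contains an unrelated (made-up) server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(
        '{"mcpServers": {"other": {"command": "x", "args": []}}}', encoding="utf-8"
    )
    merged = add_mcp_server(demo_path, "mcp-llm-behave", "uvx", ["mcp-llm-behave"])
    print(sorted(merged["mcpServers"]))  # prints: ['mcp-llm-behave', 'other']
```

To use it for real, pass the macOS or Windows path listed above, and back up the file first.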
Quickstart — Claude Code (CLI)
claude mcp add mcp-llm-behave uvx mcp-llm-behave
Install via pip / uv
pip install mcp-llm-behave
# or
uv add mcp-llm-behave
Run the server directly:
mcp-llm-behave
Tool reference
run_behavior_test
Check whether a model output semantically satisfies an expected behavior.
Arguments
Name | Type | Description |
| str | The original prompt sent to the LLM (used for context/logging) |
| str | Plain-language description of what the output should do |
| str | The actual text returned by the LLM |
Returns
{
"score": 0.82,
"passed": true,
"threshold": 0.45
}
compare_outputs
Detect semantic drift between a known-good baseline and a new output. Useful in CI after prompt or model changes.
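One way the CI use could look: a small gate that turns a `compare_outputs` result into a process exit code. This is only a sketch. It assumes the result arrives as a JSON string with the `similarity_score` and `drift_detected` fields from this tool's Returns example; `check_drift` and its `min_similarity` parameter are hypothetical and not part of the server.

```python
import json


def check_drift(result_json: str, min_similarity: float = 0.0) -> int:
    """Return 0 when the result shows no drift, 1 otherwise (usable as an exit code)."""
    result = json.loads(result_json)
    drifted = bool(result["drift_detected"])
    score = float(result["similarity_score"])
    # Optional stricter floor on top of the tool's own drift verdict.
    too_low = score < min_similarity
    if drifted or too_low:
        print(f"FAIL: similarity {score:.2f}, drift_detected={drifted}")
        return 1
    print(f"OK: similarity {score:.2f}")
    return 0


sample = '{"similarity_score": 0.91, "drift_detected": false}'
exit_code = check_drift(sample)  # prints: OK: similarity 0.91
```

In a pipeline, the script would end with `sys.exit(exit_code)` so that a drifted output fails the job.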
Arguments
Name | Type | Description |
| str | The reference / previous LLM output |
| str | The new LLM output to compare |
Returns
{
"similarity_score": 0.91,
"drift_detected": false,
"interpretation": "Outputs are nearly identical — no drift."
}
list_builtin_behaviors
Returns the catalog of pre-defined behavioral checks available in llm-behave, with method signatures and descriptions.
Returns — list of objects with name, method, and description keys.
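As an illustration of working with that return shape, the sketch below formats such a list for display. The two catalog entries are invented placeholders, not real llm-behave behaviors; only the `name`, `method`, and `description` keys come from the description above.

```python
def format_catalog(behaviors: list[dict]) -> list[str]:
    """Render each catalog entry as one line, sorted by name."""
    lines = []
    for entry in sorted(behaviors, key=lambda e: e["name"]):
        lines.append(f'{entry["name"]}: {entry["method"]} - {entry["description"]}')
    return lines


# Hypothetical entries, for illustration only.
catalog = [
    {"name": "polite", "method": "is_polite()", "description": "Output uses a courteous tone"},
    {"name": "concise", "method": "is_concise()", "description": "Output stays brief"},
]

for line in format_catalog(catalog):
    print(line)
```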
Requirements
Python 3.10+
No API keys needed
~80 MB disk for the sentence-transformers model (downloaded once on first run)
Development
git clone https://github.com/Swanand33/mcp_llm_behave
cd mcp_llm_behave
uv sync
uv run pytest
License
MIT — see LICENSE.