Run a behavioral litmus on an MCP server
run_litmus
Grade any MCP server A–F by running four security checks: tool-output injection, permission overreach, sensitive-data handling, and adversarial-input resilience. Uses Docker sandboxing when available.
Instructions
Grade an MCP server A–F against the open behavioral litmus (litmus-v5). The harness connects the way an agent would, fingerprints the tool surface, and runs four checks: C-01 tool-output injection; C-02 permission/egress overreach (egress is tested in a hardened, default-deny Docker sandbox, alongside a declared-permission honesty check); C-03 sensitive-data handling (using planted canaries); and C-04 adversarial-input handling (malformed/oversized inputs and jailbreak inputs).
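The page does not say how the harness fingerprints the tool surface. One plausible approach, shown here purely as an assumption, is to hash a canonical form of the `tools/list` result so that a server whose tools, descriptions, or input schemas change gets a different fingerprint. The function name `fingerprint_tools` and the choice of fields are hypothetical, not litmus-v5's actual method.

```python
from __future__ import annotations

import hashlib
import json


def fingerprint_tools(tools: list[dict]) -> str:
    """Hypothetical tool-surface fingerprint: SHA-256 over canonical JSON.

    Each tool from a tools/list result is reduced to name, description and
    inputSchema, sorted by name, then serialized with sorted keys so that
    neither tool order nor key order affects the digest.
    """
    canonical = sorted(
        (
            {
                "name": tool["name"],
                "description": tool.get("description", ""),
                "inputSchema": tool.get("inputSchema", {}),
            }
            for tool in tools
        ),
        key=lambda tool: tool["name"],
    )
    blob = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    tools = [
        {"name": "write_file", "description": "Write a file", "inputSchema": {"type": "object"}},
        {"name": "read_file", "description": "Read a file", "inputSchema": {"type": "object"}},
    ]
    print(fingerprint_tools(tools))
```

Canonicalizing before hashing means a server cannot change its fingerprint just by reordering tools, while a quietly edited description (a common carrier for tool-output or prompt injection) does change it.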
This is ACTIVE: it launches the target server's code to exercise it (egress-sandboxed when Docker is available), and a run takes roughly 20–60 s. It is not a lookup; for a server's already-published grade, use verify_attestation. No wallet or RPC is needed.
server_ref examples: npm/@modelcontextprotocol/server-filesystem · https://example.com/mcp · ./build/index.js. For a token-gated https:// target, pass bearer. If Docker is unavailable, C-02 is skipped and the grade is capped at B for that run.
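A minimal sketch of two rules from the paragraph above: telling the three `server_ref` forms apart, and applying the B cap when Docker is unavailable. The prefix heuristic in `classify_server_ref` is an assumption (the page only gives examples), and both function names are hypothetical. The cap assumes standard letter grading where A is the only grade above B.

```python
from urllib.parse import urlparse


def classify_server_ref(server_ref: str) -> str:
    """Hypothetical heuristic: return "https", "local" or "registry"."""
    if urlparse(server_ref).scheme == "https":
        return "https"  # remote MCP endpoint, e.g. https://example.com/mcp
    if server_ref.startswith(("./", "../", "/")):
        return "local"  # path to an MCP entry file, e.g. ./build/index.js
    return "registry"  # anything else, e.g. npm/@scope/server


def effective_grade(raw_grade: str, docker_available: bool) -> str:
    """Without Docker, C-02 is skipped and the grade cannot exceed B."""
    if not docker_available and raw_grade == "A":
        return "B"
    return raw_grade


if __name__ == "__main__":
    for ref in ("npm/@modelcontextprotocol/server-filesystem",
                "https://example.com/mcp", "./build/index.js"):
        print(ref, "->", classify_server_ref(ref))
```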
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| server_ref | Yes | What to grade: a registry ref (npm/@scope/server), an https:// MCP URL, or a local path to an MCP entry file. | |
| bearer | No | Bearer token for a token-gated https:// MCP server. Sent as `Authorization: Bearer <token>` to the target origin only. Ignored for stdio/local targets. | |
| header | No | Extra HTTP headers for a gated https:// target, each "Key: Value" (e.g. "X-Api-Key: …"). Overrides the bearer-derived Authorization for the same key. Ignored for stdio/local targets. | |
| unsafe_host_exec | No | Must be set to true to grade a registry ref or local path: grading launches the target's own code, and without Docker isolation that code runs on THIS host. Setting it to true accepts host execution. Ignored for https:// targets or when LITMUS_STDIO_ISOLATION=docker. | |
| timeout_seconds | No | Aggregate wall-clock ceiling for the whole run, in seconds. Caps how long a hostile server can stretch the run across many tools and probes. | 900 |
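The rules in the table can be sketched as client-side helpers: merging bearer and header with the documented precedence, deciding when unsafe_host_exec is needed, and wrapping the arguments in a JSON-RPC 2.0 `tools/call` request. Assumptions: header is taken to be a list of "Key: Value" strings, the `isolation` parameter stands in for the LITMUS_STDIO_ISOLATION setting on the host running the harness, and header names are compared case-insensitively as HTTP requires. All function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def merge_headers(bearer: str | None, header: list[str] | None) -> dict[str, str]:
    """bearer yields an Authorization header; a header entry with the same
    (case-insensitive) name overrides it."""
    merged: dict[str, tuple[str, str]] = {}  # lowercased name -> (name, value)
    if bearer:
        merged["authorization"] = ("Authorization", f"Bearer {bearer}")
    for entry in header or []:
        name, sep, value = entry.partition(":")  # split on the first colon only
        name = name.strip()
        if not sep or not name:
            raise ValueError(f'expected "Key: Value", got {entry!r}')
        merged[name.lower()] = (name, value.strip())
    return dict(merged.values())


def requires_host_exec_consent(server_ref: str, isolation: str | None) -> bool:
    """True when grading would launch the target's code on this host."""
    if server_ref.startswith("https://"):
        return False  # remote target: nothing is launched locally
    return isolation != "docker"  # registry ref or local path, not isolated


def build_tools_call(request_id: int, arguments: dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 tools/call request for run_litmus."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "run_litmus", "arguments": arguments},
    })


if __name__ == "__main__":
    args = {"server_ref": "https://example.com/mcp", "bearer": "<token>", "timeout_seconds": 300}
    print(build_tools_call(1, args))
```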