MCProbe

A stdio MCP server that audits other MCP servers over the live protocol. It connects to any MCP target (stdio or HTTP), lints every tool's schema for agent-usability, then actually calls the tools with deliberately broken inputs to see how the server handles them, and returns a 0–100 conformance score with a per-dimension breakdown rendered as Markdown.

The behavioral pass is the part that matters. Static schema audits tell you that a tool exists and looks reasonable. MCProbe goes further: it calls each tool with missing_required, wrong_type, out_of_enum, and extra_garbage inputs (the same mistakes a language model will make on a bad day) and classifies the response. A server that says "OK" to garbage is graded harshly. A server that crashes the JSON-RPC transport is graded more harshly still. A server that returns a clean isError: true loses no points.

Problem statement

The Model Context Protocol is new. Servers proliferate. Most ship with tools that an agent can call, but few ship with tools that an agent can call correctly: parameters are untyped, descriptions are missing, names are not snake_case, and a quick look at the code reveals that the handler is doing Number(x) / Number(y) with no guard at all.
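To make the unguarded-handler problem concrete, here is a minimal sketch of two hypothetical divide handlers. The function names and the argument shape are illustrative assumptions, not code from any real server; only the content / isError result shape follows the MCP tool-result convention.

```typescript
// Hypothetical tool handlers; names and argument shape are illustrative only.
type ToolResult = { content: { type: "text"; text: string }[]; isError?: boolean };

// Unguarded: coerces whatever arrives and reports success even for NaN.
function divideUnguarded(args: { a: unknown; b: unknown }): ToolResult {
  const quotient = Number(args.a) / Number(args.b);
  return { content: [{ type: "text", text: String(quotient) }] };
}

// Guarded: rejects non-numeric input and division by zero with isError: true.
function divideGuarded(args: { a: unknown; b: unknown }): ToolResult {
  const { a, b } = args;
  if (typeof a !== "number" || typeof b !== "number" || !Number.isFinite(a) || !Number.isFinite(b)) {
    return { content: [{ type: "text", text: "a and b must be finite numbers" }], isError: true };
  }
  if (b === 0) {
    return { content: [{ type: "text", text: "b must not be zero" }], isError: true };
  }
  return { content: [{ type: "text", text: String(a / b) }] };
}

const silent = divideUnguarded({ a: "x", b: "y" }); // text is "NaN", isError is absent
const rejected = divideGuarded({ a: "x", b: "y" }); // isError is true
```

The first result looks like a success to a schema-only audit; only a live call with bad input shows the difference between the two handlers.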

The convention in the wider ecosystem is to ship a static schema audit that flags the obvious smells and then declare the server ready. The smells are real, but a static audit cannot tell you whether the server behaves: it cannot tell you that divide("x", "y") silently returns NaN, or that an extra unknown key is just stripped and ignored.

MCProbe does both, on a single connection:

  1. Static lint. Eleven rules over every tool's schema: missing or thin descriptions, duplicate or unusual names, an empty or non-object schema, untyped or undocumented parameters, and a server-wide rule for "I said I had tools but I have none."

  2. Behavioral fuzz. For each tool, the generator produces one valid case and at least three malformed variants, calls the target over the live JSON-RPC transport, and classifies the outcome as ok (the tool shrugged), toolError (graceful rejection), or protocolCrash (worst case). A malformed case that comes back without isError: true is flagged as silentlyAccepted — exactly the failure mode the linter cannot see.

  3. Scoring. The findings and the fuzz results are combined into a 0–100 score on four dimensions, mapped to an A–F grade, and rendered as a Markdown report the host (or a human) can read.

Install

npm install
npm run build     # tsc -p tsconfig.json && tsc -p examples/demo-target/tsconfig.json

The build emits:

  • dist/index.js — the probe (run this as a stdio MCP server).

  • examples/demo-target/dist/index.js — a deliberately flawed MCP server used by the tests and the demo.

To launch the probe as a stdio MCP server so any host can talk to it:

npm start

No port, no daemon, no config file. The probe speaks JSON-RPC on stdin/stdout and writes operator logs to stderr.

The six probe_* tools

MCProbe registers four core tools and two optional helpers. The core four cover the full lint → fuzz → score pipeline; the two helpers cover the everyday ergonomics of managing connections.

| Tool | Purpose | Returns |
| --- | --- | --- |
| probe_connect | Open a connection to a target. | { connectionId, name, version, capabilities, counts, defaultConnectionId } |
| probe_lint | Run the 11 lint rules over the target's cached tool summaries. | { connectionId, server, findings, summary } |
| probe_fuzz | Generate valid + malformed inputs per tool, call each, classify the outcome. | { connectionId, server, results, summary } |
| probe_report | Run lint (and fuzz when requested), score, render Markdown. | { connectionId, server, overall, grade, dimensions, findings, fuzz, markdown } |
| probe_list | (optional) Enumerate the target's tools. | { connectionId, server, tools } |
| probe_disconnect | (optional) Close one connection (by id) or every connection. | { removed, remaining, defaultConnectionId } |

All tools default to the most recently opened connection when connectionId is omitted, so a single-target audit is a three-call sequence: probe_connect → probe_report → probe_disconnect.
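The three-call sequence can be sketched as follows. CallTool is a stand-in for whatever your MCP host or client library exposes for invoking a tool; it is an assumption of this sketch and not part of MCProbe. The tool names and the fuzz flag come from the table above.

```typescript
// CallTool is a hypothetical abstraction over your MCP client's tool-call method.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

async function auditOnce(callTool: CallTool, connectArgs: Record<string, unknown>): Promise<unknown> {
  await callTool("probe_connect", connectArgs);
  try {
    // connectionId is omitted, so the most recently opened connection is used.
    return await callTool("probe_report", { fuzz: true });
  } finally {
    // Always close the connection, even if the report call throws.
    // With a single target open, closing "the default" or "every" connection is equivalent.
    await callTool("probe_disconnect", {});
  }
}

// Usage with a recording stub in place of a real client:
const calls: string[] = [];
const stub: CallTool = async (name) => {
  calls.push(name);
  return { tool: name };
};
const pending = auditOnce(stub, {
  transport: "stdio",
  command: "node",
  args: ["examples/demo-target/dist/index.js"],
});
```

Putting probe_disconnect in a finally block keeps a failed report from leaving a spawned child process behind.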

probe_connect

Two transports: stdio (spawns a child process) and http (speaks the streamable HTTP transport, with SSE fallback). For stdio, command is required; for http, url is required. The target's initialize handshake is run synchronously, the server's identity and capabilities are cached, and a stable connectionId is returned.
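The per-transport requirement (command for stdio, url for http) maps naturally onto a discriminated union. The sketch below is one way a caller might validate arguments before sending them; the type and function names are hypothetical, and the probe's own validation may differ.

```typescript
// Hypothetical argument shape inferred from the description above.
type ConnectArgs =
  | { transport: "stdio"; command: string; args?: string[] }
  | { transport: "http"; url: string };

function parseConnectArgs(input: Record<string, unknown>): ConnectArgs {
  if (input.transport === "stdio") {
    if (typeof input.command !== "string" || input.command === "") {
      throw new Error("stdio transport requires a command");
    }
    // args is optional; coerce each entry to a string when present.
    return Array.isArray(input.args)
      ? { transport: "stdio", command: input.command, args: input.args.map(String) }
      : { transport: "stdio", command: input.command };
  }
  if (input.transport === "http") {
    if (typeof input.url !== "string" || input.url === "") {
      throw new Error("http transport requires a url");
    }
    return { transport: "http", url: input.url };
  }
  throw new Error('transport must be "stdio" or "http"');
}

const stdioTarget = parseConnectArgs({ transport: "stdio", command: "npx", args: ["-y", "some-server"] });
const httpTarget = parseConnectArgs({ transport: "http", url: "http://localhost:3000/mcp" });
```

The package name and URL in the usage lines are placeholders.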

probe_lint

A pure pass over the connection's cached tool summaries — no extra round-trip. Each finding carries a stable code, a severity (error, warning, info), a human-readable message, a location ({ tool, param? }), and a hint with a concrete fix.

The eleven rules are:

| Code | Severity | What it catches |
| --- | --- | --- |
| tool.missing_description | error | A tool with no description at all. |
| tool.thin_description | warning | A description under 12 characters. |
| tool.duplicate_name | error | Two tools registered with the same name. |
| tool.unusual_name | warning | A name that is not snake_case or kebab-case. |
| tool.no_input_schema | warning | An empty or missing inputSchema. |
| schema.invalid | error | A schema that fails to compile (Ajv). |
| schema.root_not_object | warning | A root type that is not object. |
| schema.no_required | info | Properties declared but no required array. |
| param.untyped | warning | A property with no type/enum/const/oneOf. |
| param.missing_description | warning | A property with no description. |
| server.no_tools | warning | The server claims tools but registers none. |
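As an illustration of how such rules can stay pure, the sketch below implements four of the eleven codes over a single tool summary. The Finding fields follow the description above; the ToolSummary type, the name regex, the messages, and the hints are assumptions made for this example and are not taken from src/schema-lint.ts.

```typescript
type Severity = "error" | "warning" | "info";
interface Finding {
  code: string;
  severity: Severity;
  message: string;
  location: { tool: string; param?: string };
  hint: string;
}
// Hypothetical summary shape; the real cached summaries may carry more fields.
interface ToolSummary {
  name: string;
  description?: string;
  inputSchema?: { properties?: Record<string, Record<string, unknown>> };
}

// Approximation of "snake_case or kebab-case": lowercase words joined by _ or -.
const SNAKE_OR_KEBAB = /^[a-z][a-z0-9]*(?:[_-][a-z0-9]+)*$/;

function lintTool(tool: ToolSummary): Finding[] {
  const findings: Finding[] = [];
  const description = (tool.description ?? "").trim();
  if (description === "") {
    findings.push({ code: "tool.missing_description", severity: "error",
      message: `Tool "${tool.name}" has no description.`,
      location: { tool: tool.name }, hint: "Describe what the tool does and when to use it." });
  } else if (description.length < 12) {
    findings.push({ code: "tool.thin_description", severity: "warning",
      message: `Tool "${tool.name}" has a description under 12 characters.`,
      location: { tool: tool.name }, hint: "Expand the description to a full sentence." });
  }
  if (!SNAKE_OR_KEBAB.test(tool.name)) {
    findings.push({ code: "tool.unusual_name", severity: "warning",
      message: `Tool name "${tool.name}" is not snake_case or kebab-case.`,
      location: { tool: tool.name }, hint: "Rename using lowercase words joined by _ or -." });
  }
  for (const [param, schema] of Object.entries(tool.inputSchema?.properties ?? {})) {
    const typed = ["type", "enum", "const", "oneOf"].some((key) => key in schema);
    if (!typed) {
      findings.push({ code: "param.untyped", severity: "warning",
        message: `Parameter "${param}" declares no type, enum, const, or oneOf.`,
        location: { tool: tool.name, param }, hint: "Add a type to the property schema." });
    }
    if (typeof schema.description !== "string" || schema.description.trim() === "") {
      findings.push({ code: "param.missing_description", severity: "warning",
        message: `Parameter "${param}" has no description.`,
        location: { tool: tool.name, param }, hint: "Add a description to the property schema." });
    }
  }
  return findings;
}

const greetCodes = lintTool({ name: "greet" }).map((f) => f.code);
const setModeCodes = lintTool({
  name: "setMode",
  description: "Set mode",
  inputSchema: { properties: { mode: {} } },
}).map((f) => f.code);
```

Because the function takes a plain object and returns a plain array, it can be unit-tested without a live connection.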

probe_fuzz

For every tool (capped at maxTools, default 10), the generator emits one valid case and at least three malformed variants:

  • missing_required:<field> — drop each required field in turn.

  • wrong_type:<field> — replace each typed field with a value of a different primitive type.

  • out_of_enum:<field> — for enum or const fields, send a value the schema forbids.

  • extra_garbage — append a sentinel key to the valid args.
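The four variants above can be sketched as a pure generator over a tool's inputSchema. This is a simplified illustration: the sample values, the sentinel key name, and the out-of-enum marker are assumptions, and the sketch does not pad up to the three-variant minimum the text describes for tools with very small schemas.

```typescript
interface PropSchema { type?: string; enum?: unknown[]; const?: unknown }
interface InputSchema { properties?: Record<string, PropSchema>; required?: string[] }
interface FuzzCase { label: string; args: Record<string, unknown>; malformed: boolean }

// Pick a plausible valid value for a property.
function sampleFor(p: PropSchema): unknown {
  if (p.const !== undefined) return p.const;
  if (p.enum && p.enum.length > 0) return p.enum[0];
  switch (p.type) {
    case "number":
    case "integer": return 1;
    case "boolean": return true;
    case "array": return [];
    case "object": return {};
    default: return "sample";
  }
}

// Pick a value of a different primitive type than the one declared.
function wrongTypeFor(p: PropSchema): unknown {
  return p.type === "string" ? 12345 : `not-a-${p.type}`;
}

function generateCases(schema: InputSchema): FuzzCase[] {
  const props = schema.properties ?? {};
  const valid: Record<string, unknown> = {};
  for (const [name, p] of Object.entries(props)) valid[name] = sampleFor(p);

  const cases: FuzzCase[] = [{ label: "valid", args: valid, malformed: false }];
  for (const field of schema.required ?? []) {
    const rest = { ...valid };
    delete rest[field]; // drop one required field per case
    cases.push({ label: `missing_required:${field}`, args: rest, malformed: true });
  }
  for (const [name, p] of Object.entries(props)) {
    if (p.type) {
      cases.push({ label: `wrong_type:${name}`, args: { ...valid, [name]: wrongTypeFor(p) }, malformed: true });
    }
    if (p.enum || p.const !== undefined) {
      // Hypothetical marker value assumed not to appear in any enum.
      cases.push({ label: `out_of_enum:${name}`, args: { ...valid, [name]: "__not_in_enum__" }, malformed: true });
    }
  }
  // Hypothetical sentinel key name.
  cases.push({ label: "extra_garbage", args: { ...valid, __mcprobe_garbage__: true }, malformed: true });
  return cases;
}

const demoCases = generateCases({
  properties: { a: { type: "number" }, mode: { type: "string", enum: ["fast", "slow"] } },
  required: ["a"],
});
const demoLabels = demoCases.map((c) => c.label);
```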

Each case is sent to the target over the live JSON-RPC transport. The classifier assigns one of three outcomes:

| Outcome | Meaning |
| --- | --- |
| ok | The target returned a result with isError: false. For a malformed case this is silentlyAccepted: true. |
| toolError | The target returned a result with isError: true (graceful rejection). |
| protocolCrash | The call rejected or the transport closed. |
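The three outcomes reduce to a small pure function. In this sketch the Attempt type is a hypothetical model of one finished call, and an absent isError is treated the same as isError: false, which is an assumption about how the probe reads results.

```typescript
type Outcome = "ok" | "toolError" | "protocolCrash";
interface Classified { outcome: Outcome; silentlyAccepted: boolean }

// One finished call: either the target's result object, or the error raised
// when the call rejected or the transport closed.
type Attempt = { result: { isError?: boolean } } | { error: unknown };

function classify(attempt: Attempt, malformed: boolean): Classified {
  if ("error" in attempt) {
    return { outcome: "protocolCrash", silentlyAccepted: false };
  }
  if (attempt.result.isError === true) {
    return { outcome: "toolError", silentlyAccepted: false };
  }
  // A success response to malformed input is the silent-acceptance case.
  return { outcome: "ok", silentlyAccepted: malformed };
}

const acceptedGarbage = classify({ result: {} }, true);
const gracefulReject = classify({ result: { isError: true } }, true);
const crashed = classify({ error: new Error("transport closed") }, false);
```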

probe_report

The convenience entry point. Calls probe_lint (always) and probe_fuzz (when fuzz: true), scores the result on the four dimensions described below, and returns the structured ConformanceReport and a rendered Markdown string. The Markdown is the canonical payload; downstream tools that need the numbers can pull them out of the structured fields.

Scoring model — four dimensions

The scorecard is subtractive. Every dimension starts at 10/10 and loses points only for concrete, observed problems. The overall 0–100 score is the mean of the measured dimensions, scaled from the 10-point range to 100; dimensions that were not measured (e.g. the two behavioral ones when fuzz: false) are reported as "not measured" and excluded from the average rather than penalized with a fake value. This is what lets a static audit of a clean server still score 100/100.

Letter grades: A ≥ 90, B ≥ 75, C ≥ 60, D ≥ 40, F < 40.
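The rollup and grade mapping can be sketched as below. The dimension keys, the use of null for "not measured", the rounding, and the score returned when nothing is measured are assumptions of this example; the thresholds are the ones listed above.

```typescript
// null marks a dimension that was not measured (for example with fuzz: false).
type DimensionScores = Record<string, number | null>;

function overallScore(dimensions: DimensionScores): number {
  const measured = Object.values(dimensions).filter((s): s is number => s !== null);
  if (measured.length === 0) return 0; // assumption: nothing measured scores 0
  const mean = measured.reduce((sum, s) => sum + s, 0) / measured.length;
  return Math.round(mean * 10); // assumption: 10-point mean scaled to 0-100, rounded
}

function grade(overall: number): "A" | "B" | "C" | "D" | "F" {
  if (overall >= 90) return "A";
  if (overall >= 75) return "B";
  if (overall >= 60) return "C";
  if (overall >= 40) return "D";
  return "F";
}

// Lint-only audit of a clean server: the two behavioral dimensions are skipped.
const lintOnly = overallScore({ metadata: 10, schema: 10, errorHandling: null, liveness: null });
// Full audit with mixed results: (10 + 8 + 4 + 6) / 4 = 7, scaled to 70.
const fullAudit = overallScore({ metadata: 10, schema: 8, errorHandling: 4, liveness: 6 });
```

Here lintOnly is 100 (grade A) and fullAudit is 70 (grade C), which shows why excluding unmeasured dimensions matters: substituting zeros for the two skipped dimensions would have halved the lint-only score.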

| Dimension | Always measured? | What it captures |
| --- | --- | --- |
| Metadata & Documentation | yes | Server identity (name, version), advertised capabilities, presence of instructions (+1 bonus). |
| Schema Quality | yes | Deducted 1 per error, 0.5 per warning, 0.25 per info finding. |
| Error Handling | only with fuzz: true | Deducted 2 per silentlyAccepted malformed case, 4 per protocolCrash, 1 per toolError on a valid case. |
| Liveness & Performance | only with fuzz: true | Deducted 4 per protocolCrash on a valid call, 1 per toolError on a valid call, 0.5 per 100ms over a 200ms p50 target. |

The full deduction list and the top-offender breakdown for each dimension are emitted in the Markdown report so the score is auditable by a human.
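Two of the deductions are simple enough to work through in code: the Schema Quality formula and the latency term of Liveness & Performance. The sketch below assumes scores are clamped to the 0 to 10 range, that p50 is the plain median, and that only whole 100 ms steps over the target count; none of these details are confirmed by the text above.

```typescript
// Assumption: a dimension never drops below 0 or rises above 10.
const clamp = (score: number): number => Math.max(0, Math.min(10, score));

function schemaQuality(counts: { error: number; warning: number; info: number }): number {
  return clamp(10 - counts.error * 1 - counts.warning * 0.5 - counts.info * 0.25);
}

// Median of the observed latencies (p50).
function p50(latenciesMs: number[]): number {
  if (latenciesMs.length === 0) return 0;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function latencyPenalty(latenciesMs: number[], targetMs = 200): number {
  const over = Math.max(0, p50(latenciesMs) - targetMs);
  return Math.floor(over / 100) * 0.5; // assumption: partial 100 ms steps are ignored
}

// 2 errors, 3 warnings, 2 infos: 10 - 2 - 1.5 - 0.5 = 6
const schemaScore = schemaQuality({ error: 2, warning: 3, info: 2 });
// p50 of [150, 450, 500] is 450 ms, 250 ms over target: two whole steps, 1 point
const slowPenalty = latencyPenalty([150, 450, 500]);
```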

30-second demo

The probe ships with a deliberately flawed demo target at examples/demo-target/ and a smoke script that runs the full probe_report pipeline against it. From a clean clone:

npm install
npm run build
node scripts/smoke-report.mjs

The script spawns the probe as a stdio MCP server, opens a connection to the demo target, calls probe_report with fuzz: true, and prints the Markdown report to stdout. The demo target is wired to fail loudly: greet has no description, divide returns NaN on bad input, set_mode has a thin description, and well_behaved is the only clean tool. The report will show a low overall score with concrete findings and a fuzz table that classifies the broken cases.

For an interactive tour, the official MCP inspector works as a host against the built probe:

npx @modelcontextprotocol/inspector node dist/index.js

The inspector UI lists the six probe_* tools; calling them manually is a good way to see the request/response shape.

External server example

The probe is not coupled to the demo target. To audit any other MCP server, swap the command/args in probe_connect:

// tool call: probe_connect
{
  "transport": "stdio",
  "command": "npx",
  "args": ["-y", "@modelcontextprotocol/server-filesystem@latest", "/tmp"]
}

The probe runs the initialize handshake against the spawned process, caches its tools, and is ready for probe_lint / probe_fuzz / probe_report. The same pattern works for HTTP targets: pass transport: "http" and a url instead.

A real transcript of this audit (run against @modelcontextprotocol/server-filesystem@latest and saved to examples/transcripts/external-server.md) is included in the repository. The script that produced it is scripts/external-audit.mjs. A self-audit (a second copy of the probe scoring the first) lives at examples/transcripts/self-audit.md.

Architecture

MCProbe plays two roles at once: it is a stdio MCP server to its host, and an MCP client to whatever it is auditing. The split mirrors the source layout.

+--------------------+        stdio / http         +-------------------+
|       host         |  <----------------------->  |   target MCP      |
| (claude code, etc) |                             |     server        |
+--------------------+                             +-------------------+
            ^                                                ^
            | JSON-RPC on stdin/stdout                       | JSON-RPC
            |                                                | over the
            v                                                v chosen
+--------------------+        spawn / dial         +-------------------+
|     src/index.ts   |  ------------------------>  | src/target-client |
|  (the probe)       |                             | (outbound client) |
+--------------------+                             +-------------------+
            |
            | calls the pure modules
            v
+--------------------+   +---------------+   +-----------------+   +--------------+
|  src/schema-lint   |   |  src/fuzz.ts  |   | src/conformance |   | src/report   |
|  (11 rules)        |   |  (generator)  |   |  (4-dim score)  |   |  (markdown)  |
+--------------------+   +---------------+   +-----------------+   +--------------+

| Module | Role | I/O? |
| --- | --- | --- |
| src/types.ts | Shared Finding, FuzzResult, DimensionScore, ConformanceReport types. | none |
| src/target-client.ts | Outbound MCP client, ConnectionRegistry, callTool wrapper that catches transport errors. | yes (spawns / dials) |
| src/schema-lint.ts | The 11 lint rules. Pure: no I/O, deterministic ordering. | none |
| src/fuzz.ts | Case generator + runner + summarizeFuzz histogram. Generator is pure; runner threads through a caller-supplied call fn so it stays unit-testable. | none on the generator; the runner calls the target |
| src/conformance.ts | Per-dimension scoring + rollup. Pure. | none |
| src/report.ts | Pure Markdown renderer. Same input → same output every run. | none |
| src/index.ts | McpServer, registers the six probe_* tools, routes them to the pure modules. | yes (owns the stdio transport) |

The four pure modules (schema-lint, fuzz generator, conformance, report) are deliberately side-effect-free so the vitest suite can exercise them in milliseconds without spawning a target. The integration test in tests/demo-target.test.ts is the only piece that touches a live process; it is the smallest test that proves the build artifact loads over the real protocol.
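The "caller-supplied call fn" idea from the module table is ordinary dependency injection. The sketch below shows a hypothetical runner that never touches a transport itself, so a test can hand it a fake target; the type names and the fake are assumptions, not the signatures used in src/fuzz.ts.

```typescript
// The runner only knows this function type, not how the call is transported.
type CallFn = (tool: string, args: Record<string, unknown>) => Promise<{ isError?: boolean }>;
interface Case { label: string; args: Record<string, unknown> }
interface RunResult { label: string; outcome: "ok" | "toolError" | "protocolCrash" }

async function runCases(tool: string, cases: Case[], call: CallFn): Promise<RunResult[]> {
  const results: RunResult[] = [];
  for (const c of cases) {
    try {
      const res = await call(tool, c.args);
      results.push({ label: c.label, outcome: res.isError === true ? "toolError" : "ok" });
    } catch {
      // A rejected call stands in for a crashed or closed transport.
      results.push({ label: c.label, outcome: "protocolCrash" });
    }
  }
  return results;
}

// Fake target for tests: rejects non-numeric a, crashes on a "boom" key.
const fakeCall: CallFn = async (_tool, args) => {
  if ("boom" in args) throw new Error("transport closed");
  return { isError: typeof args.a !== "number" };
};

const fakeRun = runCases("divide", [
  { label: "valid", args: { a: 1 } },
  { label: "wrong_type:a", args: { a: "x" } },
  { label: "crash", args: { boom: true } },
], fakeCall);
```

In production the same runner would receive a function that forwards to the live connection, which is why only the integration test needs a real process.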

Limitations

  • The four runtime dependencies are frozen: @modelcontextprotocol/sdk, ajv, ajv-formats, and zod. The probe deliberately does not depend on any CLI framework, HTTP server, or transport library beyond what the SDK already exposes. Adding a runtime dependency is an explicit change to the spec.

  • The probe is a stdio MCP server, full stop. It does not expose an HTTP endpoint. Run it as a subprocess of your host.

  • The fuzzer is shallow, not adversarial. It exercises the surface documented by the tool's inputSchema; it does not attempt to discover server-side bugs that are out of band of the tool contract. The point of MCProbe is conformance, not general-purpose server fuzzing.

  • The scoring is subtractive and dimension-local. A perfect score on one dimension does not rescue a failure on another. The four dimensions are weighted equally when measured.

  • Behavioral scores need a real protocol round-trip. When fuzz: false is passed to probe_report, the Error Handling and Liveness & Performance dimensions are reported as "not measured" and excluded from the rollup. A "lint-only" audit can still score 100/100 on a clean server, but it cannot tell you whether the server would survive a bad input.

  • Tooling is four core tools + two helpers, no more. The spec pins the surface area. Adding a probe_* tool is an explicit change to the spec.

  • The optional helpers are still required at startup. The McpServer is constructed with the tools capability only; it does not advertise resources or prompts. The probe itself is an audit tool, not a content server.
