# ctrltest-mcp - Flight-control regression lab for MCP agents

**TL;DR:** Evaluate PID and bio-inspired controllers against analytic or diffSPH/Foam-Agent data through MCP, logging overshoot, energy, and gust metrics automatically.

It also provides a REST API interface for the control system evaluation service, allowing HTTP-based access to controller benchmarking and scoring functionality, and enables visualization of controller performance metrics and analytics data exported from PID evaluations and comparative controller assessments.
## Table of contents

- [What it provides](#what-it-provides)
- [Quickstart](#quickstart)
- [Run as a service](#run-as-a-service)
- [ToolHive smoke test](#toolhive-smoke-test)
- [Agent playbook](#agent-playbook)
- [Stretch ideas](#stretch-ideas)
- [Accessibility & upkeep](#accessibility--upkeep)
- [Metric schema at a glance](#metric-schema-at-a-glance)
- [Contributing](#contributing)
- [License](#license)
## What it provides
| Scenario | Value |
| --- | --- |
| Analytic PID benchmarking | Run closed-form plant models and produce overshoot/settling/energy metrics without manual scripting. |
| High-fidelity scoring | Ingest logged data from Foam-Agent or diffSPH runs and fuse it into controller evaluations. |
| MCP integration | Expose the scoring API via STDIO/HTTP so ToolHive or other clients can automate gain tuning and generate continuous performance scorecards. |
## Quickstart
Run a PID evaluation:
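The exact invocation depends on the transport you use (see the service and python-sdk sections below). As a rough, self-contained illustration of what the analytic path computes, here is a minimal sketch; the plant model, function name, and gains are assumptions for illustration, not the shipped API:

```python
# Minimal sketch of an analytic PID evaluation (hypothetical helper, not the
# server's actual API): simulate a PID loop around a first-order plant and
# report overshoot, integral-squared error, and settling time.
import numpy as np

def evaluate_pid_analytic(kp, ki, kd, setpoint=1.0, duration_s=3.0,
                          dt=0.001, tolerance=0.02):
    n = int(duration_s / dt)
    t = np.arange(n) * dt
    y, integral, prev_err = 0.0, 0.0, setpoint
    ys = np.empty(n)
    for i in range(n):
        err = setpoint - y
        integral += err * dt
        derivative = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * derivative   # PID control law
        y += dt * (-y + u)                               # first-order plant, tau = 1 s
        ys[i] = y
        prev_err = err
    err = setpoint - ys
    outside = np.abs(err) > tolerance * abs(setpoint)
    if outside[-1]:
        settling = duration_s            # clipped: loop never stays in band
    elif outside.any():
        settling = float(t[np.nonzero(outside)[0][-1] + 1])
    else:
        settling = 0.0
    return {
        "overshoot": float(max(ys.max() - setpoint, 0.0)),   # radians
        "ise": float(np.sum(err ** 2) * dt),                 # rad^2*s
        "settling_time": settling,                           # seconds
    }

print(evaluate_pid_analytic(kp=4.0, ki=2.0, kd=0.5))
```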
Typical outputs (analytic only) are the overshoot, settling, and energy metrics described in the schema below. Note that the analytic plant example clips `settling_time` at the requested simulation duration (`duration_s=3.0`); increase the horizon if you need the loop to settle fully before computing that metric.
## Run as a service
### CLI (STDIO transport)
Use `python -m ctrltest_mcp --describe` to print basic metadata without starting the server.
### FastAPI (REST)
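A minimal sketch of calling the REST surface, assuming the service is reachable on localhost; the port, route name, and payload keys are assumptions and may differ from the actual FastAPI app:

```python
# Hypothetical REST call -- host, route, and payload fields are assumptions;
# adjust them to the routes the FastAPI app actually exposes.
import requests

resp = requests.post(
    "http://localhost:8000/evaluate",           # assumed host/port and route
    json={"kp": 4.0, "ki": 2.0, "kd": 0.5,      # PID gains under test
          "duration_s": 3.0},                   # simulation horizon
    timeout=30,
)
resp.raise_for_status()
print(resp.json())                              # metrics payload
```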
### python-sdk tool (STDIO / MCP)
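A sketch of driving the server over STDIO with the official MCP python-sdk; the tool name `evaluate_pid` and its argument keys are assumptions, so list the tools first to see what the server actually exposes:

```python
# Hypothetical MCP client session over STDIO using the official python-sdk.
# The tool name and argument keys are assumptions; check list_tools() output.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(command="python", args=["-m", "ctrltest_mcp"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])       # confirm available tools
            result = await session.call_tool(
                "evaluate_pid",                        # assumed tool name
                {"kp": 4.0, "ki": 2.0, "kd": 0.5, "duration_s": 3.0},
            )
            print(result.content)

asyncio.run(main())
```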
## ToolHive smoke test
Run the integration script from your workspace root:
The smoke test runs the analytic path by default. To exercise high-fidelity scoring, stage Foam-Agent archives under `logs/foam_agent/` and diffSPH gradients under `logs/diffsph/` before launching the script.
## Agent playbook
- **Gust rejection** - feed archived diffSPH gradients (`diffsph_metrics`) and Foam-Agent archives (paths returned by those services) to quantify adaptive CPG improvements.
- **Controller comparison** - log analytics for multiple PID gains, export JSONL evidence, and visualise in Grafana (see the sketch after this list).
- **Policy evaluation** - integrate with RL or evolutionary algorithms; metrics are structured for automated scoring.
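As a sketch of the controller-comparison flow, the loop below sweeps a few PID gain sets, calls the (assumed) REST endpoint from the FastAPI example, and appends JSONL evidence that Grafana can ingest; the gain values, route, and file name are illustrative only:

```python
# Sketch of a controller-comparison sweep that logs JSONL evidence.
# The REST route and payload keys are assumptions (see the FastAPI example).
import json
import requests

gain_grid = [
    {"kp": 2.0, "ki": 1.0, "kd": 0.2},
    {"kp": 4.0, "ki": 2.0, "kd": 0.5},
    {"kp": 6.0, "ki": 2.5, "kd": 0.8},
]

with open("controller_comparison.jsonl", "w") as fh:
    for gains in gain_grid:
        resp = requests.post("http://localhost:8000/evaluate",   # assumed route
                             json={**gains, "duration_s": 3.0}, timeout=30)
        resp.raise_for_status()
        record = {"gains": gains, "metrics": resp.json()}
        fh.write(json.dumps(record) + "\n")                      # one line per run
```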
## Stretch ideas
- Extend the adapter for PteraControls (planned once upstream Python bindings are published).
- Drive the MCP from `scripts/fitness` to populate nightly scorecards.
- Combine with `migration-mcp` to explore route-specific disturbance budgets.
## Accessibility & upkeep
- Hero badges include alt text and stay under five to maintain scannability.
- Run `uv run pytest` (tests mock diffSPH/Foam-Agent inputs and assert deterministic analytic results); a sketch of such a test follows this list.
- Keep metric schema changes documented; downstream dashboards rely on them.
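A deterministic analytic test might look like the following; the import path and function signature are assumptions about this repo's layout rather than its actual test suite:

```python
# Hypothetical test -- module path and function name are assumptions. The
# point is that analytic evaluations are closed-form, so repeated runs must
# return identical metrics and settling_time stays within the horizon.
def test_analytic_metrics_are_deterministic():
    from ctrltest_mcp.analytic import evaluate_pid   # assumed import path
    first = evaluate_pid(kp=4.0, ki=2.0, kd=0.5, duration_s=3.0)
    second = evaluate_pid(kp=4.0, ki=2.0, kd=0.5, duration_s=3.0)
    assert first == second                           # no hidden randomness
    assert first["settling_time"] <= 3.0             # clipped at duration_s
```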
## Metric schema at a glance
| Field | Units | Notes |
| --- | --- | --- |
| | radians | peak response minus setpoint |
| | rad²·s | integral squared error |
| | seconds | first time error stays within tolerance |
| | milliseconds | detector latency |
| | hertz | detector bandwidth |
| | 0–1 | fraction of disturbance rejected |
| | joules | energy pre-adaptation |
| | joules | energy post-adaptation |
| | 0–1 | energy reduction ratio |
| | unitless | stability margin |
| | unitless | cost weight × switches |
| | milliseconds | latency budget after switching |
| | joules | mix-of-experts energy draw |
| | unitless | only when both diffSPH & Foam metrics are present |
| | varies | raw diffSPH/Foam metrics merged in |
Example of fused high-fidelity metrics:
## Contributing
- `uv pip install --system -e .[dev]`
- `uv run ruff check .` and `uv run pytest`
- Share sample metrics in PRs so reviewers can sanity-check improvements quickly.
## License

MIT license - see LICENSE.