The CtrlTest MCP Server provides automated control system evaluation and regression testing for PID and bio-inspired controllers in flapping-wing aircraft systems, exposing a standardized API (ctrltest.analyze_pid) for comprehensive performance analysis.
Core Capabilities:

- **PID Controller Analysis** - Evaluate PID gains against plant dynamics (natural frequency, damping ratio) to measure overshoot, settling time, integral squared error (ISE), and Lyapunov stability margins.
- **High-Fidelity Simulation Integration** - Ingest data from diffSPH (differentiable smoothed-particle hydrodynamics) and Foam-Agent CFD simulations via `extra_metrics` for realistic performance assessment.
- **Multi-Modal Scoring** - Fuse analytic models with high-fidelity simulation data when both diffSPH and Foam-Agent metrics are available.
- **Gust Rejection Analysis** - Quantify disturbance handling through gust detection latency, bandwidth, and rejection-percentage metrics.
- **Energy Efficiency Assessment** - Measure CPG (Central Pattern Generator) baseline vs. consumed energy and calculate energy reduction percentages.
- **Mix-of-Experts Evaluation** - Compute switching penalties, latency budgets, and energy consumption for adaptive control architectures.
- **Automated Integration** - Support continuous performance monitoring, automated gain tuning, and controller optimization through STDIO/HTTP API integration with ToolHive, RL algorithms, and evolutionary systems.
- **Structured Output** - Export JSON-formatted results with provenance metadata for dashboards, logging, and downstream analysis.
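The analytic metrics above follow classic second-order step-response theory. As a minimal sketch (not the library's implementation, which evaluates the full PID loop), overshoot and settling time can be related to a plant's damping ratio and natural frequency like this:

```python
import math

def second_order_step_metrics(natural_frequency_hz: float, damping_ratio: float):
    """Closed-form unit-step metrics for an underdamped second-order plant.

    Illustrative only: the server itself scores the closed PID loop,
    not the open-loop plant shown here.
    """
    wn = 2.0 * math.pi * natural_frequency_hz   # natural frequency in rad/s
    zeta = damping_ratio
    # Fractional peak overshoot for a unit step (underdamped case).
    overshoot = math.exp(-zeta * math.pi / math.sqrt(1.0 - zeta**2))
    # Standard 2% settling-time approximation.
    settling_time_s = 4.0 / (zeta * wn)
    return overshoot, settling_time_s

# Plant used in the quickstart example: 3.2 Hz, damping ratio 0.35.
overshoot, t_s = second_order_step_metrics(3.2, 0.35)
print(f"overshoot = {overshoot:.3f}, settling time = {t_s:.3f} s")
```

The server's `overshoot` metric differs from this open-loop figure because the PID gains reshape the closed-loop dynamics; the sketch only shows where the plant parameters enter.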
Provides a REST API interface for the control system evaluation service, allowing HTTP-based access to controller benchmarking and scoring functionality.
Enables visualization of controller performance metrics and analytics data exported from PID evaluations and comparative controller assessments.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type `@` followed by the MCP server name and your instructions, e.g., "@CtrlTest MCP Server evaluate PID gains kp=2.0, ki=0.5, kd=0.12 for a second-order plant with natural frequency 3.2 Hz".
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
ctrltest-mcp - Flight-control regression lab for MCP agents
TL;DR: Evaluate PID and bio-inspired controllers against analytic or diffSPH/Foam-Agent data through MCP, logging overshoot, energy, and gust metrics automatically.
Table of contents

- What it provides
- Quickstart
- Run as a service
- Agent playbook
- Stretch ideas
- Accessibility & upkeep
- Metric schema at a glance
- Contributing

What it provides
| Scenario | Value |
| --- | --- |
| Analytic PID benchmarking | Run closed-form plant models and produce overshoot/settling/energy metrics without manual scripting. |
| High-fidelity scoring | Ingest logged data from Foam-Agent or diffSPH runs and fuse it into controller evaluations. |
| MCP integration | Expose the scoring API via STDIO/HTTP so ToolHive or other clients can automate gain tuning and generate continuous performance scorecards. |
Quickstart
```shell
uv pip install "git+https://github.com/Three-Little-Birds/ctrltest-mcp.git"
```

Run a PID evaluation:
```python
from ctrltest_mcp import (
    ControlAnalysisInput,
    ControlPlant,
    ControlSimulation,
    PIDGains,
    evaluate_control,
)

request = ControlAnalysisInput(
    plant=ControlPlant(natural_frequency_hz=3.2, damping_ratio=0.35),
    gains=PIDGains(kp=2.0, ki=0.5, kd=0.12),
    simulation=ControlSimulation(duration_s=3.0, sample_points=400),
    setpoint=0.2,
)

response = evaluate_control(request)
print(response.model_dump())
```

Typical outputs (analytic only):
```json
{
  "overshoot": -0.034024863556091134,
  "ise": 0.008612387509182674,
  "settling_time": 3.0,
  "gust_detection_latency_ms": 0.8,
  "gust_detection_bandwidth_hz": 1200.0,
  "gust_rejection_pct": 0.396,
  "cpg_energy_baseline_j": 12.0,
  "cpg_energy_consumed_j": 7.8,
  "cpg_energy_reduction_pct": 0.35,
  "lyapunov_margin": 0.12,
  "moe_switch_penalty": 0.135,
  "moe_latency_ms": 12.72,
  "moe_energy_j": 3.9,
  "multi_modal_score": null,
  "extra_metrics": null,
  "metadata": {"solver": "analytic"}
}
```

The analytic plant example above clips `settling_time` at the requested simulation duration (`duration_s=3.0`). Increase the horizon if you need the loop to settle fully before computing that metric.
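To see why a short horizon clips the metric, here is a standalone sketch (not the library's implementation) of a 2%-band settling-time computation that falls back to the horizon end when the response never settles within it:

```python
import math

def settling_time(times, errors, band=0.02):
    """Return the first time after which |error| stays within `band`,
    or the final time if it never does (the 'clipped' case)."""
    for i, t in enumerate(times):
        if all(abs(e) <= band for e in errors[i:]):
            return t
    return times[-1]

# A decaying oscillation settles comfortably on a 10 s horizon...
long_t = [i * 0.01 for i in range(1000)]
err = [math.exp(-1.5 * t) * math.cos(8 * t) for t in long_t]
print(settling_time(long_t, err))

# ...but on a 0.5 s horizon the metric is clipped at the last sample.
short_t, short_err = long_t[:50], err[:50]
print(settling_time(short_t, short_err))  # -> 0.49 (the horizon end)
```

The same reasoning applies to the server's metric: when `duration_s` is too short for the loop to settle, the reported `settling_time` is simply the simulation length.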
Run as a service
CLI (STDIO transport)
```shell
uvx ctrltest-mcp        # runs the MCP over stdio
# or just:
python -m ctrltest_mcp
```

Use `python -m ctrltest_mcp --describe` to print basic metadata without starting the server.
FastAPI (REST)
```shell
uv run uvicorn ctrltest_mcp.fastapi_app:create_app --factory --port 8005
```

python-sdk tool (STDIO / MCP)
```python
from mcp.server.fastmcp import FastMCP

from ctrltest_mcp.tool import build_tool

mcp = FastMCP("ctrltest-mcp", "Flapping-wing control regression")
build_tool(mcp)

if __name__ == "__main__":
    mcp.run()
```

ToolHive smoke test
Run the integration script from your workspace root:
```shell
uvx --with 'mcp==1.20.0' python scripts/integration/run_ctrltest.py
```

The smoke test runs the analytic path by default. To exercise high-fidelity scoring, stage Foam-Agent archives under `logs/foam_agent/` and diffSPH gradients under `logs/diffsph/` before launching the script.
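Before launching with high-fidelity data, a small pre-flight check can confirm the staging directories are populated. This sketch is illustrative; only the directory names come from the note above:

```python
import tempfile
from pathlib import Path

def staged_inputs(root: Path) -> dict[str, bool]:
    """Report whether each high-fidelity staging directory contains any files."""
    return {
        name: any((root / name).glob("*"))
        for name in ("logs/foam_agent", "logs/diffsph")
    }

# Demonstration against a scratch workspace (hypothetical file names):
root = Path(tempfile.mkdtemp())
(root / "logs/foam_agent").mkdir(parents=True)
(root / "logs/foam_agent/run01.tar.gz").touch()
(root / "logs/diffsph").mkdir(parents=True)   # created but left empty
print(staged_inputs(root))  # {'logs/foam_agent': True, 'logs/diffsph': False}
```

An empty `logs/diffsph/` here means the smoke test would fall back to the analytic path for diffSPH metrics.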
Agent playbook
- **Gust rejection** - Feed archived diffSPH gradients (`diffsph_metrics`) and Foam-Agent archives (paths returned by those services) to quantify adaptive CPG improvements.
- **Controller comparison** - Log analytics for multiple PID gains, export JSONL evidence, and visualise in Grafana.
- **Policy evaluation** - Integrate with RL or evolutionary algorithms; metrics are structured for automated scoring.
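The controller-comparison workflow can be sketched as follows, assuming each evaluation yields a metrics dict like the quickstart output. The helper functions here are hypothetical, not part of the package:

```python
import json
from pathlib import Path

def log_evaluation(path: Path, gains: dict, metrics: dict) -> None:
    """Append one evaluation as a JSONL record for dashboards/Grafana."""
    with path.open("a") as fh:
        fh.write(json.dumps({"gains": gains, "metrics": metrics}) + "\n")

def best_by_ise(path: Path) -> dict:
    """Pick the logged gain set with the lowest integral squared error."""
    records = [json.loads(line) for line in path.read_text().splitlines()]
    return min(records, key=lambda r: r["metrics"]["ise"])

evidence = Path("pid_sweep.jsonl")
evidence.unlink(missing_ok=True)
log_evaluation(evidence, {"kp": 2.0, "ki": 0.5, "kd": 0.12}, {"ise": 0.0086})
log_evaluation(evidence, {"kp": 2.5, "ki": 0.4, "kd": 0.10}, {"ise": 0.0112})
print(best_by_ise(evidence)["gains"])  # {'kp': 2.0, 'ki': 0.5, 'kd': 0.12}
```

Because each line is self-contained JSON, the same file can feed an RL reward loop, an evolutionary fitness function, or a Grafana JSON datasource without reshaping.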
Stretch ideas
Extend the adapter for PteraControls (planned once upstream Python bindings are published).
- Drive the MCP from `scripts/fitness` to populate nightly scorecards.
- Combine with `migration-mcp` to explore route-specific disturbance budgets.
Accessibility & upkeep
- Run `uv run pytest` (tests mock diffSPH/Foam-Agent inputs and assert deterministic analytic results).
- Keep metric schema changes documented; downstream dashboards rely on them.
Metric schema at a glance
| Field | Units | Notes |
| --- | --- | --- |
| `overshoot` | radians | peak response minus setpoint |
| `ise` | rad²·s | integral squared error |
| `settling_time` | seconds | first time error stays within tolerance |
| `gust_detection_latency_ms` | milliseconds | detector latency |
| `gust_detection_bandwidth_hz` | hertz | detector bandwidth |
| `gust_rejection_pct` | 0–1 | fraction of disturbance rejected |
| `cpg_energy_baseline_j` | joules | energy pre-adaptation |
| `cpg_energy_consumed_j` | joules | energy post-adaptation |
| `cpg_energy_reduction_pct` | 0–1 | energy reduction ratio |
| `lyapunov_margin` | unitless | stability margin |
| `moe_switch_penalty` | unitless | cost weight × switches |
| `moe_latency_ms` | milliseconds | latency budget after switching |
| `moe_energy_j` | joules | mix-of-experts energy draw |
| `multi_modal_score` | unitless | only when both diffSPH & Foam metrics are present |
| `extra_metrics` | varies | raw diffSPH/Foam metrics merged in |
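Since downstream dashboards depend on this schema, a lightweight sanity check can flag out-of-range values before export. This validator is illustrative and not part of the package; it only checks the bounded fields from the table above:

```python
def validate_metrics(metrics: dict) -> list[str]:
    """Return a list of schema violations for a metrics payload."""
    problems = []
    # Ratio fields are documented as 0-1.
    for field in ("gust_rejection_pct", "cpg_energy_reduction_pct"):
        value = metrics.get(field)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{field} must be in [0, 1], got {value}")
    # Error, time, and energy fields cannot be negative.
    for field in ("ise", "settling_time", "moe_energy_j"):
        value = metrics.get(field)
        if value is not None and value < 0.0:
            problems.append(f"{field} must be non-negative, got {value}")
    return problems

print(validate_metrics({"gust_rejection_pct": 0.396, "ise": 0.0086}))  # []
print(validate_metrics({"gust_rejection_pct": 1.7}))  # one violation
```

A check like this is cheap to run in CI next to `uv run pytest`, so schema drift surfaces before dashboards break.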
Example of fused high-fidelity metrics:
```json
{
  "extra_metrics": {
    "force_gradient_norm": 0.87,
    "lift_drag_ratio": 18.4
  },
  "multi_modal_score": 0.047,
  "metadata": {"solver": "analytic"}
}
```

Contributing
- `uv pip install --system -e .[dev]`
- `uv run ruff check .` and `uv run pytest`
- Share sample metrics in PRs so reviewers can sanity-check improvements quickly.
MIT license - see LICENSE.