Glama

plan_proactive_agent_eval_guardrails

Read-only

Map evaluation gaps in proactive assistant workflows to structured guardrails including state machines, user simulation, goal inference, timing, and multi-app orchestration gates.

Instructions

Map proactive-assistant eval gaps to PARE-style state-machine, active-user-simulation, goal-inference, intervention-timing, and multi-app orchestration gates.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| workflow | No | Proactive assistant workflow name. | |
| apps | No | Apps involved in the proactive workflow. | |
| states | No | Modeled app states. | |
| stateCount | No | Number of modeled states. | |
| actionCount | No | Number of state-dependent actions. | |
| taskCount | No | Number of benchmark tasks or scenarios. | |
| hasStateMachine | No | Whether apps are modeled as finite state machines. | |
| hasActiveUserSimulation | No | Whether active user simulation exists. | |
| hasGoalInferenceEvals | No | Whether goal inference is graded. | |
| hasInterventionTimingEvals | No | Whether intervention timing is graded. | |
| hasMultiAppEvals | No | Whether multi-app orchestration is graded. | |
| flatToolApiOnly | No | Current eval only covers flat tool calls. | |
| proactiveWrites | No | Proactive agent can write or mutate state. | |
| userVisibleActions | No | Interventions can notify, schedule, send, or affect users. | |
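
To make the schema concrete, here is a sketch of an arguments object for this tool. The key names come from the table above; every value is invented for illustration, and passing `apps` and `states` as comma-separated strings is an assumption based on the CLI flag descriptions, not something the schema confirms. `SCHEMA_KEYS` and `unknownKeys` are hypothetical helpers for catching misspelled keys before a call.

```javascript
// Parameter names copied from the input schema; every parameter is optional.
const SCHEMA_KEYS = [
  'workflow', 'apps', 'states', 'stateCount', 'actionCount', 'taskCount',
  'hasStateMachine', 'hasActiveUserSimulation', 'hasGoalInferenceEvals',
  'hasInterventionTimingEvals', 'hasMultiAppEvals', 'flatToolApiOnly',
  'proactiveWrites', 'userVisibleActions',
];

// Hypothetical helper: list any keys that the schema does not define.
function unknownKeys(args) {
  return Object.keys(args).filter((key) => !SCHEMA_KEYS.includes(key));
}

// Illustrative values only; none of these come from the source page.
const exampleArgs = {
  workflow: 'calendar follow-up assistant',
  apps: 'calendar,email',              // assumption: comma-separated string
  states: 'inbox,compose,event_detail',
  actionCount: 12,
  taskCount: 40,
  hasStateMachine: true,
  hasActiveUserSimulation: false,
  proactiveWrites: true,
  userVisibleActions: true,
};

console.log(unknownKeys(exampleArgs)); // []
```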

Implementation Reference

  • Main function that builds the proactive agent eval guardrails plan. Normalizes options, evaluates signals (flat_tool_api_gap, missing_active_user_simulation, missing_goal_inference_eval, missing_intervention_timing_eval, multi_app_write_risk, user_visible_interruption_risk), determines status (blocked/needs_eval/ready), and returns the full report including gates, metrics, and next actions.
    function buildProactiveAgentEvalGuardrailsPlan(rawOptions = {}) {
      const options = normalizeOptions(rawOptions);
      const signals = buildSignals(options);
      const critical = signals.filter((signal) => signal.severity === 'critical').length;
      const high = signals.filter((signal) => signal.severity === 'high').length;
    
      return {
        name: 'thumbgate-proactive-agent-eval-guardrails',
        source: SOURCE,
        workflow: options.workflow,
        status: critical > 0 ? 'blocked' : high > 0 ? 'needs_eval' : 'ready',
        summary: {
          signalCount: signals.length,
          critical,
          high,
          apps: options.apps,
          stateCount: options.stateCount,
          actionCount: options.actionCount,
          taskCount: options.taskCount,
        },
        pareMapping: {
          appModel: options.hasStateMachine ? 'finite_state_machine_present' : 'finite_state_machine_required',
          userSimulation: options.hasActiveUserSimulation ? 'active_user_simulation_present' : 'active_user_simulation_required',
          actionSpace: 'state-dependent action spaces should be explicit per app state',
          evaluationAxes: [
            'context observation',
            'goal inference',
            'intervention timing',
            'multi-app orchestration',
          ],
        },
        signals,
        metrics: buildMetrics(options),
        gates: signals.map((signal) => ({
          id: signal.id,
          action: signal.severity === 'critical' ? 'block' : 'warn',
          message: signal.gate,
        })),
        nextActions: [
          'Model each app as states, allowed actions, and valid transitions before judging proactive behavior.',
          'Add active user simulation cases where the user keeps navigating while the agent observes.',
          'Evaluate goal inference separately from intervention timing so a correct goal at the wrong time is still caught.',
          'Block proactive writes across multiple apps until orchestration success and rollback evidence are measured.',
          'Attach the eval report to any claim that a proactive agent is production-ready.',
        ],
        marketingAngle: {
          headline: 'Proactive agents need stateful eval gates.',
          subhead: 'PARE shows why flat tool-call benchmarks miss real app behavior. ThumbGate turns those stateful eval failures into pre-action gates before a proactive assistant interrupts users or writes across apps.',
          replyDraft: 'This is the missing eval shape for proactive agents. Flat tool calls cannot tell whether the agent acted at the right state or the right time. ThumbGate can use this pattern as the enforcement layer: stateful eval failure -> pre-action gate before the next proactive write.',
        },
      };
    }
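The status rule in the function above can be isolated for inspection. This is a sketch that mirrors the ternary on the `status:` line; `deriveStatus` is a hypothetical name and does not appear in the source. One consequence worth noting: a signal with `medium` severity does not change the status under this rule.

```javascript
// Mirrors the source rule: any critical signal blocks, any high signal needs eval,
// otherwise the plan is ready. Other severities are not counted.
function deriveStatus(signals) {
  const critical = signals.filter((signal) => signal.severity === 'critical').length;
  const high = signals.filter((signal) => signal.severity === 'high').length;
  return critical > 0 ? 'blocked' : high > 0 ? 'needs_eval' : 'ready';
}

console.log(deriveStatus([{ severity: 'critical' }, { severity: 'high' }])); // 'blocked'
console.log(deriveStatus([{ severity: 'high' }]));                           // 'needs_eval'
console.log(deriveStatus([{ severity: 'medium' }]));                         // 'ready'
console.log(deriveStatus([]));                                               // 'ready'
```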
  • MCP tool handler dispatching 'plan_proactive_agent_eval_guardrails' to buildProactiveAgentEvalGuardrailsPlan(args) and returning the result as text.
    case 'plan_proactive_agent_eval_guardrails':
      return toTextResult(buildProactiveAgentEvalGuardrailsPlan(args));
  • Normalizes raw CLI/tool options into a structured options object with parsed numbers, booleans, and lists.
    function normalizeOptions(raw = {}) {
      // Parse the state list once so the stateCount fallback also honors the `app-states` alias.
      const states = splitList(raw.states || raw['app-states']);
      return {
        workflow: String(raw.workflow || raw.name || 'proactive agent workflow').trim() || 'proactive agent workflow',
        apps: splitList(raw.apps || raw.applications),
        states,
        stateCount: parseNumber(raw['state-count'] || raw.stateCount || states.length, states.length),
        actionCount: parseNumber(raw['action-count'] || raw.actionCount, 0),
        taskCount: parseNumber(raw['task-count'] || raw.taskCount, 0),
        hasStateMachine: parseBoolean(raw['state-machine'] || raw.hasStateMachine, false),
        hasActiveUserSimulation: parseBoolean(raw['active-user-simulation'] || raw.hasActiveUserSimulation, false),
        hasGoalInferenceEvals: parseBoolean(raw['goal-inference-evals'] || raw.hasGoalInferenceEvals, false),
        hasInterventionTimingEvals: parseBoolean(raw['intervention-timing-evals'] || raw.hasInterventionTimingEvals, false),
        hasMultiAppEvals: parseBoolean(raw['multi-app-evals'] || raw.hasMultiAppEvals, false),
        flatToolApiOnly: parseBoolean(raw['flat-tool-api-only'] || raw.flatToolApiOnly, false),
        proactiveWrites: parseBoolean(raw['proactive-writes'] || raw.proactiveWrites, false),
        userVisibleActions: parseBoolean(raw['user-visible-actions'] || raw.userVisibleActions, false),
      };
    }
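The helpers `splitList`, `parseNumber`, and `parseBoolean` are called above but not shown on this page. The following is a minimal sketch of what they might look like, inferred only from the call sites (one argument for `splitList`, a value plus a fallback for the other two). The accepted string spellings for booleans are assumptions, and the real implementations may differ.

```javascript
// Assumed behavior: accept an array or a comma-separated string, return trimmed non-empty items.
function splitList(value) {
  if (Array.isArray(value)) return value.map((item) => String(item).trim()).filter(Boolean);
  if (value === undefined || value === null) return [];
  return String(value).split(',').map((item) => item.trim()).filter(Boolean);
}

// Assumed behavior: return a finite number, or the fallback when the input is missing or invalid.
function parseNumber(value, fallback) {
  if (value === undefined || value === null || value === '') return fallback;
  const parsed = Number(value);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Assumed behavior: pass real booleans through and map common string spellings.
function parseBoolean(value, fallback) {
  if (typeof value === 'boolean') return value;
  if (value === undefined || value === null || value === '') return fallback;
  const text = String(value).trim().toLowerCase();
  if (['true', '1', 'yes', 'on'].includes(text)) return true;
  if (['false', '0', 'no', 'off'].includes(text)) return false;
  return fallback;
}

console.log(splitList('calendar, email')); // [ 'calendar', 'email' ]
console.log(parseNumber('7', 0));          // 7
console.log(parseBoolean('yes', false));   // true
```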
  • Builds signal/gap analysis from options, identifying 6 possible evaluation gaps with severity levels (critical, high, medium) and pre-action gate recommendations.
    function buildSignals(options) {
      const signals = [];
      if (!options.hasStateMachine || options.flatToolApiOnly) {
        signals.push({
          id: 'flat_tool_api_gap',
          severity: 'high',
          message: 'Flat tool APIs miss stateful navigation and state-dependent action spaces.',
          gate: 'Require finite-state app model before proactive execution.',
        });
      }
      if (!options.hasActiveUserSimulation) {
        signals.push({
          id: 'missing_active_user_simulation',
          severity: 'high',
          message: 'Proactive agents need simulated user progress before timing can be evaluated.',
          gate: 'Run active user simulation before enabling anticipatory actions.',
        });
      }
      if (!options.hasGoalInferenceEvals) {
        signals.push({
          id: 'missing_goal_inference_eval',
          severity: 'medium',
          message: 'The agent may intervene without evidence that it inferred the user goal correctly.',
          gate: 'Grade goal inference before intervention approval.',
        });
      }
      if (!options.hasInterventionTimingEvals) {
        signals.push({
          id: 'missing_intervention_timing_eval',
          severity: 'high',
          message: 'A helpful action at the wrong time becomes interruption or damage.',
          gate: 'Require too-early, on-time, and too-late timing eval cases.',
        });
      }
      if ((options.apps.length > 1 || options.hasMultiAppEvals === false) && options.proactiveWrites) {
        signals.push({
          id: 'multi_app_write_risk',
          severity: 'critical',
          message: 'Multi-app proactive writes can compound state mistakes across tools.',
          gate: 'Block multi-app proactive writes until orchestration evals and rollback evidence exist.',
        });
      }
      if (options.userVisibleActions && !options.hasInterventionTimingEvals) {
        signals.push({
          id: 'user_visible_interruption_risk',
          severity: 'high',
          message: 'User-visible interventions need timing proof before notification, scheduling, or communication actions.',
          gate: 'Require intervention timing proof before user-visible actions.',
        });
      }
      return signals;
    }
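To see which signals fire for a given configuration, the six conditions in `buildSignals` can be restated as a compact rule table. This is a sketch: the ids, severities, and predicates are copied from the function above, while `RULES`, `firedSignalIds`, and the sample options are hypothetical. The example shows that, as the condition is written, `multi_app_write_risk` also fires for a single-app workflow whenever `hasMultiAppEvals` is false and `proactiveWrites` is true.

```javascript
// Condensed restatement of the conditions in buildSignals (messages and gates omitted).
const RULES = [
  { id: 'flat_tool_api_gap', severity: 'high', when: (o) => !o.hasStateMachine || o.flatToolApiOnly },
  { id: 'missing_active_user_simulation', severity: 'high', when: (o) => !o.hasActiveUserSimulation },
  { id: 'missing_goal_inference_eval', severity: 'medium', when: (o) => !o.hasGoalInferenceEvals },
  { id: 'missing_intervention_timing_eval', severity: 'high', when: (o) => !o.hasInterventionTimingEvals },
  {
    id: 'multi_app_write_risk',
    severity: 'critical',
    when: (o) => (o.apps.length > 1 || o.hasMultiAppEvals === false) && o.proactiveWrites,
  },
  {
    id: 'user_visible_interruption_risk',
    severity: 'high',
    when: (o) => o.userVisibleActions && !o.hasInterventionTimingEvals,
  },
];

// Return the ids of every rule whose condition holds for the given options.
function firedSignalIds(options) {
  return RULES.filter((rule) => rule.when(options)).map((rule) => rule.id);
}

// Hypothetical single-app workflow with every eval in place except multi-app grading.
const singleApp = {
  apps: ['calendar'],
  hasStateMachine: true,
  flatToolApiOnly: false,
  hasActiveUserSimulation: true,
  hasGoalInferenceEvals: true,
  hasInterventionTimingEvals: true,
  hasMultiAppEvals: false,
  proactiveWrites: true,
  userVisibleActions: true,
};

console.log(firedSignalIds(singleApp)); // [ 'multi_app_write_risk' ]
```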
  • CLI command registration defining the 'proactive-agent-eval-guardrails' command with its aliases, description, MCP tool binding, and all input flags.
    discoveryCommand({
      name: 'proactive-agent-eval-guardrails',
      aliases: ['pare-guardrails', 'proactive-agent-guardrails'],
      description: 'Map PARE-style proactive-agent eval gaps to stateful pre-action gates',
      mcpTool: 'plan_proactive_agent_eval_guardrails',
      flags: [
        jsonFlag(),
        { name: 'workflow', type: 'string', description: 'Proactive assistant workflow name' },
        { name: 'apps', type: 'string', description: 'Comma-separated apps involved in the workflow' },
        { name: 'states', type: 'string', description: 'Comma-separated app states modeled for the eval' },
        { name: 'state-count', type: 'number', description: 'Number of modeled states' },
        { name: 'action-count', type: 'number', description: 'Number of state-dependent actions' },
        { name: 'task-count', type: 'number', description: 'Number of benchmark tasks or scenarios' },
        { name: 'state-machine', type: 'boolean', description: 'Whether apps are modeled as finite state machines' },
        { name: 'active-user-simulation', type: 'boolean', description: 'Whether active user simulation exists' },
        { name: 'goal-inference-evals', type: 'boolean', description: 'Whether goal inference is graded' },
        { name: 'intervention-timing-evals', type: 'boolean', description: 'Whether intervention timing is graded' },
        { name: 'multi-app-evals', type: 'boolean', description: 'Whether multi-app orchestration is graded' },
        { name: 'flat-tool-api-only', type: 'boolean', description: 'Mark that the current eval only covers flat tool calls' },
        { name: 'proactive-writes', type: 'boolean', description: 'Mark that the proactive agent can write or mutate state' },
        { name: 'user-visible-actions', type: 'boolean', description: 'Mark that interventions can notify, schedule, send, or otherwise affect users' },
      ],
    }),
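
The CLI flags use kebab-case names while the MCP input schema uses camelCase, and several boolean flags gain a `has` prefix on the schema side. The sketch below records that correspondence as read from `normalizeOptions`; `FLAG_TO_SCHEMA_KEY` and `toSchemaArgs` are hypothetical names. Since `normalizeOptions` accepts both spellings, this conversion is not required by the tool and only documents the mapping.

```javascript
// Flag name -> schema key, as paired in normalizeOptions.
const FLAG_TO_SCHEMA_KEY = {
  'workflow': 'workflow',
  'apps': 'apps',
  'states': 'states',
  'state-count': 'stateCount',
  'action-count': 'actionCount',
  'task-count': 'taskCount',
  'state-machine': 'hasStateMachine',
  'active-user-simulation': 'hasActiveUserSimulation',
  'goal-inference-evals': 'hasGoalInferenceEvals',
  'intervention-timing-evals': 'hasInterventionTimingEvals',
  'multi-app-evals': 'hasMultiAppEvals',
  'flat-tool-api-only': 'flatToolApiOnly',
  'proactive-writes': 'proactiveWrites',
  'user-visible-actions': 'userVisibleActions',
};

// Hypothetical helper: rename flag-style keys to schema keys, dropping unknown flags.
function toSchemaArgs(flags) {
  const args = {};
  for (const [flag, value] of Object.entries(flags)) {
    const key = FLAG_TO_SCHEMA_KEY[flag];
    if (key !== undefined) args[key] = value;
  }
  return args;
}

console.log(toSchemaArgs({ 'state-machine': true, 'apps': 'calendar,email' }));
// → { hasStateMachine: true, apps: 'calendar,email' }
```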
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating a read operation. The description adds no further behavioral details (e.g., whether it requires pre-existing eval data, whether it creates or modifies any state). The description is consistent but adds minimal value beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single 20-word sentence, concise and front-loaded with the action verb. However, the dense listing of concepts may reduce readability. Still, it's appropriately sized given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 14 parameters, no output schema, and only a readOnlyHint annotation, the description should explain what the tool returns (e.g., a list of gaps, mapped gates, a structured plan). It does not, leaving the agent uncertain about outputs. The jargon 'PARE-style' is undefined.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description loosely references concepts like 'state-machine, active-user-simulation' which correspond to boolean parameters, but does not explain parameters like flatToolApiOnly, proactiveWrites, userVisibleActions. Adds only modest extra meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Map' and resource 'proactive-assistant eval gaps', but the term 'PARE-style' is unexplained jargon, and it's unclear what the tool outputs (e.g., a plan, a report, a set of gates). Sibling tools like 'plan_reward_hacking_guardrails' suggest similar functionality, but the description does not differentiate this tool from them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives no guidance on when to use this tool versus alternative planning tools (e.g., plan_agent_design_governance, plan_intent). It does not say when this tool is appropriate or when it should not be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/IgorGanapolsky/ThumbGate'
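
The same request can be made from JavaScript. This is a sketch assuming a runtime with a global `fetch` (for example Node.js 18 or later) and assuming the endpoint returns JSON, which this page does not state. `serverUrl` and `fetchServerInfo` are hypothetical helper names; only the URL itself comes from the curl command above.

```javascript
const MCP_API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Build the server URL from the owner and server name used in the curl example.
function serverUrl(owner, name) {
  return `${MCP_API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Fetch the server record. Assumption: the response body is JSON.
async function fetchServerInfo(owner, name) {
  const response = await fetch(serverUrl(owner, name));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

console.log(serverUrl('IgorGanapolsky', 'ThumbGate'));
// → https://glama.ai/api/mcp/v1/servers/IgorGanapolsky/ThumbGate

// Usage (performs a network request):
// fetchServerInfo('IgorGanapolsky', 'ThumbGate').then(console.log).catch(console.error);
```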

If you have feedback or need assistance with the MCP directory API, please join our Discord server.