Glama

Deploy Custom Rule

deploy_rule

Deploy a custom rule to enforce recurring quality checks on agent outputs. The rule activates immediately and persists across restarts, running on every future evaluate_output call of its eval category.

Instructions

Deploy a new custom evaluation rule that will fire on every future evaluate_output call of its eval category.

Sibling tools. list_rules enumerates deployed rules, delete_rule removes them, and evaluate_output runs them. log_trace / get_traces / delete_trace handle the trace lifecycle separately; evaluate_with_llm_judge / verify_citations run semantic scoring (not heuristic-rule-driven). deploy_rule is the WRITE path that grows the custom-rule library.

Behavior. Writes an entry to ~/.iris/custom-rules.json (atomic write via temp file + rename) and appends a rule.deploy entry to the audit log (~/.iris/audit.log). The rule activates immediately for the running process and persists across restarts. Each call mints a fresh rule_id, so the call is not idempotent (deploying twice creates two rules). Tenant-scoped in Cloud tier; OSS rules are owned by LOCAL_TENANT. Rate-limited to 20 req/min on HTTP MCP.
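The temp-file-plus-rename write mentioned above can be sketched roughly as follows. This is only an illustration of the general technique using Node's standard library, not the server's actual code: the helper name writeJsonAtomic, the temp-file naming, and the array layout of the file are all assumptions, and the demo writes to a throwaway directory rather than ~/.iris.

```typescript
import { mkdtempSync, readdirSync, readFileSync, renameSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join } from "node:path";

// Hypothetical helper: replace `target` in one step so readers never see a half-written file.
function writeJsonAtomic(target: string, data: unknown): void {
  // The temp file sits next to the target so the rename stays on one filesystem.
  const tmp = join(dirname(target), "." + basename(target) + "." + process.pid + ".tmp");
  writeFileSync(tmp, JSON.stringify(data, null, 2), "utf8");
  // rename() swaps the new file into place; on POSIX filesystems this is atomic.
  renameSync(tmp, target);
}

// Demo in a temporary directory (assumed file layout: a JSON array of rules).
const demoDir = mkdtempSync(join(tmpdir(), "iris-demo-"));
const rulesPath = join(demoDir, "custom-rules.json");

writeJsonAtomic(rulesPath, [{ id: "rule-0001", name: "first" }]);
writeJsonAtomic(rulesPath, [
  { id: "rule-0001", name: "first" },
  { id: "rule-0002", name: "second" },
]);

const stored = JSON.parse(readFileSync(rulesPath, "utf8")) as Array<{ id: string; name: string }>;
// No temp file should remain once the rename has happened.
const leftovers = readdirSync(demoDir).filter((f) => /\.tmp$/.test(f));
```

If the process dies before the rename, the previous custom-rules.json is still intact, which is the property the description relies on when it says the rule persists across restarts.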

Output shape. Returns JSON: { "rule": { "id": "rule-XXXX", "name", "description", "evalType", "severity", "definition", "enabled": true, "createdAt", "updatedAt", "version": 1, "sourceMomentId?" } }. The returned rule is the canonical persisted form; save the id if you plan to update or delete later.
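A caller might model the output shape above along these lines. The field names come from the description; the field types (for example, timestamps as strings) and the extractRuleId helper are assumptions made for illustration, and the sample values are placeholders.

```typescript
// Field types are assumptions; the description only lists the field names.
interface DeployedRule {
  id: string;
  name: string;
  description?: string;
  evalType: string;
  severity: string;
  definition: { type: string; config: Record<string, unknown> };
  enabled: boolean;
  createdAt: string;
  updatedAt: string;
  version: number;
  sourceMomentId?: string;
}

interface DeployRuleResponse {
  rule: DeployedRule;
}

// Hypothetical helper: pull out the id that later delete or update steps will need.
function extractRuleId(rawJson: string): string {
  const parsed = JSON.parse(rawJson) as Partial<DeployRuleResponse>;
  const id = parsed.rule ? parsed.rule.id : undefined;
  if (typeof id !== "string" || id.length === 0) {
    throw new Error("deploy_rule response has no rule.id");
  }
  return id;
}

// Placeholder response; every value here is illustrative.
const sampleResponse = JSON.stringify({
  rule: {
    id: "rule-XXXX",
    name: "Answer length",
    evalType: "completeness",
    severity: "medium",
    definition: { type: "min_length", config: { min: 20 } },
    enabled: true,
    createdAt: "2025-01-01T00:00:00Z",
    updatedAt: "2025-01-01T00:00:00Z",
    version: 1,
  },
});

const savedId = extractRuleId(sampleResponse);
```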

Use when an agent observes a recurring failure pattern and decides to enforce a check for it as a standing rule. The sourceMomentId field preserves provenance: downstream audit can trace the rule back to the moment that inspired it. Combine with evaluate_output + get_traces: 1) evaluate_output surfaces failures; 2) get_traces filters to the failure set; 3) analyze the pattern; 4) deploy_rule bakes it into the default eval path.

Don't use to VALIDATE a rule before committing — deploy writes immediately. Use the dashboard's preview endpoint (POST /api/v1/rules/custom/preview) for dry-run validation against sample output. Don't use to EDIT an existing rule — this call only creates; edits require a dedicated flow (coming in v0.5). To update a rule today: delete_rule then deploy_rule with the new definition.
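The create-only semantics and the delete-then-deploy update path can be illustrated with a small in-memory model. This is a toy stand-in, not the real server or its API: RuleStoreModel, replaceRule, and the sequential id format are all hypothetical.

```typescript
interface RuleInput {
  name: string;
  evalType: string;
  definition: { type: string; config: Record<string, unknown> };
}

interface StoredRule extends RuleInput {
  id: string;
  version: number;
}

// Toy model of a create-only rule store (hypothetical, for illustration only).
class RuleStoreModel {
  private rules: StoredRule[] = [];
  private nextNumber = 1;

  // Every deploy mints a fresh id, so identical input still yields a new rule.
  deploy(input: RuleInput): StoredRule {
    const rule: StoredRule = { ...input, id: "rule-" + this.nextNumber, version: 1 };
    this.nextNumber += 1;
    this.rules.push(rule);
    return rule;
  }

  remove(id: string): boolean {
    const before = this.rules.length;
    this.rules = this.rules.filter((r) => r.id !== id);
    return this.rules.length < before;
  }

  list(): StoredRule[] {
    return this.rules.slice();
  }
}

// "Update" as described above: delete the old rule, then deploy the new definition.
function replaceRule(store: RuleStoreModel, oldId: string, next: RuleInput): StoredRule {
  store.remove(oldId);
  return store.deploy(next);
}

const store = new RuleStoreModel();
const draft: RuleInput = {
  name: "Answer length",
  evalType: "completeness",
  definition: { type: "min_length", config: { min: 20 } },
};

const first = store.deploy(draft);
const duplicate = store.deploy(draft); // same input, second rule
const countAfterDuplicate = store.list().length;

store.remove(duplicate.id); // clean up the accidental duplicate
const updated = replaceRule(store, first.id, {
  ...draft,
  definition: { type: "min_length", config: { min: 50 } },
});
```

Note that the replacement carries a new id, so any id saved from the first deploy is stale after an update done this way.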

Parameters. name is 1-120 chars (Zod-enforced min/max); appears in eval_result rule_results so make it human-readable. description is optional, max 500 chars (used in dashboard tooltips). evalType determines WHEN the rule fires (must match the eval_type your evaluate_output calls use; e.g., a "completeness" rule fires on every evaluate_output where eval_type="completeness" OR eval_type="custom"). severity affects dashboard sort + audit log signal but does NOT affect scoring (scoring uses the rule's weight). definition.type and definition.config must match (e.g., regex_match needs config.pattern; cost_threshold needs config.max_usd; min_length needs config.min). sourceMomentId is optional but recommended (preserves workflow-inversion provenance from Make-This-A-Rule composer). Defaults: severity="medium".
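A client could pre-check arguments against the constraints listed above before calling the tool. The sketch below is a hypothetical client-side helper, not the server's Zod schema: it covers only the name and description length limits and the three definition.type / config pairings named in the description, and leaves any other definition type unchecked.

```typescript
interface DeployRuleArgs {
  name: string;
  description?: string;
  evalType: string;
  severity?: string; // the server defaults this to "medium" when omitted
  definition: { type: string; config: Record<string, unknown> };
  sourceMomentId?: string;
}

// Only the pairings the description names; other definition types are not checked here.
const REQUIRED_CONFIG_KEY: Record<string, string> = {
  regex_match: "pattern",
  cost_threshold: "max_usd",
  min_length: "min",
};

function hasOwn(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

// Hypothetical pre-check: returns a list of problems, empty when nothing was found.
function checkDeployArgs(args: DeployRuleArgs): string[] {
  const problems: string[] = [];
  if (args.name.length < 1 || args.name.length > 120) {
    problems.push("name must be 1-120 chars");
  }
  if (args.description !== undefined && args.description.length > 500) {
    problems.push("description must be at most 500 chars");
  }
  const defType = args.definition.type;
  if (hasOwn(REQUIRED_CONFIG_KEY, defType)) {
    const requiredKey = REQUIRED_CONFIG_KEY[defType];
    if (!hasOwn(args.definition.config, requiredKey)) {
      problems.push(defType + " needs config." + requiredKey);
    }
  }
  return problems;
}

const goodArgs: DeployRuleArgs = {
  name: "Answer length",
  evalType: "completeness",
  definition: { type: "min_length", config: { min: 20 } },
};

const badArgs: DeployRuleArgs = {
  name: "",
  evalType: "completeness",
  definition: { type: "regex_match", config: {} },
};

const goodProblems = checkDeployArgs(goodArgs);
const badProblems = checkDeployArgs(badArgs);
```

A pre-check like this only avoids obvious 400s; the server's own validation (including the ReDoS check on regex patterns) remains the authority.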

Error modes. Throws 400 on invalid definition (Zod rejects it, e.g., a regex that fails the safe-regex2 ReDoS check or is longer than 1000 chars). Throws 400 on empty name. Throws 400 if the eval category mismatches the definition type. Returns 429 when the HTTP rate limit is exceeded. File-write failures (disk full, read-only fs) propagate as 500; the audit log is best-effort and does not block deploy.
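One way a caller might react to the status codes above is sketched below. The outcome labels and the classifyDeployStatus helper are hypothetical, treating 2xx as success is an assumption, and the spacing figure simply divides one minute by the documented 20 req/min limit without knowing how the server measures its window.

```typescript
type DeployOutcome = "ok" | "fix_input" | "wait_and_retry" | "server_fault" | "unknown";

// Hypothetical mapping from HTTP status to the caller's next step.
function classifyDeployStatus(status: number): DeployOutcome {
  if (status >= 200 && status < 300) return "ok"; // assumed success range
  if (status === 400) return "fix_input"; // invalid definition, empty name, or category mismatch
  if (status === 429) return "wait_and_retry"; // rate limit exceeded
  if (status === 500) return "server_fault"; // e.g. file-write failure
  return "unknown";
}

// 20 requests per minute works out to one request every 3000 ms on average.
const RATE_LIMIT_PER_MINUTE = 20;
const minSpacingMs = 60000 / RATE_LIMIT_PER_MINUTE;
```

Because deploy_rule is not idempotent, a cautious caller would check list_rules before retrying after a 500, in case the rule was in fact persisted.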

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | Human-readable rule name (used in eval results) | |
| description | No | What this rule checks for and why it matters | |
| evalType | Yes | Eval category this rule belongs to; determines when it fires | |
| severity | No | Severity used for dashboard sort + audit alerts | medium |
| definition | Yes | Check definition (regex, length, keyword, cost, or schema) | |
| sourceMomentId | No | Optional Decision Moment ID the rule was derived from (preserves workflow-inversion provenance) | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses many behavioral traits beyond annotations: writes to ~/.iris/custom-rules.json via atomic write, appends to audit log, immediate activation, non-idempotent (two deploys create two rules), rate-limited to 20 req/min, tenant-scoped, error modes (400, 429, 500). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is long but well-structured with clear sections (Sibling tools, Behavior, Output shape, Use when, Don't use, Parameters, Error modes). Front-loaded with purpose. Every sentence adds value, though some details could be condensed. For a complex tool, the structure aids readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a tool with 6 params, nested objects, and no output schema. Covers usage, behavior, output format, error modes, parameter details, and even mentions a future version for editing. The description fully equips an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds extra meaning: name length range, description max, evalType must match evaluate_output's eval_type, severity does not affect scoring, definition type/config must match, sourceMomentId is optional but recommended, defaults severity to medium. This goes beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool deploys a custom rule that fires on future evaluate_output calls. It distinguishes from sibling tools by naming list_rules, delete_rule, evaluate_output, log_trace, etc., and explains that deploy_rule is the WRITE path for custom rules.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (recurring failure pattern) and provides a workflow (evaluate_output -> get_traces -> analyze -> deploy_rule). Also gives clear 'Don't use' cases: validation (use preview endpoint) and editing (use delete_rule + deploy_rule).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/iris-eval/mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.