Glama

Deploy Custom Rule

deploy_rule

Deploy a custom evaluation rule to automatically enforce quality checks on agent outputs, catching recurring failures without manual review.

Instructions

Deploy a new custom evaluation rule that will fire on every future evaluate_output call of its eval category.

Sibling tools — list_rules enumerates deployed rules, delete_rule removes them, evaluate_output runs them. log_trace / get_traces / delete_trace handle the trace lifecycle separately; evaluate_with_llm_judge / verify_citations run semantic scoring (not heuristic-rule-driven). deploy_rule is the WRITE path that grows the custom-rule library.

Behavior. Writes a row to ~/.iris/custom-rules.json (atomic write via temp file + rename) and appends a rule.deploy entry to the audit log (~/.iris/audit.log). The rule activates immediately for the running process and persists across restarts. Each call mints a fresh rule_id; not idempotent (deploying twice creates two rules). Tenant-scoped in Cloud tier; OSS rules are owned by LOCAL_TENANT. Rate-limited to 20 req/min on HTTP MCP.
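The atomic-write pattern described above (temp file + rename) can be sketched in Python. The path follows the description; the helper name and record layout are illustrative, not the server's actual implementation:

```python
import json
import os
import tempfile

RULES_PATH = os.path.expanduser("~/.iris/custom-rules.json")

def atomic_write_json(path: str, data) -> None:
    """Write JSON to a temp file in the same directory, then rename.

    os.replace() is atomic on POSIX when source and target share a
    filesystem, so concurrent readers never see a half-written file."""
    dirname = os.path.dirname(path)
    if dirname:
        os.makedirs(dirname, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=dirname or ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before rename
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # never leave a stray temp file behind
        raise
```

Writing to the same directory as the target matters: a rename across filesystems is not atomic.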

Output shape. Returns JSON: { "rule": { "id": "rule-XXXX", "name", "description", "evalType", "severity", "definition", "enabled": true, "createdAt", "updatedAt", "version": 1, "sourceMomentId?" } }. The returned rule is the canonical persisted form; save the id if you plan to update or delete later.
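A minimal sketch of consuming that return value, assuming the tool result arrives as a JSON string (the field names follow the output shape above; the function itself is hypothetical):

```python
import json

def extract_rule_id(response_text: str) -> str:
    """Pull the canonical rule id out of a deploy_rule response.

    Save this id if you plan to delete the rule (or redeploy a
    replacement) later."""
    rule = json.loads(response_text)["rule"]
    if not rule["id"].startswith("rule-"):
        raise ValueError(f"unexpected rule id: {rule['id']!r}")
    return rule["id"]
```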

Use when an agent observes a recurring failure pattern and decides to enforce it as a standing rule. The sourceMomentId field preserves provenance — downstream audit can trace the rule back to the moment that inspired it. Combine with evaluate_output + get_traces: 1) evaluate_output surfaces failures; 2) get_traces filters to the failure set; 3) analyze the pattern; 4) deploy_rule bakes it into the default eval path.
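The four-step loop above might look like the following. `client` and its method names are hypothetical stand-ins for however your agent invokes the MCP tools; only the tool names and the `regex_match` definition type come from this page:

```python
import re

def promote_failure_to_rule(client, eval_type: str, min_failures: int = 3):
    """Sketch of the evaluate -> trace -> analyze -> deploy loop."""
    # Step 2: filter traces to the failure set (step 1, evaluate_output,
    # has already populated the trace store).
    failures = client.get_traces(eval_type=eval_type, passed=False)
    # Step 3: look for a recurring pattern -- here, a shared failure reason.
    counts = {}
    for trace in failures:
        counts[trace["reason"]] = counts.get(trace["reason"], 0) + 1
    for reason, count in counts.items():
        if count >= min_failures:
            # Step 4: bake the pattern into the default eval path.
            return client.deploy_rule(
                name=f"recurring failure: {reason}",
                evalType=eval_type,
                severity="medium",
                definition={"type": "regex_match",
                            "config": {"pattern": re.escape(reason)}},
            )
    return None  # no pattern recurred often enough to enforce
```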

Don't use to VALIDATE a rule before committing — deploy writes immediately. Use the dashboard's preview endpoint (POST /api/v1/rules/custom/preview) for dry-run validation against sample output. Don't use to EDIT an existing rule — this call only creates; edits require a dedicated flow (coming in v0.5). To update a rule today: delete_rule then deploy_rule with the new definition.

Parameters.
- name: 1-120 chars (Zod-enforced min/max); appears in eval_result rule_results, so make it human-readable.
- description: optional, max 500 chars; used in dashboard tooltips.
- evalType: determines WHEN the rule fires and must match the eval_type your evaluate_output calls use (e.g., a "completeness" rule fires on every evaluate_output where eval_type="completeness" OR eval_type="custom").
- severity: affects dashboard sort + audit log signal but does NOT affect scoring (scoring uses the rule's weight). Defaults to "medium".
- definition: definition.type and definition.config must match (e.g., regex_match needs config.pattern; cost_threshold needs config.max_usd; min_length needs config.min).
- sourceMomentId: optional but recommended; preserves workflow-inversion provenance from the Make-This-A-Rule composer.
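The type/config pairing requirement can be expressed as a small check. The three pairings are the ones named above; the validator itself is a sketch, not the server's actual Zod schema:

```python
# Required config key per definition type, per the pairings described above.
REQUIRED_CONFIG_KEY = {
    "regex_match": "pattern",
    "cost_threshold": "max_usd",
    "min_length": "min",
}

def check_definition(definition: dict) -> None:
    """Raise ValueError if definition.type and definition.config disagree."""
    dtype = definition.get("type")
    key = REQUIRED_CONFIG_KEY.get(dtype)
    if key is None:
        raise ValueError(f"unknown definition type: {dtype!r}")
    if key not in definition.get("config", {}):
        raise ValueError(f"{dtype} requires config.{key}")
```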

Error modes. Throws 400 on invalid definition (Zod rejects — e.g., regex that fails safe-regex2 ReDoS check, or length > 1000 chars). Throws 400 on empty name. Throws 400 if the eval category mismatches the definition type. Returns 429 when HTTP rate limit exceeded. File-write failures (disk full, read-only fs) propagate as 500; the audit log is best-effort and does not block deploy.
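On the HTTP transport, a 429 means back off and retry. A hedged sketch: the client and its `(status, body)` return shape are hypothetical, but the key point is real per the description -- deploy_rule is not idempotent, so only a clean 429 rejection (no rule created) is safe to retry:

```python
import time

def deploy_with_backoff(client, base_delay: float = 1.0,
                        max_attempts: int = 5, **rule_kwargs):
    """Retry deploy_rule on HTTP 429 with exponential backoff.

    Retry ONLY on 429: the server rejected the request outright, so
    no duplicate rule was created. Ambiguous failures (timeouts, 500s
    after a possible write) should not be blindly retried."""
    delay = base_delay
    for _ in range(max_attempts):
        status, body = client.deploy_rule(**rule_kwargs)
        if status != 429:
            return status, body
        time.sleep(delay)
        delay *= 2  # back off: base, 2x, 4x, ...
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts")
```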

Input Schema

Name           | Required | Description                                                                                  | Default
-------------- | -------- | -------------------------------------------------------------------------------------------- | -------
name           | Yes      | Human-readable rule name (used in eval results)                                              |
description    | No       | What this rule checks for and why it matters                                                 |
evalType       | Yes      | Eval category this rule belongs to; determines when it fires                                 |
severity       | No       | Severity used for dashboard sort + audit alerts                                              | medium
definition     | Yes      | Check definition (regex, length, keyword, cost, or schema)                                   |
sourceMomentId | No       | Optional Decision Moment ID the rule was derived from (preserves workflow-inversion provenance) |
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=false, destructiveHint=false, idempotentHint=false. The description adds rich behavioral details: file writes, audit log, immediate activation, non-idempotent, rate limits, error modes, and output shape. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured and front-loaded with the essential purpose. Each section (behavior, output, usage, parameters, errors) earns its place. A minor reduction could tighten it, but no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 params, nested objects, no output schema), the description covers all aspects: purpose, sibling context, behavior, output shape, parameter details, usage guidelines, and error modes. It is fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. However, the description adds significant extra meaning: name length constraint and usage in eval results, description dashboard use, evalType matching condition, severity effects, definition type/config matching, sourceMomentId provenance. Error modes for parameter issues are also detailed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Deploy') and resource ('custom evaluation rule'), and distinguishes it from sibling tools by explaining it is the WRITE path that grows the custom-rule library, unlike list/delete/evaluate siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'Use when an agent observes a recurring failure pattern and decides to enforce it as a standing rule.' It also gives clear 'Don't use' scenarios for validation (preview endpoint) and editing (delete+redeploy). Alternatives are named.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
