Deploy Custom Rule

deploy_rule

Deploy custom evaluation rules that automatically enforce quality standards on every future output evaluation call.

Instructions

Deploy a new custom evaluation rule that will fire on every future evaluate_output call of its eval category.

Sibling tools — list_rules enumerates deployed rules, delete_rule removes them, evaluate_output runs them. log_trace / get_traces / delete_trace handle the trace lifecycle separately; evaluate_with_llm_judge / verify_citations run semantic scoring (not heuristic-rule-driven). deploy_rule is the WRITE path that grows the custom-rule library.

Behavior. Writes a row to /.iris/custom-rules.json (atomic write via temp file + rename) and appends a rule.deploy entry to the audit log (/.iris/audit.log). The rule activates immediately for the running process and persists across restarts. Each call mints a fresh rule_id; not idempotent (deploying twice creates two rules). Tenant-scoped in Cloud tier; OSS rules are owned by LOCAL_TENANT. Rate-limited to 20 req/min on HTTP MCP.
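The "temp file + rename" pattern mentioned above can be sketched as follows. This is a generic illustration of the technique, not the server's actual code; the function name `atomic_write_json` and the sample rule payload are assumptions made for the example.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: object) -> None:
    """Write data as JSON so readers never observe a half-written file.

    Hypothetical helper: illustrates temp file + rename, not the real server code.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        # os.replace swaps the new file into place in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, remove the temp file and leave the old target untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the rename either happens or does not, a crash mid-write leaves the previous rules file intact instead of a truncated one.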

Output shape. Returns JSON: { "rule": { "id": "rule-XXXX", "name", "description", "evalType", "severity", "definition", "enabled": true, "createdAt", "updatedAt", "version": 1, "sourceMomentId?" } }. The returned rule is the canonical persisted form; save the id if you plan to update or delete later.
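A minimal sketch of keeping the returned id for a later delete or redeploy, assuming the response arrives as a JSON string in the shape described above. The helper name `extract_rule_id` and the sample values are hypothetical.

```python
import json


def extract_rule_id(response_text: str) -> str:
    """Return the rule id from a deploy_rule response (hypothetical helper)."""
    payload = json.loads(response_text)
    rule_id = payload["rule"]["id"]
    if not isinstance(rule_id, str) or not rule_id:
        raise ValueError("deploy_rule response did not contain a usable rule id")
    return rule_id


# Sample response with made-up values, following the documented shape.
sample = json.dumps({
    "rule": {
        "id": "rule-0001",
        "name": "No apology openers",
        "evalType": "completeness",
        "severity": "medium",
        "enabled": True,
        "version": 1,
    }
})
saved_id = extract_rule_id(sample)
```

Since each call mints a fresh id, storing the id from every response is the only reliable way to target a specific rule later.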

Use when an agent observes a recurring failure pattern and decides to enforce it as a standing rule. The sourceMomentId field preserves provenance — downstream audit can trace the rule back to the moment that inspired it. Combine with evaluate_output + get_traces: 1) evaluate_output surfaces failures; 2) get_traces filters to the failure set; 3) analyze the pattern; 4) deploy_rule bakes it into the default eval path.

Don't use to VALIDATE a rule before committing — deploy writes immediately. Use the dashboard's preview endpoint (POST /api/v1/rules/custom/preview) for dry-run validation against sample output. Don't use to EDIT an existing rule — this call only creates; edits require a dedicated flow (coming in v0.5). To update a rule today: delete_rule then deploy_rule with the new definition.
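The delete-then-deploy update described above might look like the sketch below. It assumes a generic `call_tool(name, arguments)` callable supplied by your MCP client, and it assumes delete_rule takes the id under a `rule_id` key; neither is confirmed by this page, so check the delete_rule schema before relying on it.

```python
from typing import Any, Callable, Dict

# Hypothetical signature for whatever function your MCP client exposes.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def replace_rule(call_tool: ToolCaller, old_rule_id: str,
                 new_rule: Dict[str, Any]) -> str:
    """Update a rule by deleting it and deploying the new definition.

    Returns the id of the newly deployed rule, which differs from old_rule_id
    because every deploy_rule call mints a fresh id.
    """
    # Assumed argument name; the delete_rule schema is not shown on this page.
    call_tool("delete_rule", {"rule_id": old_rule_id})
    result = call_tool("deploy_rule", new_rule)
    return result["rule"]["id"]
```

Note that between the two calls the rule does not exist, and if the deploy fails the old rule is already gone, so keep a copy of the old definition if you may need to restore it.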

Parameters. name is 1-120 chars (Zod-enforced min/max); appears in eval_result rule_results so make it human-readable. description is optional, max 500 chars (used in dashboard tooltips). evalType determines WHEN the rule fires (must match the eval_type your evaluate_output calls use; e.g., a "completeness" rule fires on every evaluate_output where eval_type="completeness" OR eval_type="custom"). severity affects dashboard sort + audit log signal but does NOT affect scoring (scoring uses the rule's weight). definition.type and definition.config must match (e.g., regex_match needs config.pattern; cost_threshold needs config.max_usd; min_length needs config.min). sourceMomentId is optional but recommended (preserves workflow-inversion provenance from Make-This-A-Rule composer). Defaults: severity="medium".
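The constraints above can be checked client-side before calling the tool, which avoids a round trip that would end in a 400. The sketch below is an assumption-laden illustration: the function name is invented, and the type-to-config mapping covers only the three definition types named in the text, leaving any other type for the server to validate.

```python
from typing import Any, Dict, Optional

# Only the pairs named in the description; other types are left to the server.
REQUIRED_CONFIG_KEY = {
    "regex_match": "pattern",
    "cost_threshold": "max_usd",
    "min_length": "min",
}


def build_deploy_rule_args(
    name: str,
    eval_type: str,
    definition: Dict[str, Any],
    description: Optional[str] = None,
    severity: str = "medium",  # documented default
    source_moment_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble deploy_rule arguments, rejecting obviously invalid input early."""
    if not 1 <= len(name) <= 120:
        raise ValueError("name must be 1-120 characters")
    if description is not None and len(description) > 500:
        raise ValueError("description must be at most 500 characters")
    required = REQUIRED_CONFIG_KEY.get(definition.get("type"))
    config = definition.get("config") or {}
    if required is not None and required not in config:
        raise ValueError(f"definition.config is missing '{required}'")

    args: Dict[str, Any] = {
        "name": name,
        "evalType": eval_type,
        "severity": severity,
        "definition": definition,
    }
    # Optional fields are omitted entirely rather than sent as null.
    if description is not None:
        args["description"] = description
    if source_moment_id is not None:
        args["sourceMomentId"] = source_moment_id
    return args
```

For example, `build_deploy_rule_args("Min answer length", "completeness", {"type": "min_length", "config": {"min": 50}})` produces arguments with `severity` set to `"medium"`, while the same call with an empty `config` raises `ValueError` locally.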

Error modes. Throws 400 on invalid definition (Zod rejects — e.g., regex that fails safe-regex2 ReDoS check, or length > 1000 chars). Throws 400 on empty name. Throws 400 if the eval category mismatches the definition type. Returns 429 when HTTP rate limit exceeded. File-write failures (disk full, read-only fs) propagate as 500; the audit log is best-effort and does not block deploy.
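One way a caller might handle these status codes is sketched below. It assumes a caller-supplied `send()` function that performs the HTTP request and returns `(status, body)`, and it assumes success is reported as status 200; both are illustrative choices, not part of the documented API. The sketch retries only on 429, on the assumption that a rate-limited request was rejected before anything was written. Because deploy_rule is not idempotent, blindly retrying a 500 could create a duplicate rule.

```python
import time
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]


class DeployRuleError(Exception):
    """Raised when deploy_rule fails with a status we do not retry."""

    def __init__(self, status: int, body: Dict[str, Any]) -> None:
        super().__init__(f"deploy_rule failed with status {status}")
        self.status = status
        self.body = body


def deploy_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,
    base_delay_s: float = 3.0,  # 20 req/min averages one request per 3 s
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call send(), backing off exponentially on 429 and failing fast otherwise."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:  # assumed success status
            return body
        if status == 429 and attempt < max_attempts - 1:
            sleep(base_delay_s * (2 ** attempt))
            continue
        # 400: fix the input. 500: inspect the server. Neither is retried here.
        raise DeployRuleError(status, body)
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.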

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| name | Yes | Human-readable rule name (used in eval results) | |
| description | No | What this rule checks for and why it matters | |
| evalType | Yes | Eval category this rule belongs to; determines when it fires | |
| severity | No | Severity used for dashboard sort + audit alerts | medium |
| definition | Yes | Check definition (regex, length, keyword, cost, or schema) | |
| sourceMomentId | No | Optional Decision Moment ID the rule was derived from (preserves workflow-inversion provenance) | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses writing to file, audit logging, non-idempotency, rate limiting, error modes (400, 429, 500). Annotations only provide basic hints; description adds rich behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is appropriately sized for complexity, well-structured with clear sections. Front-loaded with primary purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete coverage: includes output shape, error modes, parameter details, side effects, lifecycle, and combination guidance with sibling tools. No output schema, so description compensates fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions. Description adds extra meaning: name length constraints, evalType matching requirement, severity purpose, definition type/config coordination, sourceMomentId provenance. Significantly enriches schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Deploy a new custom evaluation rule that will fire on every future evaluate_output call of its eval category.' It uses specific verb (deploy) and resource (custom evaluation rule), and distinguishes from sibling tools like list_rules, delete_rule, evaluate_output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('recurring failure pattern') and when not to use ('validate before committing', 'edit existing rule'). Provides alternatives (preview endpoint, delete+deploy for updates).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/iris-eval/mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.