arifOS — Constitutional AI Governance

Server Details

Constitutional AI governance: 11 mega-tools, 13 floors, VAULT999 ledger. Human-in-loop by design.

- Status: Unhealthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
- Repository: ariffazil/arifosmcp
- GitHub Stars: 39
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
- Full call logging: Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
- Tool access control: Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
- Managed credentials: Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
- Usage analytics: See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Available Tools
36 tools

agi_mind (Grade C)
Structured reasoning with typed cognitive pipeline.
Modes:
"reason" (default): Standard AGI pipeline (sense → mind → heart → judge)
"sequential": Constitutionally-governed sequential thinking with templates
"step": Add a step to an existing thinking session
"branch": Create a reasoning branch from a step
"merge": Synthesize insights across branches
"review": Review/export a thinking session
Sequential thinking enforces F1-F13 at each step, replacing external Sequential Thinking MCP with native constitutional governance.
Runs the constitutional AGI pipeline producing a narrow decision_packet for the operator and a full audit_packet for the vault.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | reason |
| debug | No | ||
| query | No | ||
| context | No | ||
| dry_run | No | ||
| platform | No | | unknown |
| template | No | ||
| from_step | No | ||
| risk_tier | No | | medium |
| step_type | No | ||
| branch_ids | No | ||
| session_id | No | ||
| step_content | No | ||
| thinking_session_id | No | ||
| alternative_reasoning | No |
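To make the parameter table above concrete, here is a hypothetical MCP `tools/call` payload for the "step" mode. The argument names follow the table; the JSON-RPC envelope shape follows the MCP specification, and the session id value is invented for illustration:

```python
import json

# Hypothetical JSON-RPC 2.0 "tools/call" request that adds a step to an
# existing agi_mind thinking session. Only the argument names come from
# the tool's input schema; every value here is illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "agi_mind",
        "arguments": {
            "mode": "step",
            "thinking_session_id": "ts-001",  # hypothetical session id
            "step_type": "analysis",
            "step_content": "Compare options against the F1-F13 floors.",
            "risk_tier": "medium",  # matches the schema default
        },
    },
}

print(json.dumps(request, indent=2))
```

A real client would send this over the server's Streamable HTTP transport and receive the envelope described in the output schema below.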
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: SEAL (non-terminal: stage successful); PROVISIONAL (non-terminal: exploratory result); PARTIAL (non-terminal: incomplete but usable); SABAR (non-terminal: pause / needs more context); HOLD (non-terminal: waiting for authority/human); HOLD_888 (non-terminal: specific high-stakes human gating); VOID (TERMINAL: hard rejection / invalid state, must be extremely rare). Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
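The normalization rule quoted in the verdict row can be sketched in a few lines of Python. Only the rule itself ("if stage < 888 and verdict == VOID: verdict = SABAR") and the seven verdict names come from the schema; the function signature and constant set are assumptions:

```python
# Sketch of verdict_contract.normalize_verdict as described in the
# output schema. Names other than the seven verdicts are assumptions.
VERDICTS = {"SEAL", "PROVISIONAL", "PARTIAL", "SABAR", "HOLD", "HOLD_888", "VOID"}

def normalize_verdict(stage: int, verdict: str) -> str:
    """Downgrade a premature VOID to SABAR before stage 888."""
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    if stage < 888 and verdict == "VOID":
        # VOID is the only terminal verdict and is reserved for the
        # final judgment stage; earlier stages pause (SABAR) instead.
        return "SABAR"
    return verdict
```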
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, yet the description omits critical behavioral details implied by the parameters: it does not explain what 'execution' means (given dry_run and allow_execution flags), what the risk tiers affect, or what distinguishes 'forge' from other modes. References to 'F11' and 'F13' in the auth_context schema are cryptic and unexplained.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is brief (two sentences/fragments), but wastes precious space on the cryptic '333_MIND' branding. Listing the modes duplicates information already present in the enum schema; this space could have been used for behavioral context or sibling differentiation. Acceptably concise but not value-optimized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex 10-parameter tool with nested objects, execution safety controls (risk_tier, allow_execution), and no output schema, the description is severely underspecified. It fails to explain return values, the safety model, or the operational semantics of the different modes, leaving critical gaps for an agent attempting to invoke this tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds no semantic clarification beyond the schema (e.g., explaining the functional difference between 'reason', 'reflect', and 'forge' modes, or the relationship between dry_run and allow_execution), but the schema carries the descriptive burden adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States it is a 'Core reasoning and synthesis engine' and lists available modes, but the '333_MIND' prefix adds noise rather than clarity. Critically, it fails to distinguish this tool from siblings like 'code_engine', 'math_estimator', or 'search_tool', leaving ambiguity about when to prefer this general reasoning engine over specialized alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Lists the three modes ('reason', 'reflect', 'forge') but provides no guidance on when to use the tool itself versus its numerous siblings, nor when to select each specific mode. No prerequisites, exclusions, or 'when-not-to-use' guidance is provided despite the presence of execution controls (allow_execution, risk_tier).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
apex_soul (Grade C)
Final constitutional verdict evaluation.
| Name | Required | Description | Default |
|---|---|---|---|
| debug | No | ||
| query | No | ||
| dry_run | No | ||
| platform | No | | unknown |
| risk_tier | No | | medium |
| telemetry | No | ||
| session_id | No | ||
| candidate_action | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: SEAL (non-terminal: stage successful); PROVISIONAL (non-terminal: exploratory result); PARTIAL (non-terminal: incomplete but usable); SABAR (non-terminal: pause / needs more context); HOLD (non-terminal: waiting for authority/human); HOLD_888 (non-terminal: specific high-stakes human gating); VOID (TERMINAL: hard rejection / invalid state, must be extremely rare). Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description must carry full behavioral disclosure but fails significantly. While it notes 'probe' tests 'floors' (corroborated by schema references to F11/F13), it doesn't explain what 'floors' are, what constitutes a 'verdict', side effects of judging/holding, or the implications of 'Final authority'. The execution gating behavior (dry_run vs allow_execution) is undocumented in the description.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is brief and front-loaded with the cryptic '888_JUDGE:' prefix which wastes valuable explanatory space. While the mode list is efficiently presented, the lack of structure (no separation of purpose from capabilities) and reliance on domain jargon ('floors') without explanation reduces clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 10-parameter tool with nested objects, complex stateful concepts (floors F11/F13, continuity, sovereignty), and execution-gating capabilities, the single-sentence description is grossly inadequate. It omits the behavioral model, return value characteristics, and the critical relationship between modes, payload fields, and the 'floor' architecture.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description adds minor value by mapping 'probe' to 'test floors', but provides no additional semantic context for the complex payload structure, mode-specific requirements (which payload fields apply to which mode), or the risk_tier parameter's impact on behavior.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the tool as a 'Final authority for verdicts and defense' and lists available modes (judge, rules, validate, etc.), giving a vague sense of governance/security functionality. However, terms like 'verdicts' and 'defense' are abstract without domain context, and the '888_JUDGE:' prefix is noise that doesn't clarify purpose or differentiate from siblings like 'arifOS_kernel' or 'vault_ledger'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description enumerates modes but provides no guidance on when to use specific modes versus others, nor does it explain prerequisites (e.g., when 'allow_execution' should be true vs false). There is no mention of alternatives or exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
architect_registry (Grade C)
Initialize constitutional session OR perform kernel syscall.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | init |
| debug | No | ||
| query | No | ||
| intent | No | ||
| context | No | ||
| dry_run | No | ||
| payload | No | ||
| actor_id | No | ||
| platform | No | | unknown |
| risk_tier | No | | medium |
| call_graph | No | ||
| session_id | No | ||
| current_tool | No | ||
| actual_output | No | ||
| declared_name | No | ||
| requested_tool | No | ||
| allow_execution | No | ||
| observed_effects | No |
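The session_id parameter and the continuity fields in the output schema imply an init-then-reuse flow. The sketch below is hypothetical: `call_tool` is a stand-in for whatever MCP client transport you use, and the returned envelope is faked to a minimal shape rather than taken from arifOS:

```python
# Hypothetical two-step flow: initialize a constitutional session via
# architect_registry, then thread the returned session_id into a later
# tool call. call_tool is a stand-in, not a real arifOS or MCP API.

def call_tool(name: str, arguments: dict) -> dict:
    # A real client would send a JSON-RPC tools/call request here.
    # We fake a minimal successful envelope for illustration.
    return {"ok": True, "session_id": "sess-demo", "verdict": "SEAL"}

init_envelope = call_tool("architect_registry", {
    "mode": "init",                      # schema default
    "intent": "demo session bootstrap",  # declared operator intent
    "risk_tier": "medium",
})

# Reuse the session id on follow-up calls, per the continuity contract.
followup_args = {
    "session_id": init_envelope["session_id"],
    "query": "status check",
}
```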
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: SEAL (non-terminal: stage successful); PROVISIONAL (non-terminal: exploratory result); PARTIAL (non-terminal: incomplete but usable); SABAR (non-terminal: pause / needs more context); HOLD (non-terminal: waiting for authority/human); HOLD_888 (non-terminal: specific high-stakes human gating); VOID (TERMINAL: hard rejection / invalid state, must be extremely rare). Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to disclose critical behavioral traits: whether modes are idempotent, what 'dry_run' and 'allow_execution' imply about state mutation, what 'risk_tier' controls, or the security model behind 'auth_context'. The presence of execution flags suggests destructive capabilities that remain undocumented.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise at one sentence plus a mode list. Information density is high but results in under-specification for a 10-parameter multi-mode tool. No redundant or filler text, though the brevity compromises completeness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for a complex tool with nested objects, 7 distinct modes, risk management ('risk_tier'), and execution control ('allow_execution'). No output schema is present, yet the description doesn't explain return values, success/failure modes, or side effects. The 'F11' and 'F13' references in auth_context suggest domain-specific concepts that are completely unexplained.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 100% description coverage (meeting baseline), though the 'mode' parameter description is tautological ('Mode selector'). The tool description lists the valid mode values but does not explain their semantics or the payload structure variations required for each mode. It neither compensates for schema weaknesses nor adds usage context beyond enumeration.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the domain ('Tool and resource discovery + Model Registry') and lists the seven operational modes, but fails to explain what the tool actually does in each mode or how these modes relate to the core purpose. It uses noun phrases rather than specific verbs (e.g., 'register' vs 'Registers a tool').
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus siblings like 'search_tool', 'vault_ledger', or 'engineering_memory'. No criteria for selecting between the seven modes (e.g., when to use 'model_profile' vs 'model_catalog').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_diag_substrate (Grade C, Read-only, Idempotent)
Maintainer: Run substrate protocol conformance check.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide clear hints: readOnlyHint=true, destructiveHint=false, idempotentHint=true, openWorldHint=false. The description does not contradict these annotations, as 'Run... check' aligns with a read-only, non-destructive operation. However, it adds minimal context beyond annotations—it mentions 'conformance check' but does not disclose behavioral traits like what the check entails, potential side effects, or rate limits. With annotations covering safety, the description adds some value but lacks depth.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence: 'Maintainer: Run substrate protocol conformance check.' It is front-loaded with the action and avoids unnecessary words. However, the 'Maintainer:' prefix adds minor noise without clarifying the tool's function, slightly reducing conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (implied by 'substrate protocol' and diagnostic nature), lack of output schema, and low parameter schema coverage, the description is incomplete. It does not explain what the conformance check involves, what results to expect, or how to interpret the 'session_id' parameter. With annotations providing safety hints but no output details, the description should offer more context for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has one parameter ('session_id') with 0% description coverage, meaning the schema provides no semantic details. The description does not mention any parameters or add meaning beyond the schema. For a tool with low schema coverage, the description fails to compensate, leaving the parameter's purpose and usage undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Run substrate protocol conformance check' restates the tool name 'arifos_diag_substrate' without adding specificity. It mentions a verb ('Run') and resource ('substrate protocol conformance check'), but fails to distinguish this tool from siblings like 'arifos_probe' or 'arifos_health', which might also perform diagnostic operations. The purpose remains vague about what 'substrate protocol' refers to or what 'conformance check' entails.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, context, or exclusions, and with many sibling tools (e.g., 'arifos_probe', 'arifos_health'), there is no indication of how this tool differs or when it should be selected. Usage is implied only by the generic action 'Run', offering no explicit when/when-not instructions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_fetch (Grade B)
Retrieve raw content from a URL via mcp_fetch substrate. Applies F9 Anti-Hantu constitutional filtering to redact spiritual cosplay or hallucinatory consciousness claims in the source content.
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to fetch | |
| max_length | No | Max characters to retrieve | |
| session_id | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: SEAL (non-terminal: stage successful); PROVISIONAL (non-terminal: exploratory result); PARTIAL (non-terminal: incomplete but usable); SABAR (non-terminal: pause / needs more context); HOLD (non-terminal: waiting for authority/human); HOLD_888 (non-terminal: specific high-stakes human gating); VOID (TERMINAL: hard rejection / invalid state, must be extremely rare). Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
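The normalization rule quoted in the `verdict` field above can be sketched as a small function. This is a plain reading of the stated rule; `verdict_contract.normalize_verdict` is the name given in the schema, and the enforcement context is assumed:

```python
# The 7 canonical verdicts listed in the output schema above.
CANONICAL_VERDICTS = {
    "SEAL", "PROVISIONAL", "PARTIAL",   # non-terminal, usable results
    "SABAR", "HOLD", "HOLD_888",        # non-terminal, paused or gated
    "VOID",                             # terminal, hard rejection
}

def normalize_verdict(stage, verdict):
    """Soften a premature VOID to SABAR, per the rule quoted above."""
    if verdict not in CANONICAL_VERDICTS:
        raise ValueError(f"not a canonical verdict: {verdict}")
    if stage < 888 and verdict == "VOID":
        return "SABAR"
    return verdict
```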
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true, openWorldHint=true, idempotentHint=false, and destructiveHint=false, covering safety and idempotency. The description adds value by disclosing the 'F9 Anti-Hantu constitutional filtering' behavior, which modifies content retrieval, and mentions 'redact spiritual cosplay or hallucinatory consciousness claims,' providing context beyond annotations. However, it doesn't detail rate limits, auth needs, or error handling, leaving some behavioral traits uncovered.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized with two sentences that are front-loaded: the first states the core action, and the second adds filtering details. There's minimal waste, but it could be slightly more structured by explicitly separating purpose from behavioral notes. Overall, it's efficient and clear.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (fetching with filtering), annotations cover key behavioral aspects (read-only, open-world, non-idempotent, non-destructive), and an output schema exists, reducing the need for return value explanation. The description adds useful context about filtering, but it could benefit from more details on error cases or performance. For a tool with good annotations and output schema, it's mostly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 67% (2 out of 3 parameters have descriptions). The description adds no parameter semantics beyond the schema: it doesn't explain the 'url' format, the implications of 'max_length', or the purpose of 'session_id'. With moderate schema coverage, the baseline score of 3 stands; the description neither compensates for the gaps nor detracts from them.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Retrieve raw content') and resource ('from a URL'), specifying the substrate ('via mcp_fetch substrate'). It distinguishes from siblings by mentioning 'F9 Anti-Hantu constitutional filtering,' but doesn't explicitly differentiate from other fetch-like tools if any exist in the sibling list. The purpose is specific but could be more distinct regarding siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives is provided. The description mentions filtering for 'spiritual cosplay or hallucinatory consciousness claims,' which implies a context for content moderation, but it doesn't specify prerequisites, exclusions, or name alternative tools for different use cases. Usage is implied rather than clearly defined.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_forge (A)
Issue signed execution manifest to AF-FORGE substrate. Requires judge SEAL. Preserves separation of powers.
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Execution type | |
| dry_run | No | Generate manifest without dispatch | |
| payload | Yes | Action-specific parameters | |
| session_id | Yes | ||
| constraints | No | Resource limits (cpu, memory, timeout) | |
| judge_g_star | Yes | G★ score at time of verdict | |
| judge_verdict | Yes | Must be SEAL from arifos.judge | |
| af_forge_endpoint | No | Target substrate endpoint |
Output Schema
| Name | Required | Description |
|---|---|---|
No output parameters
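The SEAL prerequisite can be sketched as a client-side pre-flight check before dispatch (names and shapes are illustrative; the server presumably enforces the same invariant):

```python
def build_forge_request(action, payload, session_id,
                        judge_verdict, judge_g_star, dry_run=False):
    """Assemble an arifos_forge request, refusing to build one
    without the judge SEAL the description requires."""
    if judge_verdict != "SEAL":
        raise ValueError(
            "arifos_forge requires judge_verdict == 'SEAL' from arifos.judge")
    return {
        "action": action,
        "payload": payload,
        "session_id": session_id,
        "judge_verdict": judge_verdict,
        "judge_g_star": judge_g_star,
        "dry_run": dry_run,
    }
```

Failing fast on the client side mirrors the separation-of-powers design: the judge's verdict travels with the request rather than being re-derived by the executor.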
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses the authorization requirement (SEAL) and governance model ('Preserves separation of powers'), indicating this is part of a multi-step approval flow. However, it omits operational traits like destructiveness, async behavior, or execution guarantees.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three dense sentences with zero waste. Front-loaded with the core action ('Issue signed execution manifest'), followed by constraint ('Requires judge SEAL'), and architectural context ('Preserves separation of powers'). Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For an 8-parameter tool with nested objects and an output schema, the description is minimal but adequate. The existence of an output schema compensates for not describing return values, but the description could benefit from clarifying the distinct execution types (shell, vm, container) available in the action enum.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 88%, establishing a baseline of 3. The description adds semantic value by linking 'judge SEAL' to the judge_verdict and judge_g_star parameters, but does not elaborate on the payload structure or constraints object beyond what the schema already provides.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Issue' and resource 'signed execution manifest to AF-FORGE substrate'. It distinguishes from sibling arifos.judge by requiring its 'SEAL' output as a prerequisite, though it could clarify what AF-FORGE specifically does compared to other siblings like heart/mind.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states the prerequisite 'Requires judge SEAL', which directly maps to the required parameter 'judge_verdict' and establishes dependency on the sibling arifos.judge tool. However, it lacks explicit 'when not to use' guidance or alternatives for when SEAL isn't available.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_health (A, Read-only, Idempotent)
Retrieve CPU, Memory, ZRAM, and Disk utilization. F12-hardened read-only access.
| Name | Required | Description | Default |
|---|---|---|---|
| action | No | Telemetry action to perform | get_telemetry |
| dry_run | No | ||
| session_id | No | | |
Output Schema
Same canonical output envelope as the first Output Schema table above.
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate read-only, non-destructive, and open-world hints, covering safety and scope. The description adds valuable context beyond annotations by specifying 'F12-hardened' (implying enhanced security) and listing the exact resources retrieved (CPU, Memory, ZRAM, Disk), which helps the agent understand the tool's operational behavior and constraints without contradicting annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and front-loaded, consisting of two efficient sentences that directly state the tool's function and access method without any wasted words. Every sentence earns its place by providing essential information about retrieval and security.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (telemetry retrieval with security hardening), rich annotations (read-only, open-world), and the presence of an output schema, the description is mostly complete. It specifies what resources are retrieved and the access context, but could benefit from more detail on parameter usage or error handling, though the output schema reduces the need for return value explanation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is low at 33%, with only the 'action' parameter described. The description adds no meaning beyond the schema, mentioning no parameters or their semantics. An output schema is present, which reduces the burden, so the baseline of 3 stands, but the description does not compensate for the input-schema coverage gap.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Retrieve') and resources ('CPU, Memory, ZRAM, and Disk utilization'), distinguishing it from siblings like 'arifos_fetch' or 'arifos_heart' by specifying telemetry retrieval. It explicitly mentions 'F12-hardened read-only access', which further clarifies its security context and operational scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage ('F12-hardened read-only access') and implies it's for telemetry retrieval, but does not explicitly state when to use it versus alternatives like 'arifos_heart' or 'arifos_memory'. It lacks explicit exclusions or named alternatives, though the specificity in resources helps infer appropriate scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_heart (B, Read-only, Idempotent)
Red-team proposal for ethical risks. Simulate consequences, evaluate against F5, F6, F9.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | critique | |
| query | Yes | Content or proposal to critique | |
| session_id | No | | |
Output Schema
Same canonical output envelope as the first Output Schema table above.
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full disclosure burden. It mentions 'simulate consequences' and evaluation frameworks, indicating the tool models outcomes rather than executing actions. However, it lacks details on the statefulness implications of session_id, persistence, and output format, despite an output schema being present.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely concise with zero redundancy. Two sentences cover purpose, mechanism, and evaluation criteria. Every token contributes to understanding the tool's scope.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
An output schema exists, so return values needn't be described. However, the cryptic references to 'F5, F6, F9' lack context, and session_id's purpose (continuity across calls?) is unexplained despite being stateful infrastructure. Sufficient for basic tool selection, but it leaves operational gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is only 33% ('query' is described; 'mode' and 'session_id' are not). The description clarifies that 'query' carries the proposal to critique and connects 'simulate' to the mode enum, but provides no semantics for session_id and no guidance on 'critique' vs 'simulate'. It adds marginal value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Defines core action (red-team/simulate/evaluate) and domain (ethical risks). Mentions specific evaluation criteria (F5, F6, F9). Distinguishes from siblings by focusing on ethical risk analysis, though doesn't explicitly contrast with 'judge' or 'mind' tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use 'critique' vs 'simulate' modes, or when to invoke 'heart' versus sibling tools like 'judge'. The description lists capabilities but offers no decision criteria for tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_init (C, Read-only, Idempotent)
Initialize constitutional session with identity binding and telemetry seed. Modes: init/probe/state/status (safe read modes) | revoke requires human_approval. probe mode: Session diagnostic checking anchor validity and authority enum compatibility.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | Session operation mode. probe=diagnostic compatibility check. revoke=separate arifos.session tool (requires human_approval). | init |
| intent | Yes | ||
| actor_id | Yes | ||
| platform | No | unknown | |
| risk_tier | No | medium | |
| session_id | No | ||
| declared_name | No | | |
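The mode gating stated in the description, where init/probe/state/status are safe read modes and revoke requires human approval, can be sketched as a small dispatcher (mode names come from the description; the boolean return convention is an assumption):

```python
# Safe read modes named in the tool description.
SAFE_READ_MODES = {"init", "probe", "state", "status"}

def gate_mode(mode, human_approval=False):
    """Return True if the session mode may proceed, per the description."""
    if mode in SAFE_READ_MODES:
        return True
    if mode == "revoke":
        # revoke is handled by a separate session tool and is
        # human-gated: it proceeds only with explicit approval.
        return human_approval
    raise ValueError(f"unknown session mode: {mode}")
```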
Output Schema
Same canonical output envelope as the first Output Schema table above.
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Lacking annotations, the description hints at behavioral traits—'identity binding' implies persistent state changes, 'revoke' suggests destructive capability, and 'telemetry seed' implies logging initialization. However, it omits crucial details: persistence guarantees, failure modes, authentication requirements, and side effects of the destructive revoke mode.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
An information-dense two-sentence structure front-loads the primary purpose, efficiently listing the modes and elaborating on probe. However, the second sentence packs dense technical detail that, given the tool's complexity, would benefit from being unpacked for clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (excusing return value description), the tool has 7 parameters, 6 modes, no annotations, and extremely poor schema coverage. The description inadequately covers the parameter space and operational complexity, particularly regarding mode-specific parameter requirements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With only 14% schema coverage (only 'mode' described in schema), the description compensates minimally by expanding on probe mode functionality. However, it fails to document critical required parameters 'intent' (20k char string) and 'actor_id' (64 char string), leaving their semantics, formats, and validation requirements completely opaque.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action 'Initialize constitutional session' with key mechanisms 'identity binding' and 'telemetry seed', and enumerates six operational modes. However, uses domain-specific jargon ('constitutional', 'anchor validity') without context, and fails to distinguish from numerous siblings (arifos.forge, arifos.route, etc.).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Lists available modes (init, revoke, refresh, etc.) and briefly describes probe mode functionality, but provides no guidance on when to use this tool versus the 14 sibling tools, nor when to select specific modes over others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_judgeCInspect
Final constitutional verdict evaluation. Outputs: SEAL, PARTIAL, VOID, HOLD.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Action to judge | |
| risk_tier | Yes | | medium |
| telemetry | No | Optional telemetry data | |
| session_id | No | | |
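To make the parameter table concrete, here is a hypothetical invocation of `arifos_judge` using the generic MCP JSON-RPC `tools/call` shape. The argument values are illustrative only; the tool description does not specify query format beyond "Action to judge".

```python
import json

# Hypothetical tools/call payload for arifos_judge. Only `query` and
# `risk_tier` are required per the parameter table; the query text and
# id are made-up examples.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "arifos_judge",
        "arguments": {
            "query": "Delete staging database snapshots older than 30 days",
            "risk_tier": "medium",
        },
    },
}

print(json.dumps(request, indent=2))
```

The optional `telemetry` and `session_id` fields would be added under `arguments` when chaining calls within a governed session.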
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
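The normalization rule quoted in the `verdict` field above ("if stage < 888 and verdict == VOID: verdict = SABAR", enforced by `verdict_contract.normalize_verdict`) can be sketched as follows; the function signature is an assumption for illustration, not the documented API.

```python
def normalize_verdict(stage: int, verdict: str) -> str:
    """Sketch of the verdict normalization rule from the output schema.

    VOID is the only TERMINAL verdict and must be extremely rare;
    before stage 888 a VOID is downgraded to SABAR (pause / needs
    more context) rather than hard-rejecting.
    """
    if stage < 888 and verdict == "VOID":
        return "SABAR"
    return verdict
```

So a mid-pipeline rejection surfaces as a recoverable pause, and only the 888+ stages can emit a terminal VOID.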
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full disclosure burden but offers only minimal behavioral context. It enumerates four possible output states but fails to define what SEAL, PARTIAL, VOID, or HOLD signify, whether outputs are mutually exclusive, or if the tool produces side effects like audit logging.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured with the purpose front-loaded and outputs listed in a second sentence. There is no redundancy or fluff. However, its brevity crosses into under-specification given the tool's apparent complexity (4 parameters, nested objects, categorical domain).
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the burden to explain return values), the description remains inadequate. It omits explanations for critical domain concepts ('constitutional'), output semantics, parameter relationships (how risk_tier affects verdicts), and behavioral side effects—significant gaps for a 4-parameter decision tool with nested structures.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 50% (candidate_action and telemetry are described; risk_tier and session_id are not). The description mentions no parameters whatsoever and fails to compensate for uncovered parameters like risk_tier, which clearly modulates verdict severity but lacks semantic explanation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses domain-specific terminology ('constitutional verdict evaluation') and lists specific output categories (SEAL, PARTIAL, VOID, HOLD), which distinguishes it from siblings like forge, heart, or sense. However, 'constitutional' is jargon-heavy and unexplained, leaving the actual scope ambiguous without additional context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides absolutely no guidance on when to invoke this tool versus alternatives, prerequisites for use, or workflow positioning. Given the 'Final' modifier and categorical outputs, it likely serves as a terminal decision node, but this is not explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_kernelCInspect
Route request to correct metabolic lane or tool family based on risk and task type.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | kernel |
| query | Yes | Request to route | |
| session_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
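The typed error codes in the `code` field (INIT_AUTH_401, INIT_DEPENDENCY_503, etc.) carry an HTTP-style status suffix. A caller might classify them as shown below; note that parsing the suffix this way is an inference from the listed codes, not a documented contract, and actual retry policy should follow the `retryable` field when present.

```python
def classify_error(code: str) -> dict:
    """Classify a typed domain error code like 'INIT_DEPENDENCY_503'.

    Assumes the trailing underscore-separated segment is an HTTP-style
    status number (an assumption, not a stated guarantee).
    """
    status = int(code.rsplit("_", 1)[1])
    return {
        "status": status,
        # 5xx causes (kernel, dependency, transport) are plausibly
        # transient; 4xx (auth, policy, schema) need operator action.
        "retryable": status >= 500,
    }
```

For example, an INIT_DEPENDENCY_503 would be worth retrying after a cooldown, while an INIT_POLICY_403 requires resolving the policy block first.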
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so the description carries full disclosure burden. It states requests are routed but fails to specify whether this triggers execution, returns tool selection metadata, modifies session state, or requires authentication.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three short sentences achieve brevity, but include unhelpful jargon ('metabolic lane', 'rCore') that wastes space without clarifying behavior. The alias declaration is appropriately front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Acknowledges the alias relationship which is critical given both arifos.kernel and arifos_kernel exist as siblings, but leaves significant gaps in explaining the mode enum's purpose and the session_id's role despite having an output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is low at 33%; only the 'request' parameter has a schema description ('Request to route'). The description adds no semantics for 'mode' (kernel/status enum) or 'session_id', and merely repeats the routing concept for 'request' without clarifying syntax or format.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Identifies the tool as an 'Alias for arifos.kernel' with a 'Route request' verb, which distinguishes it from specific functional siblings like arifos.forge. However, terms like 'metabolic lane' and 'rCore' are opaque jargon that obscure the actual execution scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Mentions routing 'based on risk and task type' which implies selection logic, but provides no explicit guidance on when to use this router versus direct invocation of siblings like arifos.forge or arifos.mind, nor when to use the underscore variant versus the dot variant.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifOS_kernelCInspect
Route request to correct metabolic lane.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | kernel |
| debug | No | | |
| query | No | | |
| dry_run | No | | |
| request | No | | |
| platform | No | | unknown |
| risk_tier | No | | medium |
| session_id | No | | |
| allow_execution | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, yet description fails to disclose critical behavioral traits suggested by parameters: execution risks (risk_tier, allow_execution), reversible vs destructive operations (dry_run), or authentication requirements (auth_context). '000-999 pipe' is unexplained jargon.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely brief (one sentence), but inefficiently structured. Opens with cryptic label '444_ROUTER' and uses unexplained jargon ('metabolic', '000-999 pipe') that obscures rather than clarifies.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for tool complexity: 10 parameters including execution controls (risk_tier, allow_execution), nested payload objects, and no output schema. The description's vague metaphors do not adequately explain what the tool actually returns or how it handles the 'critical' risk tier.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing clear documentation for all 10 parameters including nested payload objects. Description mentions modes but adds no semantic meaning beyond the schema enums; baseline 3 applies per rubric.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States the tool processes 'complex queries' and lists two modes ('kernel' for reasoning, 'status' for vitals), but relies on opaque metaphors ('metabolic conductor', '000-999 pipe') that don't clarify actual function. Fails to differentiate from siblings like agi_reason or agi_mind.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use 'kernel' versus 'status' mode, nor when to choose this tool over sibling reasoning tools (agi_reason, apex_soul, etc.). No prerequisites or alternative suggestions mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_memoryCDestructiveInspect
Retrieve governed memory and engineering context from vector store.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | vector_query |
| query | Yes | Memory query | |
| session_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden but discloses minimal behavioral traits. While 'vector store' hints at semantic search and 'governed' implies access controls, it fails to explain failure modes, result ranking, or what 'governed' specifically entails (permissions, privacy scopes, versioning).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence of nine words is appropriately front-loaded with the verb. However, given the complexity (3 parameters, 4-mode enum, no annotations), it may be excessively terse rather than efficiently concise—critical configuration details are omitted.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (covering return values), the description is incomplete for a retrieval tool with complex configuration. It lacks essential context for the 'mode' parameter variations and does not clarify how this integrates with the broader arifos ecosystem.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is only 33% (only 'query' described). The description does not compensate: it fails to explain the four distinct 'mode' enum values ('vector_query' vs 'query' vs 'engineer'), nor the purpose of 'session_id', creating significant ambiguity for required configuration.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Provides specific verb 'Retrieve' and clear resource 'governed memory and engineering context' from 'vector store'. However, it does not differentiate from sibling tools like 'arifos.memory' or 'arifos.vault', leaving ambiguity about which memory tool to select.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Contains no guidance on when to use this tool versus alternatives (e.g., arifos.mind, arifos.vault) or prerequisites. The description states what it does but not why or when an agent should invoke it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_mind (C)
Multi-source synthesis and structured first-principles reasoning with uncertainty bands.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | reason=AGI pipeline; sequential=constitutional step-thinking; step/branch/merge/review=thinking session ops | reason |
| query | Yes | Task or question to reason about | |
| context | No | Additional context for reasoning | |
| session_id | No | | |
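The mode/session mechanics flagged as under-documented in the review below can be illustrated with a hypothetical two-call sequence, assuming the mode semantics sketched in the table above (the session id value is invented for illustration):

```python
# Hypothetical arifos_mind call payloads: start a "sequential" thinking
# session, then append a "step" to it via session_id.
start = {
    "mode": "sequential",                # constitutionally governed step-thinking
    "query": "Plan the vault migration",
}
step = {
    "mode": "step",                      # add a step to an existing session
    "query": "Evaluate rollback options",
    "session_id": "sess-123",            # illustrative id returned by the first call
}
```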
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
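The normalization rule quoted in the `verdict` row of the schema above can be sketched directly. This is a minimal illustration of the stated behavior of `verdict_contract.normalize_verdict`, not the actual implementation: below stage 888, a VOID verdict is softened to SABAR (pause / needs more context).

```python
# Sketch of the verdict normalization rule from the output schema:
# if stage < 888 and verdict == VOID, the verdict becomes SABAR.
def normalize_verdict(stage: int, verdict: str) -> str:
    if stage < 888 and verdict == "VOID":
        return "SABAR"
    return verdict
```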
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses 'uncertainty bands' as an output characteristic and 'multi-source' as an input pattern, but omits critical behavioral details: whether session_id enables persistent state across calls, side effects on arifos_memory/vault, or the functional differences between the three modes.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single dense sentence with no filler. However, the heavy jargon ('first-principles', 'uncertainty bands') packed into one statement sacrifices clarity for brevity. Information is front-loaded but requires domain inference to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing description burden for returns), the tool has significant complexity: 4 parameters including a 3-value behavioral mode switch and session management. The description inadequately covers these mechanics, leaving the agent to guess how modes differ or how sessions persist.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 50% (2/4 parameters described). The description does not compensate for undocumented parameters: it fails to explain what session_id tracks or how the mode enum values (reason/reflect/forge) alter behavior. It loosely maps 'reasoning' to the query parameter but adds no syntax guidance.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool performs 'Multi-source synthesis and structured first-principles reasoning,' which identifies the verb (synthesis/reasoning) and methodology. However, it remains abstract ('first-principles') and fails to distinguish from siblings like 'arifos_forge' despite 'forge' being one of the mode options, creating potential confusion about tool selection.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus siblings (e.g., arifos_sense, arifos_judge) or when to select specific modes (reason vs. reflect vs. forge). No prerequisites, exclusions, or contextual triggers are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_ops (C, Read-only)
Calculate operation costs, thermodynamics, capacity, and timing with entropy analysis.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | cost |
| query | Yes | Action to estimate costs for | |
| session_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
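A caller consuming this envelope would typically branch on `ok`, `verdict`, and `retryable`, as documented in the schema above. The following is a hedged sketch assuming those fields arrive as described (the return strings are invented for illustration):

```python
# Hypothetical envelope-handling sketch: proceed on success, retry after a
# SABAR cooldown when the envelope marks itself retryable, else escalate
# using the operator-facing hint field.
def handle_envelope(envelope: dict) -> str:
    if envelope.get("ok"):
        return "proceed"
    if envelope.get("verdict") == "SABAR" and envelope.get("retryable"):
        return "retry-after-cooldown"
    return f"escalate: {envelope.get('hint', 'no hint provided')}"
```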
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, placing full burden on the description. While 'Calculate' suggests a read-only/transformative operation, the description fails to disclose whether results are cached, if the operation is idempotent, computational costs, or what the entropy analysis entails behaviorally.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence with zero waste. Front-loaded with verb 'Calculate' and efficiently lists the four calculation domains covered. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing description burden for return values), the tool lacks annotations and the description omits behavioral details for a complex domain (thermodynamics/entropy). With 3 parameters and only 33% schema coverage, the description should provide more context to be complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is only 33% (1 of 3 parameters described). The description mentions 'entropy analysis' and 'costs' which map to mode enum values, but doesn't explain the parameter structure, session_id requirements, or that mode selects calculation type. Insufficient compensation for poor schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description lists specific calculation targets (costs, thermodynamics, capacity, timing) and mentions entropy analysis, providing clear verbs and resources. However, it fails to differentiate from the numerous arifos_* siblings (forge, heart, judge, etc.), leaving the agent uncertain which tool handles which aspect of the system.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus siblings like arifos_forge or arifos_judge. No mention of when to select specific modes (cost, health, vitals, entropy) or prerequisites for the session_id parameter.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_probe (A, Read-only, Idempotent)
Probe system status or component health (system, memory, vault, etc.).
| Name | Required | Description | Default |
|---|---|---|---|
| target | No | Component to probe | system |
| probe_type | No | | status |
| timeout_ms | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already indicate the tool is read-only, non-destructive, and idempotent, which covers key behavioral traits. The description adds value by specifying the types of probes (status, health, metrics) and components (system, memory, vault), providing context beyond the annotations. No contradiction with annotations is present.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose with no wasted words. It uses parentheses to include examples without cluttering the main statement, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that the tool has annotations covering safety (read-only, non-destructive) and an output schema exists (which handles return values), the description is reasonably complete. It specifies the probe types and components, but could benefit from more detail on usage context or parameter interactions, though not strictly required due to the structured data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is low at 33%, with only one parameter ('target') having a description. The description mentions 'system status or component health' and examples like 'system, memory, vault', which partially clarifies the 'target' parameter but does not address 'probe_type' or 'timeout_ms'. This adds some meaning but does not fully compensate for the coverage gap, aligning with the baseline.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('probe') and resource ('system status or component health'), and it provides examples of components (system, memory, vault). However, it does not explicitly differentiate from sibling tools like 'arifos_health' or 'arifos_diag_substrate', which might have overlapping functions, so it falls short of a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as the sibling tools 'arifos_health' or 'arifos_diag_substrate'. It lacks explicit when-to-use or when-not-to-use instructions, leaving the agent to infer usage from the purpose alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_reply (A)
Composite orchestrator for AGI Reply Protocol v3. Internally runs: memory → sense → mind → heart → ops → judge → [vault/forge]. Emits AgiReplyEnvelopeHuman (recipient=human) or AgiReplyEnvelopeAgent (recipient=agent). Every output includes: TO/CC/TITLE/KEY_CONTEXT header, RACI block, computed τ, constitutional floor tags, SEAL signoff. 888 HOLD blocks forge. F1/F13 triggers require human:arif ratification. Schema at arifos://reply/schemas. Session state at arifos://reply/context-pack.
| Name | Required | Description | Default |
|---|---|---|---|
| cc | No | Secondary recipients (agents, vault refs) | |
| to | No | Primary recipient name or agent_id for the reply header | |
| depth | No | | ENGINEER |
| query | Yes | User query or agent task to govern and reply to | |
| dry_run | No | True = plan pipeline without executing stages | |
| platform | No | Output formatter platform. Use agi_reply for protocol envelope. | agi_reply |
| recipient | No | auto → classify via sense stage. human → AgiReplyEnvelopeHuman. agent → AgiReplyEnvelopeAgent. | auto |
| risk_tier | No | | medium |
| session_id | Yes | ||
| compression | No | FULL = session start / cross-agent handoff. DELTA = normal iterative turns (default). SIGNAL_ONLY = sub-agent internal hops. | DELTA |
| prior_state | No | Compressed one-line prior context (omit on first turn) |
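Per the parameter table above, only `query` and `session_id` are required; everything else falls back to its default. A hypothetical call payload (values invented for illustration):

```python
# Hypothetical arifos_reply request: required fields plus a few defaults
# made explicit, matching the parameter table above.
request = {
    "query": "Summarize the audit findings for the operator",  # required
    "session_id": "sess-001",                                  # required
    "recipient": "auto",       # default: classify via the sense stage
    "compression": "DELTA",    # default: normal iterative turn
    "dry_run": True,           # plan the pipeline without executing stages
}
```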
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true and destructiveHint=false, indicating a safe operation. The description adds valuable behavioral context beyond annotations: it discloses the internal pipeline stages (memory → sense → mind → heart → ops → judge → [vault/forge]), output envelope types, mandatory output components (TO/CC/TITLE/KEY_CONTEXT header, RACI block, etc.), and specific constraints (888 HOLD blocks forge, F1/F13 triggers require human ratification). This significantly enhances understanding of the tool's behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is information-dense but somewhat verbose and technical. Sentences like 'Composite orchestrator for AGI Reply Protocol v3' and 'Every output includes: TO/CC/TITLE/KEY_CONTEXT header...' are front-loaded and efficient. However, shorthand like '888 HOLD blocks forge' and bare references to external schemas assume insider context, reducing overall clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (11 parameters, internal pipeline, multiple outputs) and the presence of annotations and an output schema, the description provides substantial context. It covers the orchestration process, output formats, and key constraints. While it doesn't explain every parameter detail, the combination with structured data makes it largely complete for informed usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 73% schema description coverage, the schema documents most parameters well. The description adds semantic context by referencing 'session state at arifos://reply/context-pack,' which clarifies the purpose of session_id and prior_state parameters. It also implies depth levels affect processing, though it doesn't detail parameter interactions. This compensates adequately for the schema's gaps.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose as a 'composite orchestrator for AGI Reply Protocol v3' that runs an internal pipeline and emits specific envelope types. It distinguishes itself from siblings by focusing on orchestration rather than individual components like arifos_memory or arifos_sense. However, it doesn't explicitly contrast with other composite tools like agi_mind or apex_soul.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context through mentions of 'Session state at arifos://reply/context-pack' and 'F1/F13 triggers require human:arif ratification,' suggesting it's for session-based interactions with human oversight. However, it lacks explicit guidance on when to use this tool versus alternatives like agi_mind or arifos_forge, and doesn't specify prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_repo_read (A, Read-only, Idempotent)
Check git status, diffs, and log with constitutional path whitelisting.
| Name | Required | Description | Default |
|---|---|---|---|
| path | No | | ./ |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless \| session \| elevated_session |
| anchor_state | No | Lifecycle of this anchor: created \| reused \| resumed \| denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable \| authority_unverified \| policy_blocked \| dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt \| perplexity \| mcp-cli \| playground \| api \| unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
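The normalization rule quoted in the `verdict` field above can be read as a small function. The following is an illustrative sketch, not the actual `verdict_contract.normalize_verdict` implementation; only the rule itself and the seven verdict names come from the schema text.

```python
# The seven canonical verdicts, as listed in the output schema.
NON_TERMINAL = {"SEAL", "PROVISIONAL", "PARTIAL", "SABAR", "HOLD", "HOLD_888"}
TERMINAL = {"VOID"}
CANONICAL = NON_TERMINAL | TERMINAL


def normalize_verdict(stage: int, verdict: str) -> str:
    """Apply the stated rule: a VOID issued before stage 888 is softened to SABAR."""
    if verdict not in CANONICAL:
        raise ValueError(f"unknown verdict: {verdict!r}")
    if stage < 888 and verdict == "VOID":
        return "SABAR"
    return verdict
```

This keeps VOID as a terminal outcome reserved for the final (888) gate, consistent with the note that VOID "must be extremely rare".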
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, and openWorldHint=true, covering safety and scope. The description adds valuable context about 'constitutional path whitelisting' which suggests security/access restrictions beyond what annotations provide, though it doesn't detail rate limits or specific behavioral traits like output format.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core functionality ('Check git status, diffs, and log') and adds important constraint information ('with constitutional path whitelisting'). Every word earns its place with no redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has annotations covering read-only/non-destructive behavior and an output schema exists, the description provides adequate context about what the tool does and its security constraints. However, it could benefit from more explicit guidance on when to use it versus sibling tools for a more complete picture.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage for the single parameter 'path', the description doesn't add any parameter-specific information beyond what's implied by 'constitutional path whitelisting'. The baseline is 3 since schema coverage is low but the tool has only one parameter with a default value, making it manageable.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Check git status, diffs, and log') and identifies the resource (git repository). It distinguishes from siblings like 'arifos_repo_seal' by focusing on read operations, but doesn't explicitly contrast with other repo tools like 'arifos_fetch' or 'arifos_forge'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for git repository inspection with 'constitutional path whitelisting' suggesting security constraints, but doesn't explicitly state when to use this tool versus alternatives like 'arifos_repo_seal' or other git-related siblings. No clear exclusions or prerequisites are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_repo_seal (A · Destructive)
Add and commit changes to the repository. REQUIRES F13 human ratification. Enforces F11 audit logging of all substrate mutations.
| Name | Required | Description | Default |
|---|---|---|---|
| files | No | ||
| message | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 \| INIT_POLICY_403 \| INIT_SCHEMA_422 \| INIT_DEPENDENCY_503 \| INIT_KERNEL_500 \| INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init \| refresh \| revoke \| state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide:<br>- SEAL (non-terminal): stage successful<br>- PROVISIONAL (non-terminal): exploratory result<br>- PARTIAL (non-terminal): incomplete but usable<br>- SABAR (non-terminal): pause / needs more context<br>- HOLD (non-terminal): waiting for authority/human<br>- HOLD_888 (non-terminal): specific high-stakes human gating<br>- VOID (TERMINAL): hard rejection / invalid state — must be extremely rare.<br>Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless \| session \| elevated_session |
| anchor_state | No | Lifecycle of this anchor: created \| reused \| resumed \| denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable \| authority_unverified \| policy_blocked \| dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt \| perplexity \| mcp-cli \| playground \| api \| unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds behavioral context beyond annotations: it specifies a requirement for human ratification (F13) and mentions audit logging (F11), which are not covered by the annotations. The annotations (readOnlyHint: true, destructiveHint: false) indicate it's safe and non-destructive, and the description does not contradict these, but it lacks details on rate limits or error handling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and front-loaded, consisting of two sentences that efficiently convey the tool's purpose, requirements, and behavioral traits without any wasted words. Every sentence earns its place by providing critical information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (involves human ratification and audit logging), annotations cover safety aspects, and an output schema exists, the description is reasonably complete. It explains key behavioral constraints but could benefit from more details on prerequisites or error scenarios, though the output schema reduces the need for return value explanations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description does not mention any parameters, and with schema description coverage at 0%, the input schema provides minimal semantic information (only types and constraints). The description does not compensate by explaining what 'files' or 'message' represent, so it adds no value beyond the schema, resulting in a baseline score of 3 due to the schema's basic coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Add and commit changes') and resource ('repository'), distinguishing it from sibling tools like 'arifos_repo_read' which likely only reads. It goes beyond just restating the name/title by specifying the action and target.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage by stating 'REQUIRES F13 human ratification,' indicating when this tool should be used (with human approval). However, it does not explicitly mention when not to use it or name alternatives among the sibling tools, such as 'arifos_repo_read' for read-only operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_sense (A · Read-only · Idempotent)
Ground query in physical reality via the 8-stage constitutional sensing protocol: PARSE → CLASSIFY → DECIDE → PLAN → RETRIEVE → NORMALIZE → GATE → HANDOFF. Live web search is gated by truth classification — invariants use offline reasoning; time-sensitive facts trigger live retrieval; ambiguous queries HOLD for narrowing.
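The gating described above (invariants use offline reasoning, time-sensitive facts trigger live retrieval, ambiguous queries HOLD) can be sketched as a small dispatch. The classification labels and the plan-only behavior of `dry_run` come from this tool's documentation; the function shape and return values are illustrative assumptions.

```python
def gate(truth_class: str, dry_run: bool = False) -> str:
    """Map a truth classification to a retrieval decision."""
    if truth_class == "invariant":
        # Stable facts: answer from offline reasoning, no HTTP calls.
        return "offline"
    if truth_class == "time_sensitive":
        # Fresh facts need live retrieval; dry_run plans without HTTP calls.
        return "plan_only" if dry_run else "live_retrieval"
    if truth_class == "ambiguous":
        # Query needs narrowing before any retrieval is attempted.
        return "HOLD"
    raise ValueError(f"unknown truth class: {truth_class!r}")
```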
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | 'governed' = full 8-stage constitutional protocol (recommended). Legacy modes: 'search' (raw), 'ingest' (URL fetch), 'compass' (auto-detect), 'atlas' (discovery), 'time' (clock grounding). | governed |
| query | Yes | Query to classify and ground in reality | |
| intent | No | Optional user intent hint | |
| dry_run | No | False = execute live retrieval; True = plan only (no HTTP calls) | |
| session_id | No | | |
| query_frame | No | Optional: {domain, time_scope, jurisdiction} |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 \| INIT_POLICY_403 \| INIT_SCHEMA_422 \| INIT_DEPENDENCY_503 \| INIT_KERNEL_500 \| INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init \| refresh \| revoke \| state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide:<br>- SEAL (non-terminal): stage successful<br>- PROVISIONAL (non-terminal): exploratory result<br>- PARTIAL (non-terminal): incomplete but usable<br>- SABAR (non-terminal): pause / needs more context<br>- HOLD (non-terminal): waiting for authority/human<br>- HOLD_888 (non-terminal): specific high-stakes human gating<br>- VOID (TERMINAL): hard rejection / invalid state — must be extremely rare.<br>Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless \| session \| elevated_session |
| anchor_state | No | Lifecycle of this anchor: created \| reused \| resumed \| denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable \| authority_unverified \| policy_blocked \| dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt \| perplexity \| mcp-cli \| playground \| api \| unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden. It effectively explains the conditional behavior: live web search is gated by truth classification, ambiguous queries enter HOLD state, and it distinguishes between offline reasoning vs live retrieval. Does not mention authentication requirements or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two dense sentences with zero waste. First sentence establishes the protocol and purpose; second sentence explains the gating logic and conditional behavior. Information is front-loaded and every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the existence of an output schema, the description appropriately focuses on behavioral logic rather than return values. For a 6-parameter tool with nested objects, it adequately explains the core 8-stage protocol and gating mechanism. Could mention side effects or persistence behavior to achieve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 83% (high), establishing baseline 3. The description adds context that 'live retrieval' relates to truth classification (connecting to dry_run and mode parameters), but does not significantly expand on syntax, format, or semantics of query_frame, session_id, or intent beyond the schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action ('Ground query in physical reality') and unique mechanism ('8-stage constitutional sensing protocol'). The named pipeline (PARSE→CLASSIFY→DECIDE→PLAN→RETRIEVE→NORMALIZE→GATE→HANDOFF) clearly distinguishes this from siblings like arifos_forge, arifos_judge, or arifos_route.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides implied usage through gating logic ('invariants use offline reasoning; time-sensitive facts trigger live retrieval; ambiguous queries HOLD'), clarifying when live retrieval activates. However, lacks explicit comparison to sibling tools (e.g., when to use arifos_route vs arifos_sense) or clear 'when-not-to-use' guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_vault (B)
Append immutable verdict record to Merkle-hashed ledger.
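An append-only, hash-chained ledger of the kind described can be sketched as follows. This is a minimal illustration only: the real VAULT999 record layout and Merkle structure are not documented here, so the field names and SHA-256 chaining are assumptions.

```python
import hashlib
import json


class Ledger:
    """Append-only ledger where each entry's hash commits to all prior history."""

    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # genesis hash

    def append(self, verdict: str, evidence: str = "") -> str:
        # Each record carries the previous head, so rewriting any earlier
        # entry would invalidate every hash that follows it.
        record = {"verdict": verdict, "evidence": evidence, "prev": self.head}
        digest = hashlib.sha256(
            json.dumps(record, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append((digest, record))
        self.head = digest
        return digest
```

Chaining gives the "immutable" property the description claims: entries can be appended but never silently edited.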
| Name | Required | Description | Default |
|---|---|---|---|
| verdict | Yes | Verdict to log | |
| evidence | No | Evidence summary | |
| session_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 \| INIT_POLICY_403 \| INIT_SCHEMA_422 \| INIT_DEPENDENCY_503 \| INIT_KERNEL_500 \| INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init \| refresh \| revoke \| state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide:<br>- SEAL (non-terminal): stage successful<br>- PROVISIONAL (non-terminal): exploratory result<br>- PARTIAL (non-terminal): incomplete but usable<br>- SABAR (non-terminal): pause / needs more context<br>- HOLD (non-terminal): waiting for authority/human<br>- HOLD_888 (non-terminal): specific high-stakes human gating<br>- VOID (TERMINAL): hard rejection / invalid state — must be extremely rare.<br>Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless \| session \| elevated_session |
| anchor_state | No | Lifecycle of this anchor: created \| reused \| resumed \| denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable \| authority_unverified \| policy_blocked \| dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt \| perplexity \| mcp-cli \| playground \| api \| unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses key behavioral traits not in schema: 'immutable' (irreversible) and 'Merkle-hashed' (cryptographic integrity). However, as a write operation with no annotations, it lacks disclosure on failure modes, duplicate handling, or authorization requirements. Mention of 'immutable' partially covers safety profile.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, seven words with zero waste. Critical modifiers ('immutable', 'Merkle-hashed') front-loaded. Every term adds technical specificity essential for understanding the tool's guarantees.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Acceptable for a write operation with output schema present (handling return values) and partial schema coverage. However, gaps remain regarding workflow integration with siblings and error handling for immutable writes.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 67% (verdict and evidence described; session_id undocumented). Description mentions 'verdict' reinforcing the required parameter, but does not compensate for the missing session_id description or explain parameter relationships (e.g., evidence format requirements). Baseline 3 appropriate for moderate schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear specific action ('Append') and resource ('verdict record') with technical specificity ('Merkle-hashed ledger'). However, it does not distinguish from sibling tools like 'arifos.judge' (which likely creates verdicts) or clarify where this fits in the workflow.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use this tool versus alternatives, prerequisites for appending (e.g., existing session), or sequencing (e.g., whether to call after judgment). Simply states the mechanical action without usage context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_wisdom (B · Read-only · Idempotent)
Returns a curated philosophical or cultural quote mapped to a constitutional surface. Sources include the 27-zone philosophy atlas, constitutional tool quotes, and arifOS forged canon.
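Assuming a standard MCP JSON-RPC transport, a call to this tool might look like the following. The argument names match this tool's parameter table; the chosen values and the request id are illustrative, not prescribed by the documentation.

```python
# Hypothetical MCP tools/call request for arifos_wisdom.
# "surface" defaults to "anchor" per the parameter table; the
# "language" value here is an assumed example preference.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "arifos_wisdom",
        "arguments": {"surface": "anchor", "language": "en"},
    },
}
```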
| Name | Required | Description | Default |
|---|---|---|---|
| tone | No | Optional tone filter | |
| surface | No | Constitutional surface to retrieve a quote for | anchor |
| verdict | No | Optional verdict context for targeted quote selection | |
| language | No | Optional language preference | |
| risk_tier | No | Optional risk tier context | |
| shadow_profile | No | Optional shadow profile for dramaturgic selection |
Output Schema
| Name | Required | Description |
|---|---|---|
No output parameters
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, and openWorldHint=false, covering safety and idempotency. The description adds value by mentioning 'curated' quotes and source types ('27-zone philosophy atlas', 'constitutional tool quotes', 'arifOS forged canon'), which hints at behavioral traits like curation and canonical sourcing. However, it doesn't detail rate limits, auth needs, or specific curation logic, so it earns a baseline 3 for supplementing annotations without rich behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose and lists sources without waste. It's appropriately sized for the tool's complexity, though it could be slightly more structured (e.g., separating purpose from source details). No redundant or verbose elements, earning a 4 for near-optimal conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (6 optional parameters, enums, annotations, and an output schema), the description is reasonably complete. It states the purpose and sources, and with annotations covering safety and an output schema handling return values, it doesn't need extensive behavioral or output details. However, it lacks usage context relative to siblings, slightly reducing completeness to 4.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all 6 parameters well-documented in the input schema (e.g., 'tone' as 'Optional tone filter'). The description adds no parameter-specific semantics beyond what the schema provides, such as explaining how 'surface' relates to 'constitutional surface' or interactions between parameters. With high schema coverage the baseline is 3; the description doesn't compensate, but it also doesn't need to.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Returns a curated philosophical or cultural quote mapped to a constitutional surface.' It specifies the verb ('Returns'), resource ('quote'), and mapping target ('constitutional surface'), and lists source types. However, it doesn't explicitly differentiate from sibling tools like 'arifos_wisdom_stats' or other quote-related tools, keeping it at 4 rather than 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It mentions sources but doesn't specify scenarios, prerequisites, or exclusions relative to siblings. This leaves the agent without contextual usage direction, scoring 2 for minimal implied usage from the purpose statement alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
arifos_wisdom_stats (Read-only, Idempotent)
Returns total quotes, coverage by surface/category/language/polarity, shadow index metrics, contrast pairs, and sample IDs for each surface.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare read-only, non-destructive, idempotent, and closed-world behavior. The description adds value by specifying the types of metrics returned (e.g., shadow index, contrast pairs), which helps the agent understand the output structure, but doesn't provide additional behavioral context like rate limits or authentication needs beyond what annotations cover.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, dense sentence that efficiently lists all returned metrics without wasted words. It's front-loaded with the core purpose ('Returns total quotes...') and structures the output components clearly, making every part of the sentence contribute to understanding.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has 0 parameters, rich annotations, and an output schema, the description is reasonably complete. It outlines the statistical metrics returned, which complements the structured data. However, it could be more complete by briefly hinting at use cases or sibling tool relationships, though the output schema likely covers return values in detail.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0 parameters and 100% schema description coverage, the input schema fully documents the absence of parameters. The description doesn't need to add parameter information, so it appropriately focuses on output semantics, earning a baseline score of 4 for not introducing unnecessary details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns comprehensive statistics about quotes, including totals, coverage metrics, shadow index, contrast pairs, and sample IDs. It specifies the resource ('quotes') and scope ('for each surface'), but doesn't explicitly differentiate from sibling tools like 'arifos_wisdom' or 'arifos_fetch' that might handle similar data.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools (e.g., 'arifos_wisdom', 'arifos_fetch', 'arifos_probe'), there's no indication of context, prerequisites, or exclusions for selecting this statistical tool over others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
asi_heart
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | critique |
| debug | No | | |
| query | No | | |
| content | No | | |
| dry_run | No | | |
| platform | No | | unknown |
| risk_tier | No | | medium |
| session_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
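The verdict vocabulary and normalization rule in the output schema above can be sketched in a few lines. This is an illustrative reconstruction based solely on the documented rule for `verdict_contract.normalize_verdict`, not the actual arifOS implementation:

```python
# The 7 canonical verdicts from the output schema; VOID is the only terminal one.
NON_TERMINAL = {"SEAL", "PROVISIONAL", "PARTIAL", "SABAR", "HOLD", "HOLD_888"}

def normalize_verdict(stage: int, verdict: str) -> str:
    """Documented rule: if stage < 888 and verdict == VOID, soften to SABAR."""
    if stage < 888 and verdict == "VOID":
        return "SABAR"
    return verdict

print(normalize_verdict(555, "VOID"))  # SABAR: hard rejection softened pre-888
print(normalize_verdict(888, "VOID"))  # VOID: terminal at the judge stage
```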
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full disclosure burden. While it mentions 'safety' and 'consequence modeling', it fails to explain the critical safety behavior implied by schema parameters: dry_run defaults to true, allow_execution defaults to false, and 'floors' must pass for execution. The description doesn't clarify side effects, persistence, or the meaning of 'F11/F13' sovereignty references in the schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is brief (two sentences), but the '666_HEART:' prefix is opaque jargon that fails the 'every sentence earns its place' standard. The structure front-loads the cryptic codename rather than the function, though the remaining content is efficiently stated.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex tool with 10 parameters including nested objects, safety controls (risk_tier, allow_execution), and authentication contexts (F11/F13 references), the description is inadequate. With no output schema and no annotations, the description should explain the execution safety model, mode behaviors, and expected outputs, but it provides only high-level domain keywords.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds value by explicitly listing the valid mode values ('critique', 'simulate') and providing the conceptual framework ('Safety, empathy...') for constructing the payload content, though it doesn't elaborate on syntax or relationships between risk_tier, dry_run, and allow_execution.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the domain (safety, empathy, consequence modeling) and lists available modes ('critique', 'simulate'), but the cryptic '666_HEART' prefix wastes space without adding clarity. It fails to distinguish this tool from esoteric siblings like 'agi_mind' or 'apex_soul' or explain what differentiates the two modes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description enumerates the two modes but provides no guidance on when to use 'critique' versus 'simulate', nor does it indicate prerequisites or when to prefer this tool over siblings. No 'when not to use' guidance is provided despite the presence of execution controls.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
code_engine
Delegated Execution Bridge — The 10th Tool.
This tool does NOT execute directly. It:
1. Validates that the judge verdict is SEAL
2. Constructs a signed execution manifest
3. Dispatches to the AF-FORGE substrate
4. Returns an execution receipt
| Name | Required | Description | Default |
|---|---|---|---|
| action | Yes | Execution type ("shell", "api_call", "contract", "compute") | |
| dry_run | No | If True, generate manifest but don't dispatch | |
| payload | Yes | Action-specific parameters | |
| platform | No | | unknown |
| session_id | Yes | Source session ID | |
| constraints | No | Resource limits for execution | |
| ttl_seconds | No | Manifest validity window | |
| judge_g_star | Yes | G★ score at time of verdict | |
| judge_verdict | Yes | Must be "SEAL" (from arifos.judge) | |
| af_forge_endpoint | No | Target substrate (default from config) |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
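The four-step delegation flow described above can be sketched as follows. All names here (`forge_manifest`, the manifest fields, the receipt shape) are hypothetical illustrations of the documented steps, not the real AF-FORGE API:

```python
import hashlib
import json
import time

def forge_manifest(session_id, action, payload, judge_verdict, judge_g_star,
                   ttl_seconds=300, dry_run=False):
    # Step 1: validate the judge verdict is SEAL before anything else.
    if judge_verdict != "SEAL":
        raise PermissionError("judge_verdict must be SEAL to dispatch")
    # Step 2: construct a signed execution manifest (signature scheme illustrative).
    manifest = {
        "session_id": session_id,
        "action": action,
        "payload": payload,
        "g_star": judge_g_star,
        "expires_at": time.time() + ttl_seconds,  # manifest validity window
    }
    manifest["signature"] = hashlib.sha256(
        json.dumps(manifest, sort_keys=True).encode()
    ).hexdigest()
    # dry_run: generate the manifest but do not dispatch.
    if dry_run:
        return {"status": "manifest_only", "manifest": manifest}
    # Steps 3-4: dispatch to the AF-FORGE substrate and return a receipt
    # (dispatch is stubbed here; the real substrate call is not public).
    return {"status": "dispatched", "receipt_id": manifest["signature"][:16]}
```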
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to disclose critical behavioral traits evident in the schema: that execution requires 'allow_execution' and passing 'floors', that 'dry_run' implies destructive capabilities exist, or what 'F11/F13' sovereignty contexts mean. The safety model and auth requirements are completely undocumented.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely brief (two sentences) with no redundant text. However, it may be overly terse given the tool's complexity—every sentence earns its place, but there are too few sentences to convey necessary context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 10 parameters including nested objects, execution controls, risk tiers, and authentication contexts, the description is insufficient. It omits output format, safety guarantees, mode-specific behaviors, and the meaning of cryptic schema references (F11/F13), leaving significant gaps despite the lack of an output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description enumerates the mode values (already present in the enum) but adds no semantic detail about what 'payload' should contain for each mode, how 'risk_tier' affects behavior, or the relationship between 'dry_run' and 'allow_execution'.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the domain ('System-level') and broad functions ('hygiene and observation'), but 'hygiene' is jargon without clarification (does it mean cleanup, validation, or security scanning?). While it lists the five available modes, it does not define what each mode does or how they differ from sibling tools like 'fetch_tool' or 'search_tool'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
There is no guidance on when to select this tool versus alternatives, nor when to use specific modes (e.g., when to use 'tail' vs 'replay'). The description fails to mention prerequisites such as the need to set 'allow_execution' for mutations or the purpose of 'risk_tier' selections.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
engineering_memory
Retrieve governed memory from vector store or update the continuous world model.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | vector_query |
| debug | No | | |
| query | No | | |
| content | No | | |
| dry_run | No | | |
| platform | No | | unknown |
| risk_tier | No | | medium |
| session_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'Constitutional: F10/F2 verification' without explaining what these codes mean or what safety constraints they impose. It does not clarify side effects, persistence guarantees, or the execution model implied by the 'dry_run' and 'allow_execution' parameters.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is compact and front-loads the mode list, which is efficient. However, it includes opaque internal codes ('555_MEMORY', 'F10/F2') that consume space without aiding agent comprehension, reducing the information density of the limited text provided.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity—10 parameters, 6 distinct modes, nested objects, and no output schema—the description is insufficient. It omits expected return values, error handling patterns, the interaction between 'mode' and 'payload' subfields, and the significance of the 'auth_context' parameter for a tool claiming 'sovereignty' and governance features.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds value by mapping mode enum values to their intended actions (e.g., 'engineer' = execute, 'vector_query' = search), which clarifies the payload structure somewhat. However, it does not explain the relationship between specific payload fields and modes, or the semantics of 'risk_tier' and 'auth_context'.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the tool as a 'governed autonomous engineering and vector memory' system and lists six operational modes, giving a general sense of functionality. However, it fails to clearly define what resource is being manipulated or how it differs from siblings like 'code_engine' or 'search_tool', leaving the core purpose somewhat nebulous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the description enumerates available modes (engineer, vector_query, etc.), it provides no guidance on when to select each mode or when to prefer this tool over sibling tools like 'code_engine' or 'search_tool'. There is no mention of prerequisites, constraints, or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
forge_surface (Forge: Double-Gated Execution)
Open the arifOS Forge Execution Surface with double-gate architecture. Gate 1: 888_JUDGE must SEAL. Gate 2: Human must APPROVE. F13 Sovereign Veto: no machine may cross this line alone.
| Name | Required | Description | Default |
|---|---|---|---|
| risk_tier | No | | medium |
| candidate_action | No | Describe the action to forge | |
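The double-gate rule in the description is simple enough to state as a predicate. A minimal sketch, assuming string verdicts and an explicit boolean approval flag (both names illustrative, not the actual arifOS API):

```python
def may_forge(judge_verdict: str, human_approved: bool) -> bool:
    """Both gates must pass; neither a machine seal nor human approval alone suffices."""
    gate1 = judge_verdict == "SEAL"   # Gate 1: 888_JUDGE must SEAL
    gate2 = human_approved is True    # Gate 2: human must APPROVE (F13 Sovereign Veto)
    return gate1 and gate2
```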
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes critical behavioral traits: the double-gate process (JUDGE seal and human approval) and the 'F13 Sovereign Veto' principle, which implies this is a high-risk, non-autonomous operation requiring human oversight. However, it lacks details on rate limits, error handling, or response format.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized (three sentences) and front-loaded with the core action. Each sentence adds value: the first states the purpose, the second explains the gates, and the third emphasizes the veto principle. There's minimal waste, though some phrasing is cryptic (e.g., 'F13 Sovereign Veto').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity implied by the description (high-risk execution surface), no annotations, 0% schema coverage, two parameters, and no output schema, the description is incomplete. It misses critical details: what the tool returns, how parameters are used, error conditions, and concrete examples of when to invoke it. The cryptic terminology further reduces clarity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, so the description must compensate for the two undocumented parameters ('risk_tier' and 'candidate_action'). The description provides no information about these parameters—it doesn't explain what 'risk_tier' values mean, what 'candidate_action' should contain, or how they influence the tool's behavior. This leaves parameters largely unexplained.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool 'Open[s] the arifOS Forge Execution Surface with double-gate architecture', which provides a specific verb ('Open') and resource ('arifOS Forge Execution Surface'), but it's vague about what this surface actually does or enables. It doesn't clearly distinguish from sibling tools like 'arifos_forge' or 'init_surface', leaving the core functionality ambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions a 'double-gate architecture' requiring JUDGE sealing and human approval, implying this tool should be used for high-stakes operations, but it doesn't explicitly state when to use it versus alternatives (e.g., 'request_approval' or other forge-related tools). No guidance on prerequisites, exclusions, or specific contexts is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
init_anchor
Initialize constitutional session OR perform kernel syscall.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | | init |
| debug | No | | |
| query | No | | |
| intent | No | | |
| context | No | | |
| dry_run | No | | |
| payload | No | | |
| actor_id | No | | |
| platform | No | | unknown |
| risk_tier | No | | medium |
| call_graph | No | | |
| session_id | No | | |
| current_tool | No | | |
| actual_output | No | | |
| declared_name | No | | |
| requested_tool | No | | |
| allow_execution | No | | |
| observed_effects | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
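The normalization rule quoted in the `verdict` field above can be sketched directly. This is an illustrative reconstruction from the documented rule only; the function name mirrors the referenced `verdict_contract.normalize_verdict`, but the actual implementation is not shown here.

```python
# Sketch of the documented verdict normalization rule:
# "if stage < 888 and verdict == VOID: verdict = SABAR"
# Verdict names are the 7 canonical verdicts listed in the output schema.

CANONICAL_VERDICTS = {
    "SEAL", "PROVISIONAL", "PARTIAL", "SABAR", "HOLD", "HOLD_888", "VOID",
}

def normalize_verdict(stage: int, verdict: str) -> str:
    """Downgrade a premature VOID to SABAR before stage 888."""
    if verdict not in CANONICAL_VERDICTS:
        raise ValueError(f"unknown verdict: {verdict}")
    if stage < 888 and verdict == "VOID":
        # VOID is the only terminal verdict and must be extremely rare;
        # earlier stages pause (SABAR) instead of hard-rejecting.
        return "SABAR"
    return verdict

print(normalize_verdict(444, "VOID"))  # SABAR
print(normalize_verdict(888, "VOID"))  # VOID
```

The effect is that only the 888 judging stage (or later) can emit a hard rejection; everything before it degrades to a recoverable pause.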
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description must carry full behavioral disclosure burden but only partially succeeds. While it indicates 'revoke' kills sessions and 'refresh' rotates tokens, it fails to disclose critical safety information such as recovery options for revoked sessions, the irreversibility of actions, or dry_run behavior implications.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description conveys mode options in a single sentence, but wastes space on metaphorical fluff ('🔥 THE IGNITION STATE') and emphatic capitalization that does not enhance understanding. The structure front-loads jargon rather than functional clarity, reducing effective information density.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity—handling session lifecycle, identity establishment, and token rotation with 10 parameters and nested objects—the description is insufficient. It lacks explanation of mode-specific payload requirements, authentication prerequisites, or output behavior, leaving critical gaps for a tool managing session state.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage per context signals, the input schema comprehensively documents parameters including nested payload objects and enum values. The description adds minimal parameter-specific context beyond listing the available modes, meeting the baseline expectation when schema documentation is thorough.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies the tool manages 'constitutional session operations' through five specific modes (init, state, status, revoke, refresh), providing basic functional scope. However, heavy metaphorical language ('IGNITION STATE OF INTELLIGENCE') and lack of differentiation from similarly-named siblings (agi_mind, apex_soul, arifOS_kernel) reduce clarity on when this specific tool is appropriate.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description maps each mode to its function (e.g., 'revoke' for kill session, 'refresh' for rotate token), implying when to use specific modes. However, it lacks explicit guidance on when to use this tool versus sibling alternatives and does not specify prerequisites or conditions that would prevent usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
init_surface000 Session Anchor
Open the arifOS Session Anchoring Surface. Declares intent, selects mode, and anchors the constitutional session. F1 Amanah — session creation is irreversible commitment.
| Name | Required | Description | Default |
|---|---|---|---|
| declared_intent | No |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description carries the full burden. It discloses that 'session creation is irreversible commitment,' indicating a destructive or permanent action, and mentions 'F1 Amanah' (possibly a mode or trust level), adding valuable behavioral context beyond basic function.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is brief and front-loaded, with three sentences that each add value: opening the surface, core actions, and irreversible commitment. No wasted words, though it could be slightly more structured for clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (session anchoring with irreversible commitment), no annotations, no output schema, and low schema coverage, the description is incomplete. It covers purpose and key behavior but lacks details on modes, consequences, or return values, making it adequate but with gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0% with 1 parameter, so the description must compensate. It explains that the tool 'Declares intent,' which aligns with the 'declared_intent' parameter, adding meaning beyond the bare schema. However, it doesn't detail format or constraints for the intent.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool 'Open[s] the arifOS Session Anchoring Surface' and 'Declares intent, selects mode, and anchors the constitutional session,' which provides a general purpose but lacks specificity on what 'anchors' means or what resources are involved. It distinguishes from siblings like 'init_anchor' by focusing on 'Session Anchoring Surface,' but the purpose remains somewhat vague and abstract.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives is provided. The description implies it's for session initiation but doesn't specify prerequisites, timing, or contrast with siblings like 'arifos_init' or 'init_anchor,' leaving usage unclear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
judge_surface888 Constitutional Judge
Open the arifOS Constitutional Verdict Surface. Evaluates a candidate action against all 13 constitutional floors. Human SEAL or REJECT required before any forge execution (F13 Sovereign).
| Name | Required | Description | Default |
|---|---|---|---|
| risk_tier | No | medium | |
| candidate_action | No | describe the action to evaluate |
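A minimal call shape, built only from the two parameters in the table above. The argument values are hypothetical and the surrounding client API is not shown; this illustrates the request payload, not a specific SDK.

```python
# Hypothetical tools/call arguments for judge_surface888, using only
# the parameters documented above. The candidate action text is invented.
request = {
    "tool": "judge_surface888",
    "arguments": {
        "candidate_action": "Delete staging backups older than 90 days",
        "risk_tier": "medium",  # default per the parameter table
    },
}
print(request["arguments"]["risk_tier"])  # medium
```

Per the description, the returned verdict still requires a human SEAL or REJECT before any forge execution proceeds.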
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It adds useful context: it's an evaluation tool that requires human approval ('Human SEAL or REJECT required') before execution, which implies it's a safety check with no direct execution. However, it doesn't detail other traits like rate limits, error handling, or what 'Open the surface' entails behaviorally, leaving gaps for a tool with no annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized with three sentences that are front-loaded: the first states the core purpose, the second adds scope, and the third provides critical usage context. There's minimal waste, though the phrasing could be slightly more streamlined (e.g., 'Open the surface' is vague).
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (evaluating actions against constitutional floors), no annotations, no output schema, and 0% schema coverage, the description is incomplete. It covers purpose and high-level usage but lacks details on parameters, return values, error cases, or how the evaluation results are presented. For a safety-critical tool, this leaves significant gaps in understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0% for 2 parameters, so the description must compensate. It mentions 'candidate_action' implicitly ('Evaluates a candidate action'), adding some meaning, but doesn't explain 'risk_tier' or provide details on parameter formats, constraints, or interactions. This is insufficient given the low schema coverage, as it leaves parameters largely undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Open the arifOS Constitutional Verdict Surface. Evaluates a candidate action against all 13 constitutional floors.' It specifies the verb ('Open', 'Evaluates'), resource ('arifOS Constitutional Verdict Surface'), and scope ('against all 13 constitutional floors'). However, it doesn't explicitly differentiate from sibling tools like 'arifos_judge' or 'forge_surface', which appear related, so it misses full sibling distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: it's for evaluating actions against constitutional floors, with 'Human SEAL or REJECT required before any forge execution' implying it's a pre-execution check. It hints at an alternative ('forge execution') but doesn't explicitly name when-not-to-use or list specific alternatives like 'arifos_forge' from siblings, so it's not fully explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
math_estimator
Calculate operation costs and thermodynamics.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | cost | |
| debug | No | ||
| query | No | ||
| action | No | ||
| dry_run | No | ||
| platform | No | unknown | |
| risk_tier | No | medium | |
| session_id | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It fails to explain the execution model (dry_run vs allow_execution interaction), safety implications of 'risk_tier', or whether operations are idempotent. The description only lists mode names without behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely compact (two sentences) with no redundant or wasted language. However, given the tool's complexity (10 parameters including execution controls), this brevity may be inappropriate rather than optimally concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 10-parameter tool with execution semantics (dry_run, allow_execution, risk_tier), nested objects, and no output schema or annotations, the two-sentence description is inadequate. It omits critical context about execution flow, return values, and mode-specific requirements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds no semantic value beyond the schema—it merely repeats the enum values for 'mode' without clarifying what 'entropy' or 'cost' calculate, nor does it explain the nested payload structure or F11/F13 references in auth_context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description identifies a domain ('health and metabolic') and operation ('estimation'), but 'estimation' remains vague (calculation? prediction?). It lists available modes without explaining what distinguishes them or how they relate to the health/metabolic domain, failing to differentiate from siblings like 'physics_reality' or 'asi_heart'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to select specific modes ('cost' vs 'health' vs 'vitals'), nor when to use this tool versus siblings. The critical execution parameters (dry_run defaulting to true, allow_execution requirement) are left undocumented in the description.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
monitor_metabolism
Open the arifOS Metabolic Monitor — a real-time dashboard showing the health of all 13 Constitutional Floors (F1-F13), plus thermodynamic metrics: ΔS (entropy change), Peace² (stability), and Ω₀ (baseline).
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | No | global |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states this is a 'real-time dashboard' which implies a read-only monitoring function, but doesn't clarify if it requires specific permissions, has side effects, involves data retrieval costs, or how the dashboard is presented (e.g., visual vs. data). For a tool with zero annotation coverage, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in a single sentence that front-loads the core action ('Open the arifOS Metabolic Monitor') and then elaborates on what it shows. Every part adds value without redundancy, making it appropriately concise for its purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity implied by monitoring '13 Constitutional Floors' and thermodynamic metrics, plus no annotations and no output schema, the description is incomplete. It doesn't explain the format of returned data, how to interpret metrics like 'Peace²', or any operational constraints. For a monitoring tool with rich domain-specific terms, more context is needed for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 1 parameter (session_id) with 0% description coverage in the schema. The tool description doesn't mention any parameters or explain what 'session_id' means or when to use 'global' vs. other values. With low schema coverage, the description fails to compensate by adding parameter meaning, but since there's only one optional parameter, the baseline is adjusted to 3 as the impact is minimal.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Open the arifOS Metabolic Monitor' which is a specific action (open) on a specific resource (Metabolic Monitor). It distinguishes what the monitor displays (health of Constitutional Floors and thermodynamic metrics), making the purpose clear. However, it doesn't explicitly differentiate from sibling tools like 'arifos_health' or 'arifos_probe', which might have overlapping monitoring functions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention any prerequisites, context for usage, or comparisons to sibling tools like 'arifos_health' or 'arifos_probe'. The agent must infer usage based solely on the purpose statement without explicit when/when-not instructions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
physics_reality
arifos_sense — Constitutional Reality Sensing
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | governed | |
| actor | No | ||
| debug | No | ||
| query | Yes | ||
| budget | No | ||
| intent | No | ||
| policy | No | ||
| dry_run | No | ||
| platform | No | unknown | |
| risk_tier | No | medium | |
| session_id | No | ||
| query_frame | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: - SEAL (non-terminal): stage successful - PROVISIONAL (non-terminal): exploratory result - PARTIAL (non-terminal): incomplete but usable - SABAR (non-terminal): pause / needs more context - HOLD (non-terminal): waiting for authority/human - HOLD_888 (non-terminal): specific high-stakes human gating - VOID (TERMINAL): hard rejection / invalid state — must be extremely rare Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID: verdict = SABAR |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description fails to explain critical behavioral aspects: what 'floors' means in allow_execution, what differentiates 'compass' from 'atlas' mode, how risk_tier affects execution, or whether 'ingest' is destructive. The 'F11' and 'F13' references in auth_context are unexplained.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is brief (one sentence), but the '111_SENSE:' prefix consumes valuable front-loaded space without conveying actionable information. The parenthetical explanation of the time mode is appropriately placed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (10 parameters, nested payload object, execution controls like dry_run and allow_execution, and no output schema), the description is insufficient. It omits return value descriptions, workflow patterns, and the meaning of execution 'floors' referenced in parameters.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, establishing a baseline of 3. The description adds specific semantic value for the 'time' mode (explaining it returns UTC+KL datetime, weekday, quarter), but offers no clarification for the other four modes or complex parameters like payload sub-fields.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description provides some specific verbs ('acquisition', 'mapping') and lists available modes, but 'Earth-Witness' and '111_SENSE' are opaque jargon that don't clearly identify the resource or distinguish this tool from siblings like 'search_tool' or 'fetch_tool'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description enumerates the five available modes but provides no guidance on when to use this tool versus sibling alternatives (search_tool, fetch_tool, math_estimator) or when to select specific modes over others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
request_approval (Grade A)
Request human approval before proceeding with an action.
Call this tool proactively whenever you are about to take a significant or irreversible action and want the user to confirm first. Do NOT wait for the user to ask you to seek approval — use your judgment about when confirmation is appropriate.
The user will see an approval card with the summary, optional details, and Approve/Reject buttons. When they click a button, their decision appears as a message in the conversation (as if the user typed it), like:
"Deploy v3.2 to production" — I selected: Approveor:
"Deploy v3.2 to production" — I selected: RejectIMPORTANT: After calling this tool, you MUST stop and wait for the user's response. Do not continue, do not take any other actions, do not generate further output until you see the "I selected:" message. If approved, continue with the action. If rejected, acknowledge and ask how to proceed.
| Name | Required | Description | Default |
|---|---|---|---|
| title | No | Heading for the approval card (default: "Approval Required"). | |
| details | No | Optional longer explanation, context, or consequences of the action. | |
| summary | Yes | Brief description of the action requiring approval (shown prominently to the user). | |
| reject_text | No | Label for the reject button (default: "Reject"). | |
| approve_text | No | Label for the approve button (default: "Approve"). | |
| reject_variant | No | Button style for the reject button — "default", "destructive", "success", "info", or "outline". |
| approve_variant | No | Button style for the approve button — "default", "destructive", "success", or "info". |
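The parameter table above can be exercised with a minimal sketch of an MCP `tools/call` payload. The helper name `build_request_approval_call` and the exact envelope shape are illustrative assumptions, not part of the server's API; only `summary` is required, and omitted fields fall back to the documented defaults ("Approval Required", "Approve", "Reject"):

```python
import json

def build_request_approval_call(summary, title=None, details=None,
                                approve_text=None, reject_text=None):
    """Hypothetical helper: assemble arguments for a request_approval call.

    Only `summary` is required; optional fields are omitted entirely so the
    server applies its own defaults for the card title and button labels.
    """
    arguments = {"summary": summary}
    if title is not None:
        arguments["title"] = title
    if details is not None:
        arguments["details"] = details
    if approve_text is not None:
        arguments["approve_text"] = approve_text
    if reject_text is not None:
        arguments["reject_text"] = reject_text
    # Assumed JSON-RPC-style envelope for a tools/call request.
    return {
        "method": "tools/call",
        "params": {"name": "request_approval", "arguments": arguments},
    }

call = build_request_approval_call(
    "Deploy v3.2 to production",
    details="Rolls out the new build to all production nodes.",
)
print(json.dumps(call, indent=2))
```

After issuing a call like this, the agent is expected to stop and wait for the user's "I selected:" message before doing anything else.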
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It thoroughly describes the tool's behavior: it creates an approval card with buttons, waits for user response, shows how the response appears in conversation ('I selected: Approve/Reject'), and provides post-call instructions (stop and wait, continue if approved, acknowledge if rejected). This covers interaction flow, user experience, and agent behavior requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections: purpose statement, usage guidelines, behavioral explanation, and parameter documentation. While comprehensive, some sentences could be more concise (e.g., the user response example could be simplified). However, every sentence adds value, and the information is front-loaded with the most critical guidance first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of an interactive approval tool with 7 parameters, no annotations, and no output schema, the description provides complete context. It explains the tool's purpose, when to use it, the full interaction flow, parameter meanings, and post-call behavior. For a tool that requires precise agent coordination with human input, this description leaves no significant gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage and 7 parameters, the description fully compensates by explaining each parameter's purpose and usage. It defines 'summary' as 'Brief description of the action requiring approval', 'details' as 'Optional longer explanation, context, or consequences', and provides clear semantics for all other parameters including defaults and button styling options. This adds substantial value beyond the bare schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool's purpose: 'Request human approval before proceeding with an action.' It specifies the verb ('request approval') and the resource/context ('human approval for actions'), clearly distinguishing it from sibling tools that appear to be unrelated system operations (e.g., arifos_fetch, vault_ledger).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'Call this tool proactively whenever you are about to take a significant or irreversible action and want the user to confirm first.' It also specifies when NOT to use it ('Do NOT wait for the user to ask you to seek approval') and offers judgment criteria ('use your judgment about when confirmation is appropriate').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
vault_ledger (Grade C)
Append immutable verdict to ledger.
| Name | Required | Description | Default |
|---|---|---|---|
| debug | No | ||
| dry_run | No | ||
| verdict | Yes | ||
| evidence | No | ||
| platform | No | unknown | |
| risk_tier | No | medium | |
| session_id | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| ok | No | |
| code | No | Typed domain error code: INIT_AUTH_401 | INIT_POLICY_403 | INIT_SCHEMA_422 | INIT_DEPENDENCY_503 | INIT_KERNEL_500 | INIT_TRANSPORT_503 |
| hint | No | Actionable operator guidance for resolving this state |
| meta | No | |
| mode | No | Mode dispatched: init | refresh | revoke | state |
| tool | Yes | |
| debug | No | |
| stage | Yes | |
| state | No | Canonical continuity state shared across tools. |
| trace | No | |
| detail | No | Technical root-cause detail — operator-facing, not shown to end user |
| errors | No | |
| intent | No | Declared operator intent for this call |
| policy | No | Structured policy result: {floors_checked, floors_failed, injection_score, witness_required}. Tells callers whether INIT failed before or after constitutional checks. |
| system | No | System health snapshot: {kernel_version, adapter, env, dependency_health} |
| handoff | No | Formal handoff contract for downstream tools. |
| metrics | No | Unified arifOS Telemetry (Score Integrity Protocol). |
| payload | No | |
| verdict | No | Constitutional verdict outcomes. Only these 7 canonical verdicts exist system-wide: SEAL (non-terminal, stage successful); PROVISIONAL (non-terminal, exploratory result); PARTIAL (non-terminal, incomplete but usable); SABAR (non-terminal, pause / needs more context); HOLD (non-terminal, waiting for authority/human); HOLD_888 (non-terminal, specific high-stakes human gating); VOID (TERMINAL, hard rejection / invalid state; must be extremely rare). Normalization rule (enforced by verdict_contract.normalize_verdict): if stage < 888 and verdict == VOID, then verdict = SABAR. |
| version | No | |
| identity | No | |
| trace_id | No | Distributed trace identifier |
| authority | No | |
| retryable | No | Whether the caller should retry after SABAR cooldown |
| timestamp | No | ISO-8601 UTC timestamp of envelope creation |
| continuity | No | |
| philosophy | No | Optional governed quote layer selected by APEX-G. |
| risk_class | No | |
| sabar_step | No | |
| session_id | No | |
| user_model | No | Bounded user model built from explicit asks and observable constraints only. Psychological inference is disallowed by policy. |
| diagnostics | No | Tagged diagnostics for hard guardrails, advisory signals, and symbolic metrics. |
| duration_ms | No | Round-trip duration in milliseconds |
| next_action | No | |
| recoverable | No | |
| transitions | No | Explicit state transitions since the prior tool call. |
| anchor_scope | No | Session dependency tier: stateless | session | elevated_session |
| anchor_state | No | Lifecycle of this anchor: created | reused | resumed | denied |
| auth_context | No | |
| caller_state | No | |
| state_origin | No | Origin metadata for the canonical continuity state. |
| blocked_tools | No | |
| machine_issue | No | |
| requires_auth | No | |
| verdict_scope | No | F2 constitutional verdict scope tag. Routing/domain/session/dry_run/cannot_compute. |
| artifact_state | No | State of the output artifact (Audit Fix 4). |
| caller_context | No | AI execution identity. Auto-populated by MCP server. |
| machine_status | No | |
| requires_human | No | |
| verdict_detail | No | Structured v1.0 details. |
| degraded_reason | No | Typed degradation cause: kernel_unavailable | authority_unverified | policy_blocked | dependency_timeout |
| contract_version | No | Cross-tool continuity contract version. |
| diagnostics_only | No | |
| execution_status | No | Mechanical status of the tool execution (Fix 2). |
| operator_summary | No | Compact operator-facing truth summary. |
| platform_context | No | Caller platform surface: chatgpt|perplexity|mcp-cli|playground|api|unknown. F1-safe: defaults to None (unknown). |
| primary_artifact | No | |
| state_transition | No | |
| governance_status | No | Constitutional verdict (Fix 2, Audit Critical Fix 3). |
| allowed_next_tools | No | |
| intelligence_stage | No | |
| intelligence_state | No | |
| next_allowed_modes | No | Modes this actor may invoke next (e.g. ['query', 'reflect']) |
| rollback_available | No | Whether a rollback / undo path exists (F1 Amanah) |
| canonical_tool_name | No | |
| continuation_status | No | Orchestration direction (Fix 3). |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden but only hints at immutability via 'permanent.' It fails to disclose critical behavioral traits implied by the schema: the risk tier system, the conditional execution model (allow_execution, dry_run), what 'verify' actually validates, or the irreversible nature of 'seal' operations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely brief at two sentences, but the '999_VAULT:' prefix appears to be metadata or versioning noise that wastes valuable description real estate without adding semantic value. Otherwise efficiently structured with the core purpose stated first.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex 10-parameter tool involving risk tiers, execution gates, and nested payload objects with 'permanent' implications, the description is insufficient. It lacks any indication of return values (no output schema exists) and omits safety-critical context about the irreversible nature of sealing decisions.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds value by specifying the two operational modes ('seal', 'verify') and implying the payload structure varies by mode, though it doesn't explain the relationship between mode selection and required payload fields.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States the tool handles 'permanent decision recording and integrity' with modes 'seal' and 'verify', providing a basic verb and resource. However, 'decision' remains abstract and the description fails to differentiate this from siblings like engineering_memory or architect_registry despite having distinct 'permanent' and audit-like characteristics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides no guidance on when to use 'seal' versus 'verify' modes, no prerequisites for invocation, and no indication of when to prefer this over sibling storage tools. The cryptic '999_VAULT' prefix offers no actionable context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
vault_ledger_surface (999 Vault Ledger, Grade A)
Open the arifOS Immutable Vault Ledger. Shows the live BLS constitutional seal card and all VAULT999 ledger entries. F1 Amanah: read-only — no edit, no delete.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively communicates key traits: the tool is read-only ('no edit, no delete'), displays live data, and shows specific components (seal card and ledger entries). It doesn't mention rate limits, authentication needs, or error handling, but covers the essential safety profile adequately for a zero-parameter tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise and well-structured: three short sentences that front-load the core action ('Open the arifOS Immutable Vault Ledger'), detail what it shows, and clarify behavioral constraints. Every sentence earns its place with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (simple display function with zero parameters), no annotations, and no output schema, the description is reasonably complete. It explains what the tool does, what it shows, and its read-only nature. It could theoretically mention the output format or any visual/interface details, but for a zero-parameter tool, this is sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The tool has zero parameters with 100% schema description coverage, so the baseline is 4. The description doesn't need to explain parameters, and it doesn't attempt to—it correctly focuses on the tool's function rather than non-existent inputs.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Open', 'Shows') and resources ('arifOS Immutable Vault Ledger', 'BLS constitutional seal card', 'VAULT999 ledger entries'). It distinguishes itself from siblings like 'vault_ledger' by specifying it's a 'surface' tool that displays live data, not just accessing the ledger.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: to view the live vault ledger and seal card. It doesn't explicitly mention when not to use it or name alternatives, but the read-only nature and focus on display imply it's for inspection rather than modification, which helps differentiate from potential write-oriented siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
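Before publishing, the manifest can be sanity-checked locally with a short sketch. `validate_glama_manifest` is a hypothetical helper, not part of Glama's tooling; it checks only the fields the claiming flow described above requires:

```python
import json

def validate_glama_manifest(text: str) -> list:
    """Check a /.well-known/glama.json document for the fields the
    claiming flow requires: a non-empty maintainers list whose entries
    each carry an email address."""
    errors = []
    try:
        doc = json.loads(text)
    except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
        return [f"invalid JSON: {exc}"]
    maintainers = doc.get("maintainers")
    if not isinstance(maintainers, list) or not maintainers:
        errors.append("maintainers must be a non-empty list")
    else:
        for i, entry in enumerate(maintainers):
            if not isinstance(entry, dict) or "email" not in entry:
                errors.append(f"maintainers[{i}] missing email")
    return errors

manifest = (
    '{"$schema": "https://glama.ai/mcp/schemas/connector.json",'
    ' "maintainers": [{"email": "your-email@example.com"}]}'
)
print(validate_glama_manifest(manifest))
```

An empty list means the structure matches; Glama's own verification additionally checks that the email matches your account, which a local check cannot do.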
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.
Discussions
No comments yet. Be the first to start the discussion!