arifOS — Constitutional AI Kernel

Name: arifOS — Constitutional AI Kernel
Author: ariffazil

by io.github.ariffazil

Server Details

Constitutional AI kernel with 13 MCP tools, 888_JUDGE verdict pipeline, and VAULT999 ledger.

Status: Healthy
Last Tested: 2026-07-22 12:41
Transport: Streamable HTTP
Repository: ariffazil/arifOS
GitHub Stars: 0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A · 3.8/5.0

Tool Descriptions: A

Average 3.7/5 across 8 of 8 tools scored. Lowest: 3.1/5.
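
The page does not say how this average is computed. As a quick check, a plain unweighted mean of the eight per-tool scores listed further down reproduces both figures; the choice of an unweighted mean is an assumption, not something the page confirms.

```python
# Per-tool overall scores as listed on this page (tool name -> score out of 5).
scores = {
    "arif_forge": 3.8,
    "arif_init": 3.4,
    "arif_judge": 3.1,
    "arif_memory": 3.2,
    "arif_observe": 4.2,
    "arif_route": 4.6,
    "arif_seal": 3.3,
    "arif_think": 4.3,
}

# Assumption: a plain arithmetic mean, rounded to one decimal place.
average = sum(scores.values()) / len(scores)
lowest_tool = min(scores, key=scores.get)

print(f"Average {average:.1f}/5 across {len(scores)} tools")
print(f"Lowest: {scores[lowest_tool]}/5 ({lowest_tool})")
```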

Server Coherence: A

Disambiguation: 5/5

Each tool targets a distinct aspect of the constitutional AI kernel: session initiation (init), reasoning (think), verdict (judge), irreversible memory (seal), execution (forge), observation (observe), routing (route), and general memory (memory). Descriptions clearly differentiate responsibilities, minimizing overlap.

Naming Consistency: 4/5

Tool names follow the pattern 'arif_<verb>' consistently, except for 'arif_memory' which uses a noun instead of a verb. This minor deviation prevents a perfect score, but the overall pattern is clear and predictable.

Tool Count: 5/5

With 8 tools, the server is well-scoped for a constitutional AI kernel. Each tool serves a distinct and necessary function, and the count is within the ideal range (3-15) for manageable agent selection.

Completeness: 4/5

The tool set covers core lifecycle operations (init, think, judge, seal, forge, observe, route, memory). Minor gaps exist, such as the absence of an explicit 'critique' tool mentioned in arif_think, and implied bridge tools not provided, but the surface is largely complete for the stated purpose.

Available Tools

8 tools

arif_forge: 777 Forge · Execute Gate (Grade A)

Destructive

KERNEL 777 · Execution gate via A-FORGE (hands, not law). Mutates only after arif_judge SEAL + lease/chain IDs — no self-authorize. Modes: dry_run | engineer | query | write | generate | commit | recall. Public execution verb (arif_act is internal alias only). Use when: a constitutional verdict (SEAL) has been obtained and the agent needs to execute a mutation — code changes, deployments, file writes, git operations.

Parameters

Name	Required	Description	Default
`mode`	No		engineer
`query`	No
`plan_id`	No
`actor_id`	No
`manifest`	No
`_envelope`	No
`session_id`	No
`arif_ack_id`	No
`artifact_id`	No
`session_token`	No	SCT from arif_init — continuity for ChatGPT multi-call path
`vault_entry_id`	No
`seal_verdict_id`	No
`ack_irreversible`	No
`judge_state_hash`	No
`approved_action_hash`	No
`constitutional_chain_id`	No
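
As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) for a `dry_run`. The helper name `build_tool_call` and every argument value are placeholders. Which of the listed IDs the kernel actually requires for each mode is not documented on this page, so the combination shown is an assumption.

```python
import json

def build_tool_call(request_id, tool_name, arguments):
    """Wrap tool arguments in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Placeholder values: real IDs would come from earlier arif_init / arif_judge calls.
# The set of fields needed for a dry run is an assumption, not a documented contract.
forge_dry_run = build_tool_call(
    1,
    "arif_forge",
    {
        "mode": "dry_run",
        "session_id": "<session_id from arif_init>",
        "seal_verdict_id": "<id of the SEAL verdict>",
        "constitutional_chain_id": "<chain id>",
        "query": "describe the intended change",
    },
)

print(json.dumps(forge_dry_run, indent=2))
```

Starting with `dry_run` rather than the default `engineer` mode keeps the first call free of mutations while the parameter requirements are still being discovered.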

Tool Definition Quality

A · 3.8/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructiveHint=true and readOnlyHint=false. Description adds that it mutates only after arif_judge SEAL + lease/chain IDs and that no self-authorize is allowed. This provides meaningful context beyond the annotations, though it does not detail what exactly gets destroyed or the full extent of mutations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short and front-loaded with the core concept. However, the heavy use of jargon ('KERNEL 777', 'A-FORGE', 'arif_judge SEAL') may reduce clarity for some agents. It earns its place without extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 15 parameters, no output schema, and no schema descriptions, the description is insufficient. It does not explain return values, error conditions, or parameter relationships. The tool is a mutation gate, but the description lacks completeness for safe and correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description only briefly mentions modes and the requirement for SEAL/lease/chain IDs. With 15 parameters, most are left undefined. The description fails to compensate for the lack of schema descriptions, leaving agents uncertain about how to fill parameters like query, plan_id, actor_id, etc.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it's an execution gate for mutations after a constitutional verdict (SEAL). It lists specific modes (dry_run, engineer, etc.) and distinguishes from sibling tools by noting that arif_act is an internal alias. The tool's purpose is highly specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'when a constitutional verdict (SEAL) has been obtained and the agent needs to execute a mutation'. While it doesn't explicitly say when not to use or list alternatives, the context implies this tool is for post-verdict execution, which differentiates it from siblings like arif_judge or arif_seal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_init: 000 Init · Kernel Session (Grade B)

Idempotent

KERNEL 000 · Session ignition. Binds actor, floors, and audit before any other arif_* verb can govern. Without session_id the kernel treats the caller as anonymous (OBSERVE_ONLY). Modes: ping | light | init | resume | validate | epoch_open | epoch_seal | canary | preflight | triage. Returns session_id, authority band, allowed_next_verbs. Not a helper plugin. Use when: starting a new session, resuming a session, checking kernel liveness, or running a preflight check before any governed action.

Parameters

Name	Required	Default
`mode`	No	init
`nonce`	No
`intent`	No
`context`	No
`payload`	No
`tooling`	No
`verbose`	No
`actor_id`	No
`epoch_id`	No
`evidence`	No
`trace_id`	No
`_envelope`	No
`verbosity`	No	standard
`session_id`	No
`agent_policy`	No
`counterparty`	No
`sovereign_id`	No
`session_token`	No
`actor_signature`	No
`caller_actor_id`	No
`delegation_mode`	No
`idempotency_key`	No
`ack_irreversible`	No
`executor_actor_id`	No
`declared_model_key`	No
`client_capabilities`	No
`requested_authority`	No	OBSERVE_ONLY
`previous_session_hash`	No
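
The description says the tool returns `session_id`, an authority band, and `allowed_next_verbs`, and that calls without a `session_id` are treated as anonymous. The sketch below shows one way a client might pull the session out of a reply and pass it to the next call. It assumes the kernel serialises its reply as a JSON object inside an MCP text content block; the reply shown is hand-written for illustration and `extract_session` is a hypothetical helper.

```python
import json

def extract_session(result):
    """Pull session fields out of an MCP tool result whose first text block is JSON.

    Assumption: the reply is a JSON object in a text content block. This page
    only names the returned fields, not their encoding.
    """
    text_blocks = [
        block["text"]
        for block in result.get("content", [])
        if block.get("type") == "text"
    ]
    if result.get("isError") or not text_blocks:
        return None
    payload = json.loads(text_blocks[0])
    return {
        "session_id": payload.get("session_id"),
        "allowed_next_verbs": payload.get("allowed_next_verbs", []),
    }

# Hypothetical reply, written by hand for illustration only.
sample_result = {
    "content": [{
        "type": "text",
        "text": json.dumps({
            "session_id": "sess-example",
            "allowed_next_verbs": ["arif_observe", "arif_think"],
        }),
    }],
    "isError": False,
}

session = extract_session(sample_result)
# Reuse the session on the next call so the caller is not treated as anonymous.
next_args = {"mode": "search", "query": "example", "session_id": session["session_id"]}
print(next_args)
```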

Tool Definition Quality

B · 3.4/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description elaborates that without a session_id the caller is anonymous (OBSERVE_ONLY) and mentions return values. Annotations already indicate non-destructive and idempotent nature; description adds context about authority band and allowed_next_verbs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise but includes a list of 10 modes and multiple sentences that could be streamlined. Some redundancy exists between the title and the first sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (27 parameters, no output schema, no parameter descriptions), the description falls short. It does not explain how modes affect parameter requirements, nor does it cover error conditions or return structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 27 parameters and 0% schema description coverage, the description only vaguely references 'mode' and 'session_id' but does not explain any other parameter's purpose or behavior. Agents lack guidance for most parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Session ignition' for binding actor, floors, and audit before other arif_* verbs. It distinguishes from sibling tools as the initialization prerequisite, but does not explicitly differentiate from all siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage contexts: 'starting a new session, resuming a session, checking kernel liveness, or running a preflight check.' Also notes it's not a helper plugin. Lacks explicit when-not-to-use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_judge: 888 Judge · Verdict (Grade B)

Read-only

KERNEL 888 · Constitutional verdict — only organ that SEAL/HOLD/SABAR/VOIDs. Not advice; binding floor + authority arbitration. Requires actor, intent, domain, reversibility_level, blast_radius. Authority: SOVEREIGN session for real adjudicate. Returns verdict + receipts + next_safe_action. Use when: a decision needs constitutional clearance — irreversible actions, high-blast-radius operations, or when the agent must know if an action is lawful.

Parameters

Name	Required	Description	Default
`mode`	No	Operation mode: intercept, judge, validate, hold, escalate	intercept
`actor`	No
`domain`	No
`intent`	No
`actor_id`	No
`evidence`	No
`_envelope`	No
`session_id`	No
`measurement`	No
`blast_radius`	No
`session_token`	No
`authority_token`	No
`epistemic_state`	No		UNKNOWN
`reversibility_level`	No
`requested_capability`	No
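
The description names five inputs as required (actor, intent, domain, reversibility_level, blast_radius), while the table marks every parameter as optional. Until that is resolved, a client could check the five fields itself before sending a request. The sketch below is one such guard; the function name and all sample values are hypothetical, and the accepted vocabulary for each field is not documented on this page.

```python
# Fields the tool description says a verdict requires.
REQUIRED_BY_DESCRIPTION = (
    "actor",
    "intent",
    "domain",
    "reversibility_level",
    "blast_radius",
)

def missing_judge_fields(arguments):
    """Return the description-required fields that are absent or empty."""
    return [name for name in REQUIRED_BY_DESCRIPTION if not arguments.get(name)]

# Placeholder draft; the values are illustrative, not a documented vocabulary.
draft = {
    "mode": "judge",
    "actor": "example-agent",
    "intent": "delete stale build artifacts",
    "domain": "engineering",
}

missing = missing_judge_fields(draft)
if missing:
    print("Not ready to call arif_judge; missing:", ", ".join(missing))
```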

Tool Definition Quality

B · 3.1/5.0

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, but description states the tool can SEAL/HOLD/SABAR/VOIDs and returns binding verdicts, implying mutations. This contradiction undermines transparency. The description does not resolve the conflict.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is reasonably short but includes fluff ('KERNEL 888', 'SOVEREIGN session') that doesn't add clarity. It front-loads the purpose but could be more efficient. Still, it is not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, so description should detail return values. Mentions 'verdict + receipts + next_safe_action' but lacks specifics. The annotation contradiction and incomplete parameter coverage reduce completeness for a complex judicial tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 14 parameters with 0% coverage. Description adds meaning for 5 (actor, intent, domain, reversibility_level, blast_radius) but incorrectly labels them as required. The other 9 parameters are unaddressed, leaving gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it provides constitutional verdicts and is the only organ that can SEAL/HOLD/SABAR/VOIDs, differentiating from siblings like arif_seal. The verb 'judge' is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Gives explicit 'Use when' guidance for irreversible actions, high-blast-radius operations, or lawfulness checks. It offers no explicit when-not-to-use guidance or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_memory: Memory Governor · Kernel (Grade B)

Destructive

KERNEL memory governor · L1–L6 under F1/F2/F4/F11 (not a free notepad). Modes: recall | inspect | attest | remember | promote | revise | forget. Writes are J-space mutations; arifOS judges, storage organs hold data. Use when: the agent needs to recall past context, store new knowledge, promote memories to higher tiers, or audit memory integrity.

Parameters

Name	Required	Default
`mode`	No	recall
`tier`	No
`query`	No
`scope`	No
`top_k`	No
`aspect`	No
`hybrid`	No
`policy`	No
`cascade`	No
`content`	No
`include`	No
`payload`	No
`seal_id`	No
`to_tier`	No
`actor_id`	No
`lease_id`	No
`metadata`	No
`trace_id`	No
`_envelope`	No
`from_tier`	No
`memory_id`	No
`tier_hint`	No
`timestamp`	No
`provenance`	No
`redact_pii`	No
`session_id`	No
`structured`	No
`graph_first`	No
`new_content`	No
`truth_class`	No
`caller_chain`	No
`future_value`	No
`memory_class`	No
`policy_basis`	No
`applicability`	No
`embedding_ref`	No
`include_proof`	No
`session_token`	No
`vault_version`	No
`human_approval`	No
`new_structured`	No
`temporal_as_of`	No
`tombstone_text`	No
`idempotency_key`	No
`new_truth_class`	No
`resolution_kind`	No
`source_receipts`	No
`correction_event`	No
`memory_authority`	No
`promotion_reason`	No
`include_contested`	No
`progressive_level`	No
`require_human_ack`	No
`time_window_hours`	No
`decision_lifecycle`	No
`organ_staleness_band`	No
`supersedes_memory_id`	No
`minimised_vault_record`	No
`required_floors_satisfied`	No

Tool Definition Quality

B · 3.2/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true. The description adds that writes are 'J-space mutations' and that 'arifOS judges, storage organs hold data,' supplementing the annotation with behavioral context. However, it does not explain the nature of mutations, tier behavior, or what happens during each mode, leaving gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (3 sentences) and front-loaded with key info, but includes jargon like 'J-space mutations' and 'storage organs' that may confuse agents. It could be more concise without losing meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 59 optional parameters, no param descriptions, no output schema, and complex modes, the description is insufficient. It does not explain mode-specific behavior, required vs optional parameters, return values, or error conditions, leaving an agent with inadequate guidance for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 59 parameters and 0% schema coverage, the description fails to explain any parameter semantics. It lists modes and mentions tiers but does not describe how parameters like 'query', 'content', 'tier', or 'top_k' should be used, forcing the agent to infer meaning from parameter names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a memory governor with enumerated modes (recall, inspect, attest, remember, promote, revise, forget) and specifies its scope as L1–L6 under F1/F2/F4/F11, distinguishing it from sibling tools like arif_think or arif_judge. However, the purpose could be more explicitly stated as managing agent memory with read/write operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use when: the agent needs to recall past context, store new knowledge, promote memories to higher tiers, or audit memory integrity,' providing clear context. It also warns it is 'not a free notepad,' but does not specify when not to use it or mention alternative tools for similar tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_observe: 111 Observe · Sense Reality (Grade A)

Read-only · Idempotent

KERNEL 111 · Sense reality into evidence (not reasoning, not judgment). Modes: search | fetch | ingest | vitals | compass | atlas | entropy_dS | repo_map | hybrid_discovery. Returns evidence with sources + uncertainty tags. Domain compute → arif_route to GEOX/WEALTH/WELL. Use when: the user needs factual evidence, web search, URL fetch, system vitals, or entropy measurement. For domain-specific computation (geology, capital, health), use arif_route instead.

Parameters

Name	Required	Default
`url`	No
`mode`	No	search
`query`	No
`layers`	No
`actor_id`	No
`_envelope`	No
`session_id`	No
`result_limit`	No
`session_token`	No

Tool Definition Quality

A · 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint. The description adds that the tool 'returns evidence with sources + uncertainty tags' and emphasizes 'not reasoning, not judgment', which provides useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four compact sentences front-load the core purpose, list modes, describe return format, and provide usage guidance. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description covers purpose, modes, and high-level output, it lacks detail on the return structure (beyond 'evidence with sources + uncertainty tags') and does not explain how parameters like layers or session_id affect behavior. Given 9 params and no output schema, more completeness is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 9 parameters with 0% description coverage, yet the description only mentions modes and vaguely references 'query' and 'url' contextually. It does not explain parameters like layers, actor_id, session_id, result_limit, etc., leaving significant gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Sense reality into evidence (not reasoning, not judgment)', enumerates specific modes (search, fetch, etc.), and distinguishes itself from sibling arif_route. This provides a specific verb+resource scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('user needs factual evidence, web search, URL fetch, system vitals, or entropy measurement') and when not to use ('For domain-specific computation... use arif_route instead'), offering clear alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_route: 444 Route · Intent→Organ (Grade A)

Read-only · Idempotent

KERNEL 444 · Intent→organ router (default path to GEOX/WEALTH/WELL/A-FORGE). Select when goal is known but organ/verb is not. Optional organ_tool = governed bridge (prefer over arif_bridge_connect). Not session preflight (use arif_init mode=preflight|triage). Returns organ, port, tool_prefix, suggested_tools. Use when: the user's request involves domain-specific computation (geology, capital, health, execution) and you need to route to the correct federation organ.

Parameters

Name	Required	Description	Default
`mode`	No	Operation mode: route, bridge	route
`task`	No	Alias for intent (backward compat).
`organ`	No	Optional explicit organ override. If provided, intent matching is skipped and this organ is used directly.
`intent`	Yes	Natural-language description of what the user wants. e.g. "interpret this seismic section", "assess portfolio risk"
`actor_id`	No	Calling actor.
`_envelope`	No
`arguments`	No	Arguments to pass to organ_tool.
`organ_tool`	No	The tool name on the target organ to call. If absent, returns routing decision only (no bridge call).
`session_id`	No	Governing session.
`session_token`	No	SCT from arif_init (ChatGPT continuity).
`contract_c_kwargs`	No
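
The table above implies two call shapes: with only `intent`, the router returns a routing decision, and with `organ_tool` set it also makes the bridge call. The sketch below builds both argument sets. The helper `route_arguments` and the sample values are hypothetical. Whether a bridge call additionally needs `mode` set to `bridge` is not stated here, so the sketch leaves `mode` at its default.

```python
def route_arguments(intent, organ_tool=None, tool_arguments=None, session_id=None):
    """Build arif_route arguments; `intent` is the only required field."""
    if not intent:
        raise ValueError("intent is required")
    args = {"intent": intent}
    if session_id is not None:
        args["session_id"] = session_id
    if organ_tool is not None:
        # With organ_tool present, the router is described as making the bridge call.
        args["organ_tool"] = organ_tool
        args["arguments"] = tool_arguments or {}
    return args

# Routing decision only: no organ_tool, so no bridge call is made.
decision_only = route_arguments("assess portfolio risk")

# Bridge call: the tool name and its arguments are placeholders.
bridged = route_arguments(
    "interpret this seismic section",
    organ_tool="<tool name on the target organ>",
    tool_arguments={"example_key": "example_value"},
)

print(decision_only)
print(sorted(bridged))
```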

Tool Definition Quality

A · 4.6/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, idempotentHint=true. Description adds context about return values (organ, port, tool_prefix, suggested_tools) and the optional bridge call behavior when organ_tool is provided. No contradictions, and value is added beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is four sentences, front-loaded with purpose and key usage conditions. Every sentence adds essential information; there is no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters (1 required) and no output schema, the description covers core purpose, usage, return values, and key optional behaviors. Some parameters (e.g., _envelope, actor_id) are not explained in the description, but schema covers them. Overall sufficient for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 88%, so baseline is 3. Description provides additional meaning beyond schema: explains intent as natural-language description, clarifies task as alias, and notes organ_tool's optionality and effect. This adds value for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it's an 'Intent→organ router' that routes to specific organs (GEOX/WEALTH/WELL/A-FORGE). It distinguishes from siblings by mentioning 'prefer over arif_bridge_connect' and 'Not session preflight (use arif_init)'. The purpose is precise and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Select when goal is known but organ/verb is not.' Also provides exclusions: 'Not session preflight' and alternatives: 'prefer over arif_bridge_connect'. This gives clear guidance for correct invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_seal: 999 Seal · VAULT999 (Grade B)

Destructive

KERNEL 999 · VAULT999 immutable append — civilizational memory, irreversible. Authority: 888_HOLD / SOVEREIGN + ack_irreversible for seal mode. Modes: seal | verify | chain | list | dry_run | seal_card | render. Seal only after SEAL verdict path; HOLD/SABAR/VOID do not seal. Testing → dry_run. Kernel judges; vault seals; Arif owns F13 veto.

Parameters

Name	Required	Default
`mode`	No	seal
`nonce`	No
`payload`	No
`actor_id`	No
`_envelope`	No
`session_id`	No
`drift_events`	No
`witness_type`	No	ai
`session_token`	No
`constitutional`	No
`actor_signature`	No
`judge_state_hash`	No
`constitutional_chain_id`	No
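
The description states that only the SEAL verdict path leads to a seal, that HOLD, SABAR, and VOID do not, and that testing should use `dry_run`. A client-side guard for that rule might look like the sketch below. The function name is hypothetical, and it assumes the verdict arrives as one of those four plain strings, which this page does not confirm.

```python
def seal_arguments(verdict, payload, testing=False):
    """Return arif_seal arguments, or None when the verdict does not permit sealing.

    Assumption: `verdict` is one of the strings "SEAL", "HOLD", "SABAR", "VOID".
    """
    if verdict != "SEAL":
        # HOLD, SABAR and VOID verdicts do not seal.
        return None
    return {"mode": "dry_run" if testing else "seal", "payload": payload}

for verdict in ("SEAL", "HOLD", "SABAR", "VOID"):
    args = seal_arguments(verdict, {"note": "example"}, testing=True)
    print(verdict, "->", "skip" if args is None else args["mode"])
```

The guard deliberately refuses a dry run for non-SEAL verdicts as well. That is a conservative choice for the sketch, not a documented kernel behaviour.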

Tool Definition Quality

B · 3.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructiveHint=true. The description adds that the append is irreversible ('civilizational memory') and that seal mode requires ack_irreversible, which goes beyond the annotations. However, it does not detail what gets destroyed or which permissions are needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but includes unnecessary jargon and a mismatch with the schema. It front-loads the purpose but the inaccuracies detract from its efficiency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (13 parameters, no output schema, esoteric domain), the description is insufficient. It provides high-level intent but lacks detail on parameters, return values, and operational context, leaving the agent under-informed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description should compensate but fails. It lists modes that do not match the schema (e.g., 'chain', 'list', 'dry_run' vs. 'ledger', 'changelog', 'audit') and mentions ack_irreversible which is not a parameter. No explanation for any of the 13 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for irreversible appending to VAULT999, with specific modes and a use case when a verdict is reached. However, the reference to sibling tools is indirect and the description includes jargon that may obscure full clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use the tool (when a verdict is reached or verifying integrity) and what not to use it for (HOLD/SABAR/VOID paths). It does not explicitly name alternative sibling tools but provides enough context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arif_think: 333 Think · Mind (Grade A)

Read-only · Idempotent

KERNEL 333 · Mind — structured reasoning under F2/F7 (not chat, not verdict). Modes: reason | reflect | verify | plan | plan_review | plan_approve | refactor_plan | metabolize | axioms | atlas. Mode 'atlas' returns ATLAS333 cognitive geometry: GPV, activated paradoxes, triggered quotes, zone map, TEARFRAME thresholds, and calibration guidance. Returns OBS/DER/INT/SPEC labels. Maruah/ethics → arif_critique. Binding decision → arif_judge. Use when: the user needs structured reasoning, plan generation, plan review, reflection on past actions, verification of claims, axiom exploration, or ATLAS333 paradox-aware cognitive geometry mapping.

Parameters

Name	Required	Default
`mode`	No	reason
`query`	No
`plan_id`	No
`actor_id`	No
`_envelope`	No
`session_id`	No
`witness_type`	No	ai
`session_token`	No

Output Schema

Parameters

Name	Required	Description
No output parameters

Tool Definition Quality

A · 4.3/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, ensuring safety. The description adds that it returns OBS/DER/INT/SPEC labels, clarifying output format. No contradictions; transparency is high.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact (4 sentences) with key information front-loaded: kernel identity, modes, output, and alternatives. However, it could be slightly improved by briefly adding parameter hints without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Purpose, usage, alternatives, and output labels are covered, but the parameter explanations are severely lacking. Given the tool's complexity (8 parameters, many modes), the description misses critical details that would help an agent use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters. It only enumerates the mode enum values but does not describe what each mode does or explain the other 7 parameters (query, plan_id, actor_id, etc.). This is insufficient for effective use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs structured reasoning under F2/F7, listing specific modes and output labels. It distinguishes from siblings by routing ethics to arif_critique and binding decisions to arif_judge, making the purpose highly specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists usage scenarios ('Use when: the user needs structured reasoning, plan generation, plan review...') and provides alternatives for ethics and binding decisions, giving clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
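
For servers deployed from a script, the claim file could be generated rather than written by hand. The following is a minimal sketch that writes the structure shown above under a web root directory. The function name `write_claim_file` and the use of a temporary directory as the web root are illustrative assumptions; a real deployment would point at whatever directory the server actually serves from.

```python
import json
import tempfile
from pathlib import Path

def write_claim_file(web_root, email):
    """Write <web_root>/.well-known/glama.json and return its path."""
    claim = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target = Path(web_root) / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(claim, indent=2) + "\n", encoding="utf-8")
    return target

# Demonstration against a throwaway directory standing in for the web root.
with tempfile.TemporaryDirectory() as web_root:
    path = write_claim_file(web_root, "your-email@example.com")
    written = json.loads(path.read_text(encoding="utf-8"))
    print(path.relative_to(web_root).as_posix())
```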
