CoderDayton

verifiable-thinking-mcp

scratchpad

Structure reasoning with verification, detect cognitive traps, and self-challenge to improve problem-solving accuracy.

Instructions

Structured reasoning w/verification, trap detection, self-challenge. []=optional

OPS (required: operation=):
- step thought= [question=1st] [confidence=] [verify=] [domain=math|logic|code|general] [compress=true] → add step. Auto-verifies when chain >3 steps.
- complete [final_answer=] [summary=] → finalize + spot-check
- revise target_step= thought= [reason=] → fix step
- branch thought= [from_step=] [hypothesis=] [success_criteria=] → fork path
- navigate view=history|branches|step|path [step_id=] [limit=10] → inspect
- augment text= [store_as_step=false] → compute + inject math results
- hint [expression=] [reveal_count=] [cumulative=true] [reset=false] → progressive hints (auto-continues)
- mistakes text= → check algebraic errors
- spot_check question= answer= → check for common reasoning traps
- challenge [target_claim=] [challenge_type=all] → adversarial self-check
- override failed_step= [reason=] → force-commit failed step

DEFAULTS: session_id=auto confidence_threshold=0.8 token_budget=3000 augment_compute=true compress=true

STATUS→ACTION:
- continue → add steps
- threshold_reached → complete or verify
- review → use reconsideration.suggested_revise
- verification_failed → revise | branch | override
- budget_exhausted → complete or new session
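The status-to-action table above can be sketched as a simple dispatch map. This is a hypothetical client-side helper, not part of the server's API; only the status strings and suggested operations come from the description.

```python
# Hypothetical sketch of the STATUS->ACTION table. The status strings and
# suggested operations come from the tool's description; the helper itself
# is not part of the server's API.
STATUS_ACTIONS = {
    "continue": ["step"],
    "threshold_reached": ["complete", "verify"],
    "review": ["revise"],  # per reconsideration.suggested_revise
    "verification_failed": ["revise", "branch", "override"],
    "budget_exhausted": ["complete", "new_session"],
}

def next_actions(status: str) -> list[str]:
    """Return the operations the description suggests for a given status."""
    return STATUS_ACTIONS.get(status, [])
```

A client loop could call `next_actions` on each response's status field to pick its next operation.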

FLOW:
1. step(question="...", thought="...") → primes trap detection for the question
2. step(thought="...") × N → auto-verify, auto-compress, confidence-drift detection, consistency checks
3. [optional] challenge() → adversarial self-check of claims
4. complete(final_answer="...") → auto spot-check against common traps
5. if status=review → revise per reconsideration.suggested_revise
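The documented flow can be sketched as a sequence of tool-call payloads. `call_tool` here is a placeholder for whatever MCP client is in use (it is not part of this server); only the argument dicts reflect the scratchpad tool's schema.

```python
# Sketch of the documented FLOW as MCP tool-call payloads.
# `call_tool` is a placeholder for an MCP client; only the argument
# dicts reflect the scratchpad tool's actual parameters.
def call_tool(name: str, arguments: dict) -> dict:
    # Placeholder: a real client would send this over MCP and return the result.
    return {"tool": name, "arguments": arguments}

calls = [
    # 1. First step: question= primes trap detection for the question.
    call_tool("scratchpad", {"operation": "step",
                             "question": "Is 91 prime?",
                             "thought": "Check small prime divisors."}),
    # 2. Subsequent steps; the chain auto-verifies once it exceeds 3 steps.
    call_tool("scratchpad", {"operation": "step",
                             "thought": "91 = 7 * 13, so it is composite.",
                             "confidence": 0.9}),
    # 3. Optional adversarial self-check of the chain's claims.
    call_tool("scratchpad", {"operation": "challenge"}),
    # 4. Finalize; the server auto spot-checks against common traps.
    call_tool("scratchpad", {"operation": "complete",
                             "final_answer": "91 is not prime (7 * 13)."}),
]
```

If the final response comes back with status=review, the flow's step 5 applies: issue a revise call per reconsideration.suggested_revise.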

Input Schema

All parameters are optional unless marked (required).

- operation (required): Operation to perform
- confidence_threshold: Chain confidence threshold to suggest completion (default: 0.8)
- token_budget: Max tokens before auto-compressing new steps (default: 3000)
- warn_at_tokens: Warn when cumulative session tokens exceed this threshold (soft limit, cost control)
- hard_limit_tokens: Hard stop when cumulative session tokens exceed this threshold; returns budget_exhausted status and blocks further operations
- thought: Current reasoning/analysis (step/branch/revise)
- purpose: Step category
- outcome: Result or conclusion from this step
- confidence: Confidence in this step (0-1); contributes to chain average
- context: Prior context or findings
- verify: Run domain verification; auto-enabled for chains >3 steps; set to false to disable
- domain: Verification domain (math | logic | code | general)
- local_compute: Try local compute for math (default: false)
- augment_compute: Auto-inject computed values into thought (default: true)
- compress: Compress thought before storing (default: true)
- compression_query: Query for context-aware compression
- max_step_tokens: Max tokens for this step; rejects if exceeded (default: no limit)
- force_large: Allow step even if it exceeds max_step_tokens (default: false)
- preconditions: Assumptions that MUST be true for this step (e.g., 'x > 0', 'file exists')
- view: What to view: history (all steps), branches (list), step (specific), path (lineage)
- step_id: Step number to view
- branch_id: Filter history by branch
- limit: Max steps to return (default: 10)
- from_step: Step to branch from (default: current)
- branch_name: Human-readable branch name
- hypothesis: Falsifiable hypothesis this branch will test (e.g., 'Assume X is prime')
- success_criteria: What observation proves/disproves this hypothesis
- target_step: Step number to revise
- reason: Why revising this step / why overriding verification
- summary: Final summary/conclusion
- final_answer: The answer/result
- question: Original question. On step: enables trap priming and stores for auto spot-check. On complete: enables spot-check.
- text: Text containing math expressions to compute and inject (augment/mistakes)
- system_context: System prompt context for domain filtering
- store_as_step: Store augmented result as a reasoning step (default: false)
- acknowledge: Confirm you understand verification failed but want to proceed
- failed_step: Step number that failed verification
- expression: Math expression to simplify; omit to continue from previous hint in session
- reveal_count: Number of steps to reveal; omit to auto-increment when continuing
- cumulative: Show all steps up to reveal_count (true) or just the nth step (false) (default: true)
- reset: Reset hint state and start from beginning (default: false)
- answer: The proposed answer to check for trap patterns
- challenge_type: Type of challenge to generate (default: all)
- target_claim: Specific claim to challenge (optional; if omitted, extracts claims from steps)
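Before sending a request, the schema's few hard constraints (only operation is required; confidence lies in 0-1) can be checked client-side. This validator is a minimal sketch derived from the schema above, not something the server provides.

```python
# Minimal client-side sanity check derived from the input schema above.
# Only `operation` is marked required; `confidence` must lie in [0, 1].
def validate_payload(payload: dict) -> list[str]:
    """Return a list of constraint violations (empty means the payload passes)."""
    errors = []
    if "operation" not in payload:
        errors.append("missing required field: operation")
    conf = payload.get("confidence")
    if conf is not None and not (0 <= conf <= 1):
        errors.append("confidence must be in [0, 1]")
    return errors
```

The server will still reject malformed requests on its own; this just catches the two obvious mistakes before a round trip.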
Behavior — 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are empty, so the description carries the full burden. It discloses several behavioral traits: auto-verification for chains >3 steps, auto-compression, confidence-drift detection, consistency checks, and status-driven actions (e.g., 'verification_failed→revise|branch|override'). It also mentions defaults like 'confidence_threshold=0.8' and 'token_budget=3000'. However, it lacks details on error handling, performance limits, or side effects, leaving some gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness — 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly verbose and poorly structured, with dense sections like 'OPS', 'DEFAULTS', 'STATUS→ACTION', and 'FLOW' that mix operational details, defaults, and usage flow without clear separation. Sentences are fragmented (e.g., 'Auto-verifies when chain >3 steps.'), and it includes unnecessary symbols like '[]=optional'. It is not front-loaded with a clear purpose, making it difficult to parse efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness — 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (44 parameters, no output schema, no annotations), the description attempts to cover behavior and flow but falls short. It explains operations and status transitions but lacks details on return values, error responses, or integration with sibling tools. Without an output schema, the description should ideally explain what the tool returns, but it does not, leaving gaps in completeness for such a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters — 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 98%, so the schema already documents most parameters extensively. The description adds minimal semantic value beyond the schema: it lists operation types (e.g., 'step', 'complete') and hints at parameter usage in the flow (e.g., 'step(question="...",thought="...")'), but does not explain parameter interactions or provide examples. With high schema coverage, the baseline is 3, and the description does not significantly compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose — 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description begins with 'Structured reasoning w/verification, trap detection, self-challenge', which provides a vague high-level purpose but lacks a specific verb-resource combination. It then dives into operational details without clearly stating what the tool fundamentally does (e.g., manage a reasoning session, perform stepwise analysis). The title is null, and the name 'scratchpad' is generic, making the purpose unclear without reading the entire description.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines — 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'FLOW' section with numbered steps (e.g., '1.step(question="...",thought="...")→primes trap detection'), which implies usage in a sequential reasoning process. However, it does not explicitly state when to use this tool versus alternatives like 'clear_session' or 'compress', nor does it provide context on prerequisites or exclusions. The guidance is implied through the flow but not clearly articulated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/CoderDayton/verifiable-thinking-mcp'
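The same lookup can be done from Python's standard library. This sketch assumes only what the curl example shows: a GET to the server's URL that returns a JSON document.

```python
# Python equivalent of the curl example above, using only the standard
# library. Assumes the endpoint returns a JSON document, as the curl
# example suggests.
import json
import urllib.request

SERVER = "CoderDayton/verifiable-thinking-mcp"
URL = f"https://glama.ai/api/mcp/v1/servers/{SERVER}"

def fetch_server_info(url: str = URL) -> dict:
    """GET the server's directory entry and parse it as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(fetch_server_info())
```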

If you have feedback or need assistance with the MCP directory API, please join our Discord server.