CoderDayton

verifiable-thinking-mcp

scratchpad

Structure reasoning with verification, detect cognitive traps, and self-challenge to improve problem-solving accuracy.

Instructions

Structured reasoning w/verification, trap detection, self-challenge. []=optional

OPS (required: operation=):
- step thought= [question=1st] [confidence=] [verify=] [domain=math|logic|code|general] [compress=true] → add step. Auto-verifies when chain >3 steps.
- complete [final_answer=] [summary=] → finalize + spot-check
- revise target_step= thought= [reason=] → fix step
- branch thought= [from_step=] [hypothesis=] [success_criteria=] → fork path
- navigate view=history|branches|step|path [step_id=] [limit=10] → inspect
- augment text= [store_as_step=false] → compute + inject math results
- hint [expression=] [reveal_count=] [cumulative=true] [reset=false] → progressive hints (auto-continues)
- mistakes text= → check algebraic errors
- spot_check question= answer= → check for common reasoning traps
- challenge [target_claim=] [challenge_type=all] → adversarial self-check
- override failed_step= [reason=] → force-commit failed step

DEFAULTS: session_id=auto confidence_threshold=0.8 token_budget=3000 augment_compute=true compress=true

STATUS→ACTION:
- continue → add steps
- threshold_reached → complete or verify
- review → use reconsideration.suggested_revise
- verification_failed → revise | branch | override
- budget_exhausted → complete or new session
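The status-to-action table above can be sketched as a simple dispatch map. This is a hypothetical client-side helper, not part of the server's API; only the status strings and suggested operations come from the description.

```python
# Hypothetical sketch of the STATUS->ACTION table. The status strings and
# suggested operations come from the tool's description; the helper itself
# is not part of the server's API.
STATUS_ACTIONS = {
    "continue": ["step"],
    "threshold_reached": ["complete", "verify"],
    "review": ["revise"],  # per reconsideration.suggested_revise
    "verification_failed": ["revise", "branch", "override"],
    "budget_exhausted": ["complete", "new_session"],
}

def next_actions(status: str) -> list[str]:
    """Return the operations the description suggests for a given status."""
    return STATUS_ACTIONS.get(status, [])
```

A client loop could call `next_actions` on each response's status field to pick its next operation.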

FLOW:
1. step(question="...", thought="...") → primes trap detection for the question
2. step(thought="...") × N → auto-verify, auto-compress, confidence-drift detection, consistency checks
3. [optional] challenge() → adversarial self-check of claims
4. complete(final_answer="...") → auto spot-check against common traps
5. if status=review → revise per reconsideration.suggested_revise
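The documented flow can be sketched as a sequence of tool-call payloads. `call_tool` here is a placeholder for whatever MCP client is in use (it is not part of this server); only the argument dicts reflect the scratchpad tool's schema.

```python
# Sketch of the documented FLOW as MCP tool-call payloads.
# `call_tool` is a placeholder for an MCP client; only the argument
# dicts reflect the scratchpad tool's actual parameters.
def call_tool(name: str, arguments: dict) -> dict:
    # Placeholder: a real client would send this over MCP and return the result.
    return {"tool": name, "arguments": arguments}

calls = [
    # 1. First step: question= primes trap detection for the question.
    call_tool("scratchpad", {"operation": "step",
                             "question": "Is 91 prime?",
                             "thought": "Check small prime divisors."}),
    # 2. Subsequent steps; the chain auto-verifies once it exceeds 3 steps.
    call_tool("scratchpad", {"operation": "step",
                             "thought": "91 = 7 * 13, so it is composite.",
                             "confidence": 0.9}),
    # 3. Optional adversarial self-check of the chain's claims.
    call_tool("scratchpad", {"operation": "challenge"}),
    # 4. Finalize; the server auto spot-checks against common traps.
    call_tool("scratchpad", {"operation": "complete",
                             "final_answer": "91 is not prime (7 * 13)."}),
]
```

If the final response comes back with status=review, the flow's step 5 applies: issue a revise call per reconsideration.suggested_revise.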

Input Schema

All parameters are optional unless marked (required).

- operation (required): Operation to perform
- confidence_threshold: Chain confidence threshold to suggest completion (default: 0.8)
- token_budget: Max tokens before auto-compressing new steps (default: 3000)
- warn_at_tokens: Warn when cumulative session tokens exceed this threshold (soft limit, cost control)
- hard_limit_tokens: Hard stop when cumulative session tokens exceed this threshold; returns budget_exhausted status and blocks further operations
- thought: Current reasoning/analysis (step/branch/revise)
- purpose: Step category
- outcome: Result or conclusion from this step
- confidence: Confidence in this step (0-1); contributes to chain average
- context: Prior context or findings
- verify: Run domain verification; auto-enabled for chains >3 steps; set to false to disable
- domain: Verification domain (math | logic | code | general)
- local_compute: Try local compute for math (default: false)
- augment_compute: Auto-inject computed values into thought (default: true)
- compress: Compress thought before storing (default: true)
- compression_query: Query for context-aware compression
- max_step_tokens: Max tokens for this step; rejects if exceeded (default: no limit)
- force_large: Allow step even if it exceeds max_step_tokens (default: false)
- preconditions: Assumptions that MUST be true for this step (e.g., 'x > 0', 'file exists')
- view: What to view: history (all steps), branches (list), step (specific), path (lineage)
- step_id: Step number to view
- branch_id: Filter history by branch
- limit: Max steps to return (default: 10)
- from_step: Step to branch from (default: current)
- branch_name: Human-readable branch name
- hypothesis: Falsifiable hypothesis this branch will test (e.g., 'Assume X is prime')
- success_criteria: What observation proves/disproves this hypothesis
- target_step: Step number to revise
- reason: Why revising this step / why overriding verification
- summary: Final summary/conclusion
- final_answer: The answer/result
- question: Original question. On step: enables trap priming and stores for auto spot-check. On complete: enables spot-check.
- text: Text containing math expressions to compute and inject (augment/mistakes)
- system_context: System prompt context for domain filtering
- store_as_step: Store augmented result as a reasoning step (default: false)
- acknowledge: Confirm you understand verification failed but want to proceed
- failed_step: Step number that failed verification
- expression: Math expression to simplify; omit to continue from previous hint in session
- reveal_count: Number of steps to reveal; omit to auto-increment when continuing
- cumulative: Show all steps up to reveal_count (true) or just the nth step (false) (default: true)
- reset: Reset hint state and start from beginning (default: false)
- answer: The proposed answer to check for trap patterns
- challenge_type: Type of challenge to generate (default: all)
- target_claim: Specific claim to challenge (optional; if omitted, extracts claims from steps)
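Before sending a request, the schema's few hard constraints (only operation is required; confidence lies in 0-1) can be checked client-side. This validator is a minimal sketch derived from the schema above, not something the server provides.

```python
# Minimal client-side sanity check derived from the input schema above.
# Only `operation` is marked required; `confidence` must lie in [0, 1].
def validate_payload(payload: dict) -> list[str]:
    """Return a list of constraint violations (empty means the payload passes)."""
    errors = []
    if "operation" not in payload:
        errors.append("missing required field: operation")
    conf = payload.get("confidence")
    if conf is not None and not (0 <= conf <= 1):
        errors.append("confidence must be in [0, 1]")
    return errors
```

The server will still reject malformed requests on its own; this just catches the two obvious mistakes before a round trip.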
Behavior — 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are empty, so the description carries the full burden. It discloses several behavioral traits: auto-verification for chains >3 steps, auto-compression, confidence-drift detection, consistency checks, and status-driven actions (e.g., 'verification_failed→revise|branch|override'). It also mentions defaults like 'confidence_threshold=0.8' and 'token_budget=3000'. However, it lacks details on error handling, performance limits, or side effects, leaving some gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness — 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly verbose and poorly structured, with dense sections like 'OPS', 'DEFAULTS', 'STATUS→ACTION', and 'FLOW' that mix operational details, defaults, and usage flow without clear separation. Sentences are fragmented (e.g., 'Auto-verifies when chain >3 steps.'), and it includes unnecessary symbols like '[]=optional'. It is not front-loaded with a clear purpose, making it difficult to parse efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness — 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (44 parameters, no output schema, no annotations), the description attempts to cover behavior and flow but falls short. It explains operations and status transitions but lacks details on return values, error responses, or integration with sibling tools. Without an output schema, the description should ideally explain what the tool returns, but it does not, leaving gaps in completeness for such a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters — 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 98%, so the schema already documents most parameters extensively. The description adds minimal semantic value beyond the schema: it lists operation types (e.g., 'step', 'complete') and hints at parameter usage in the flow (e.g., 'step(question="...",thought="...")'), but does not explain parameter interactions or provide examples. With high schema coverage, the baseline is 3, and the description does not significantly compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose — 2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description begins with 'Structured reasoning w/verification, trap detection, self-challenge', which provides a vague high-level purpose but lacks a specific verb-resource combination. It then dives into operational details without clearly stating what the tool fundamentally does (e.g., manage a reasoning session, perform stepwise analysis). The title is null, and the name 'scratchpad' is generic, making the purpose unclear without reading the entire description.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines — 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'FLOW' section with numbered steps (e.g., '1.step(question="...",thought="...")→primes trap detection'), which implies usage in a sequential reasoning process. However, it does not explicitly state when to use this tool versus alternatives like 'clear_session' or 'compress', nor does it provide context on prerequisites or exclusions. The guidance is implied through the flow but not clearly articulated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/CoderDayton/verifiable-thinking-mcp'
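The same lookup can be done from Python's standard library. This sketch assumes only what the curl example shows: a GET to the server's URL that returns a JSON document.

```python
# Python equivalent of the curl example above, using only the standard
# library. Assumes the endpoint returns a JSON document, as the curl
# example suggests.
import json
import urllib.request

SERVER = "CoderDayton/verifiable-thinking-mcp"
URL = f"https://glama.ai/api/mcp/v1/servers/{SERVER}"

def fetch_server_info(url: str = URL) -> dict:
    """GET the server's directory entry and parse it as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    print(fetch_server_info())
```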

If you have feedback or need assistance with the MCP directory API, please join our Discord server.