# Thoughtbox AgX Assessment ("Great CLI" / "Great Tool" Heuristics)
Date: 2025-12-15
Scope note: This report is based on the **last committed state** of the repo (the `container-use` environment did not include your local uncommitted changes). It is an AgX review of Thoughtbox as an MCP server plus its CLI/transport wrappers.
Thoughtbox MCP session used during assessment: `443a1dc0-98d8-44d6-aade-a0f9e48648a8`
## Executive summary
Thoughtbox already exhibits several “great CLI / great hammer” traits:
- **Low cognitive overhead interfaces**
The core surface area is small and coherent:
- `thoughtbox` (single purpose: capture reasoning as structured nodes)
- `notebook` toolhost (operations catalog; extensible)
- `mental_models` toolhost (discovery + retrieval)
- `export_reasoning_chain` (durability)
- **High discoverability inside the protocol**
The combination of:
- `prompt://list_mcp_assets`
- embedded resources (`thoughtbox://patterns-cookbook`, `thoughtbox://notebook/operations`)
- the `thoughtbox://init/...` “CLI-like” navigation flow
creates a strong in-band help system.
- **Fast feedback loops and observability**
Observatory provides a live UI and WS/REST endpoints, while the `thoughtbox` tool prints formatted thoughts to `stderr` for “immediate feedback” in stdio contexts.
However, there are a few “tool gets in the way” frictions that are likely to matter in daily use:
- **Broken or incomplete navigation links**
The init flow renders “View full session: `thoughtbox://sessions/{id}`” but no such resource is registered.
- **Inconsistent versioning & identity surfaces**
Package version (`package.json`), `/health` versions, and `McpServer({ version })` differ. This is a subtle but persistent trust/diagnostics drag.
- **Mixed expectations about persistence**
The runtime storage is in-memory, while durability happens via auto-export to `~/.thoughtbox/exports`. That’s workable, but the product story should make this **explicit and predictable**.
The rest of this report maps findings to “great tool” heuristics and recommends prioritized fixes.
---
## What “great tool” means in AgX terms
A good CLI / hammer becomes an extension of self when it:
- **Minimizes mode switches**
The user doesn’t have to keep remembering how to “talk to the tool.”
- **Is predictable**
Same inputs yield the same shape of outputs; conventions are stable.
- **Has strong affordances**
The next action is obvious; errors tell you what to do.
- **Has sharp edges only where it should**
It is safe by default; “power” is opt-in.
- **Composes with the user’s environment**
It works with piping, automation, and different transports.
In MCP-land, those translate into:
- **Discoverable tools/resources/prompts**
- **Stable schemas and consistent response envelopes**
- **Clear separation of `stdout` vs `stderr`**
- **Durable state with explicit lifecycle**
- **Convenient entry points (init flow, quickstarts)**
---
## Inventory of current user-facing surfaces
### Tools
- `thoughtbox`
- Captures a thought with structure: numbering, branching, revision, etc.
- Auto-creates a session on first use.
- Optionally embeds the patterns cookbook.
- `notebook`
- Toolhost pattern: `operation` + `args`.
- Embeds operation documentation as a resource in responses.
- `mental_models`
- Toolhost pattern: `operation` + `args`.
- Supports discovery operations + `get_capability_graph`.
- `export_reasoning_chain`
- Exports linked JSON to filesystem; default: `~/.thoughtbox/exports`.
### Prompts
- `list_mcp_assets`
- Builds a full “capabilities doc” dynamically.
- `interleaved-thinking`
- A structured workflow prompt for interleaving reasoning and tool actions.
### Resources
- `thoughtbox://init` plus templated subpaths
- CLI-like navigation across projects/tasks/aspects based on exported sessions.
- `thoughtbox://patterns-cookbook`
- `thoughtbox://architecture`
- `thoughtbox://notebook/operations`
- `thoughtbox://mental-models/...`
### Transports / entry points
- STDIO MCP server (published via `bin: { thoughtbox: dist/index.js }`)
- HTTP stateless (`src/http.ts`) exposing `/mcp` (POST)
- HTTP stateful (`src/http-stateful.ts`) using Smithery SDK with session isolation
- Observatory web UI (optional; env-gated)
---
## Findings by “great tool” heuristic
## 1) Discoverability (“I know what to do next”)
### What’s strong
- `thoughtbox://init` is explicitly described as “START HERE FIRST” in the dynamic resources list.
- `prompt://list_mcp_assets` is an in-band, schema-synchronized help surface.
- The notebook tool embeds per-operation docs as a resource, which is a particularly good “discoverability per action” pattern.
### Frictions
- The init UI links to `thoughtbox://sessions/{id}` but that resource does not exist.
- This breaks the “affordance loop” at the exact moment the user wants to deep-dive.
- The README references a patterns cookbook path (`src/resources/docs/...`) that does not exist in this committed tree (the cookbook is embedded via `src/resources/patterns-cookbook-content.ts`).
- Even if the user never reads the file directly, any mismatch between docs and implementation creates long-term onboarding papercuts.
### Recommendation (high priority)
- Implement `thoughtbox://sessions/{id}` resource(s), or remove/replace the link.
- Minimal viable: read the most recent export matching that session id and render a markdown view.
- Better: provide a small session browsing API:
- `thoughtbox://sessions` (list)
- `thoughtbox://sessions/{id}` (details)
---
## 2) Consistency (“No surprises”)
### What’s strong
- Toolhost pattern is used consistently (`notebook`, `mental_models`).
- Tool responses generally follow a consistent “JSON in a text part” convention.
### Frictions
- Version strings are inconsistent:
- `package.json` is `1.2.2`
- HTTP `/health` returns `1.2.0`
- MCP server created with `version: "1.0.0"`
- `src/CAPABILITIES.md` says `1.0.1`
Even if harmless functionally, this impacts:
- trust (“what am I running?”)
- bug reports
- reproduction
- compatibility expectations
### Recommendation (high priority)
- Centralize a single authoritative version source (e.g. imported from `package.json` during build) and use it for:
- `McpServer({ version })`
- HTTP `/health` and `/info`
- generated capabilities docs
---
## 3) Feedback loops (“I can feel the tool working”)
### What’s strong
- The `thoughtbox` tool prints a formatted rendering to `stderr` by default.
- This mirrors good CLI UX: humans get immediate, readable feedback.
- Observatory provides a live UI with:
- WS streams
- REST endpoints
- a baked-in HTML UI
### Frictions
- The formatted `stderr` output uses rich formatting and emojis.
- In many MCP clients, `stderr` is either invisible, noisy, or piped into logs.
- This can become “tool noise” rather than “tool feel.”
### Recommendations
- **Short-term**: gate rich formatting behind a heuristic:
- only print the formatted box when `process.stderr.isTTY` (or equivalent)
- **Medium-term**: add a config knob for output style:
- `THOUGHTBOX_LOG_FORMAT=pretty|plain|json|off`
---
## 4) Safety and failure modes (“It doesn’t break my flow”)
### What’s strong
- Observatory emission is explicitly “never affect reasoning”; failures are swallowed.
- Auto-export failure keeps the session open to avoid data loss.
- Index building protects against OOM via max file size.
### Frictions
- The system is effectively “durable-by-export” rather than “durable-by-storage.”
- That can be totally fine, but it should be communicated as a first-class invariant.
### Recommendations
- Make the lifecycle explicit in docs and maybe in-tool responses:
- “Sessions are durable once exported (auto-export on completion).”
- “In-memory state resets on server restart; exports persist.”
---
## 5) Composability and automation (“It fits into my workflow”)
### What’s strong
- MCP primitives (tools/resources/prompts) are composable with agent workflows.
- `export_reasoning_chain` returns a deterministic path and metadata.
### Frictions
- The “CLI” entry point is really a stdio MCP server. That’s correct for MCP, but a user seeing a `thoughtbox` executable may expect:
- `thoughtbox --help`
- `thoughtbox --version`
- `thoughtbox serve --http`
If the binary is primarily for MCP clients, consider clarifying/reshaping the CLI contract.
### Recommendations
- Decide which of these you want:
- **Option A (MCP-first, minimal CLI):** document that `thoughtbox` is an MCP stdio server and provide only `--help`/`--version`.
- **Option B (dual-mode CLI):** implement subcommands:
- `thoughtbox stdio`
- `thoughtbox http`
- `thoughtbox observatory`
---
## 6) “Extension of self” ergonomics (“It gets out of the way”)
### What’s strong
- The init flow is a “CLI menu” encoded as resources. That’s a great pattern:
- it is low-friction
- it’s link-driven
- it’s state-machine based
- it provides “next action” affordances
### Frictions
- The init flow depends on exported sessions and structured tags, but the system doesn’t strongly guide the user into those conventions except via text.
- This is okay for expert users, but novices will drift.
### Recommendations
- Consider adding a “tag linter” / “tag suggester” behavior:
- if `sessionTags` are missing `project:`/`task:`/`aspect:`, suggest them in the response (without being blocking).
---
## Priority recommendations
## High priority (highest leverage)
1) Fix the broken link to `thoughtbox://sessions/{id}`
- Implement the resource or remove the link.
- This directly repairs the init flow’s “affordance loop.”
2) Unify version strings across all surfaces
- MCP server version, HTTP `/health`, docs generation.
3) Make `stderr` formatting context-aware
- Avoid noisy pretty output in non-interactive contexts.
## Medium priority
4) Clarify the persistence contract in docs and tool responses
- Make “durable-by-export” explicit.
5) Consider adding session browsing tools/resources
- `thoughtbox://sessions` list
- `thoughtbox://sessions/{id}` details
- Optional: a `list_sessions` tool for scripts.
## Longer-term
6) Decide on an explicit CLI product surface
- Minimal (MCP server only) vs dual-mode (serve http/stateful/etc).
7) Tighten invariants between init flow and exports
- e.g. explicitly show “index built at: …” and “exports directory: …”
---
## Concrete “AgX patterns” to keep and extend
- **Toolhost + embedded operation docs** (already excellent)
- Extend this pattern to other multi-op surfaces (session browsing, exports management).
- **Init flow as resource-based navigation**
- Keep it “link-first” and maintain referential integrity (no broken URIs).
- **Observability as a parallel channel**
- Keep the “never interfere with reasoning” guarantee.
- **Just-in-time guidance, not a big manual**
- Continue the embedded cookbook approach.
---
## Appendix: evidence (code points)
- MCP tool registration: `src/index.ts` (`registerTool` for thoughtbox/notebook/mental_models/export_reasoning_chain)
- Init flow renderer that links to sessions: `src/init/renderers/markdown.ts` (references `thoughtbox://sessions/{id}`)
- No implementation of `thoughtbox://sessions` resources found in repo
- Patterns cookbook is embedded: `src/resources/patterns-cookbook-content.ts`
- HTTP health endpoints: `src/http.ts`, `src/http-stateful.ts`
- In-memory storage + export: `src/persistence/storage.ts`, `src/persistence/export.ts`
- Observatory: `src/observatory/server.ts`, `src/observatory/config.ts`