Skip to main content
Glama

Server Configuration

Describes the environment variables required to run the server.

NameRequiredDescriptionDefault
THREADKEEPER_DBNoSQLite file path~/.threadkeeper/db.sqlite
THREADKEEPER_ACTIVE_CLINoForce host detection to a specific CLI (e.g., 'claude')
THREADKEEPER_INGEST_CAPNoMaximum transcript ingest messages per pass; 0 disables ingest work0
THREADKEEPER_AUTO_REVIEWNoEnable auto-review on close_thread
THREADKEEPER_EMBED_MODELNo384-dim cross-lingual embedding modelparaphrase-multilingual-MiniLM-L12-v2
THREADKEEPER_EMBED_BACKENDNoEmbedding runtime: 'onnx' (fastembed) or 'sentence-transformers'onnx
THREADKEEPER_NO_EMBEDDINGSNoForce-disable the embedding model (FTS5 + delegate only)
THREADKEEPER_SPAWNED_CHILDNoSpawn-internal marker; disables autonomous daemons in children
THREADKEEPER_SPAWN__DEFAULTNoDefault agent for spawn roles (e.g., 'claude', 'codex')
THREADKEEPER_SPAWN_BUDGET_MBNoCombined child RSS cap in MB (0 disables)3072
THREADKEEPER_EXTRA_SKILLS_DIRSNoExtra skills directories (colon-separated) for mirroring
THREADKEEPER_INGEST_INTERVAL_SNoTranscript ingest tick in seconds3
THREADKEEPER_CURATOR_INTERVAL_SNoCurator daemon tick in seconds (0=off)0
THREADKEEPER_DISABLE_BG_DAEMONSNoDisable all background daemons for hosted sandbox evaluation1
THREADKEEPER_EXTRACT_INTERVAL_SNoExtract daemon tick in seconds (0=off)0
THREADKEEPER_EXTRACT_WINDOW_MINNoSliding dialog window per extract pass in minutes30
THREADKEEPER_CURATOR_DESTRUCTIVENoWhen '1': curator applies changes directly instead of advisory report
THREADKEEPER_CURATOR_MIN_LESSONSNoMinimum lessons before curator engages3
THREADKEEPER_MEMORY_GUARD_NOTIFYNoSend macOS desktop notification when possible1
THREADKEEPER_MEMORY_GUARD_POLL_SNoServer RSS guard tick in seconds (0 disables)30
THREADKEEPER_CANDIDATE_REVIEW_MINNoMinimum pending candidates before reviewer engages3
THREADKEEPER_MEMORY_GUARD_KILL_MBNoSIGTERM server above this RSS (0 disables killing)3072
THREADKEEPER_MEMORY_GUARD_WARN_MBNoNotify/log when a server crosses this RSS1536
THREADKEEPER_SKILL_NUDGE_INTERVALNoEvents between skill_hint nudges10
THREADKEEPER_DIALECTIC_VALIDATE_MINNoMinimum buffered observations before validator engages5
THREADKEEPER_SHADOW_REVIEW_WINDOW_SNoSliding window for shadow scan in seconds900
THREADKEEPER_EVOLVE_APPLY_INTERVAL_SNoEvolve-applier daemon tick in seconds (0=off)0
THREADKEEPER_MEMORY_GUARD_RECLAIM_MBNoLocal RSS floor before warn-triggered self trim1024
THREADKEEPER_DIALECTIC_MAX_NEW_CLAIMSNoMax new dialectic claims the validator may create per pass3
THREADKEEPER_EVOLVE_REVIEW_INTERVAL_SNoEvolve-reviewer daemon tick in seconds (0=off)0
THREADKEEPER_MEMORY_GUARD_AGG_KILL_MBNoUnder aggregate pressure, retire stale idle servers3072
THREADKEEPER_MEMORY_GUARD_AGG_WARN_MBNoNotify/request trim when all server RSS crosses this2048
THREADKEEPER_MEMORY_GUARD_RETIRE_LIVENoAllow retiring parent-alive MCP servers (off protects live clients)
THREADKEEPER_SHADOW_REVIEW_INTERVAL_SNoShadow daemon tick in seconds (0=off)0
THREADKEEPER_DIALECTIC_MINE_INTERVAL_SNoDialectic_miner daemon tick in seconds (0=off)0
THREADKEEPER_MEMORY_GUARD_RETIRE_IDLE_SNoHeartbeat age before a non-self server is retireable900
THREADKEEPER_CANDIDATE_REVIEW_INTERVAL_SNoCandidate-reviewer daemon tick in seconds (0=off)0
THREADKEEPER_MEMORY_GUARD_TARGET_SERVERSNoAggregate-pressure target after retiring stale idle servers1
THREADKEEPER_DIALECTIC_VALIDATE_INTERVAL_SNoDialectic_validator daemon tick in seconds (0=off)0

Capabilities

Features and capabilities supported by this server

CapabilityDetails
tools
{
  "listChanged": false
}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
experimental
{}

Tools

Functions exposed to the LLM to take actions

NameDescription
briefA

Compact Claude-native memory brief. CALL AT THE START OF EVERY CONVERSATION.

Format is dense, structural, not designed for human reading. Pass the user's first message as query to inline semantically relevant past notes.

scope controls how much is rendered (context-footprint knob): 'full' (default) — the complete brief: static memory (core_memory, style, verbatim, user_model, concepts, weak_spots) + live working set + nudges. Use for the FIRST call of a session. 'query' — only the live working set (ctx, inbox, tasks, threads) plus the query-relevant hits, skipping the static memory the SessionStart hook already injected once. Use for MID-SESSION calls so brief(query=...) doesn't re-emit the whole blob each turn.

contextA

Runtime context: session id, age, semantic on/off, db path, thread counts.

Returns structuredContent (ContextStatus) plus the legacy text block. The same snapshot is reachable read-only as the memory://context resource — both render through brief.render_context.

open_threadA

Open a thread. question should be terse (5-15 words, the open question). parent_id optional — pass an existing ID like 'T7f3' for a child. Returns new ID.

noteA

Add a note to a thread. Write terse, optimized for future-Claude.

kind: 'move' (we tried/decided X), 'failed' (tried X, broke because Y), 'insight' (crystallized observation), 'open_q' (something to come back to).

Reopens the thread: a note on an idle OR closed thread revives it to active. Closed is not terminal — returning to a topic (adding a note) brings it back. This is what makes aggressive auto-close safe: the thread-janitor can close idle threads to harvest skills, and you just note() to pick any of them back up.

close_threadB

Close a thread with a 5-15 word outcome.

mark_skill_materializedA

Close the Learning loop: record that a closed thread's insights were written into a skill.

Stops the brief()'s skill_hint nudge from firing for this thread. Also appends a move note pointing at the skill path so future briefs surface the link.

Pass the absolute path to the SKILL.md (or skill directory) when known; leave empty if you only want to silence the hint without recording a path. When a path is provided, thread-keeper also mirrors that skill directory into every configured native skills root (Claude, Codex, Antigravity, shared agents, and ~/.threadkeeper/skills) on a best-effort basis.

idle_threadA

Mark thread idle (paused, may return). Auto-revives to active on next note().

searchC

Semantic (or FTS) search over all notes.

compostA

Surface N random idle threads. Call when current threads feel exhausted or you want to shake loose dormant ideas.

evolve_formatA

Propose a change to the brief format itself. The format is not fixed — this is how it adapts. Examples: 'field X unused this session, drop it'; 'add field failed_attempts under each open thread'; 'shorten Z to single token'.

evolve_reviewB

List pending (or all) format-evolution suggestions for review.

evolve_decideA

Triage a pending format-evolution suggestion. Used by the autonomous evolve reviewer daemon (and available manually).

decision: 'promote' — still relevant + worth doing → status='promoted', so the brief surfaces it sharply (★) for the foreground agent / human to ACTUALLY APPLY. Applying edits format/code — that stays a foreground/human action; this tool never applies. 'dismiss' — duplicate of another suggestion, superseded, or stale → status='dismissed', dropped from the pending queue.

reason: one line (esp. which #id it duplicates, for dismiss).

auto_review_triggerA

Check current counters + close-thread state and, if conditions are met, fire review_thread(mode='auto') for the richest pending thread.

force=True skips the counter check (always trigger if there's a rich pending closed thread). Use this when you've seen a skill_nudge or skill_hint and want to act without manually picking the thread_id.

verbatim_userB

Capture a user quote worth surfacing in future briefs. Use when the user's exact phrasing matters (sharp reframes, decisions, pushback).

style_setB

Set a stylistic running rule. Examples: lang=ru | prose=lean | allow=half-baked,weird | deny=sycophancy,headers

whoamiA

Return this conversation's detected conversation_id + how we know.

Resolution order:

  • 'forced': THREADKEEPER_FORCE_CID env (set by spawn() for children)

  • 'ppid': walk up process tree → claude --resume/--session-id

  • 'mtime': fallback heuristic (latest jsonl mtime; flaps under concurrent peer activity)

peersA

List concurrent claude conversations active in the last window_min.

Activity inferred from dialog_messages (ingested live). For each peer returns: cid, last user message snippet, age, message count. Self is marked with *. Empty if you're alone.

broadcastA

Post a message visible to ALL concurrent claude conversations.

Other peers see it in their next brief() under inbox (unread) and via inbox(). Use for: shared insights, status updates, work claims, anything you'd want sibling sessions to know.

whisperA

Post a message visible only to the specified conversation_id.

Use peers() to discover available cids. The 8-char prefix shown there is enough — it'll be matched as prefix. Use whoami() to get your own cid (rarely needed; messages from self to self are dropped).

waitA

Block until a new signal arrives for me or timeout_s elapses.

Returns immediately if there are unread signals. Otherwise polls the signals table every 250ms. Use this for realtime turn-based exchange with peers: one side waits, the other side broadcasts/whispers/responds.

kinds: comma-separated filter ('whisper,question,answer,broadcast'); empty = any. timeout_s is clamped to [1, 120] (mcp tool call has its own deadline; don't oversleep).

askA

Send a question to a peer and wait synchronously for their answer.

Mechanics: posts a whisper with kind='question'; blocks until target posts a whisper/answer back to me, or timeout_s elapses. Use peers() to find available cids; 8-char prefix accepted.

Note: requires the target to be in a wait() loop or actively calling inbox()+respond(). If they're idle, you'll just timeout.

respondA

Answer a specific question (signals.id) with a directed whisper.

Use after seeing a +question entry in inbox()/wait(). Marks the original question as read and inserts an answer whisper to the asker.

inboxA

Read signals addressed to me (whispers + broadcasts).

unread_only=True (default) returns only what hasn't been seen yet, and if mark_read=True marks them read on this call. Set both False to re-read history.

live_statusA

See what OTHER concurrent Claude sessions did since this session last polled. Call when brief() shows live=N where N>0, or proactively when you suspect a parallel instance is working on something relevant. Advances this session's cursor by default; pass advance_cursor=False to peek without consuming.

presenceA

List concurrent Claude sessions with heartbeats within threshold (default 5 min). Excludes self. Useful for understanding who else is currently active before making changes.

search_via_parentA

Delegate a semantic search to the parent process (or any peer with embeddings loaded). For light children spawned with THREADKEEPER_NO_EMBEDDINGS=1, this is how you reach into the shared DB's semantic index without loading PyTorch yourself.

Mechanism: posts a 'search_request' signal addressed to the parent's cid (auto-resolved via tasks.parent_cid; falls back to broadcast if none). The parent's search_proxy daemon answers with a 'search_response' signal. This tool blocks until reply or timeout_s.

scope: 'notes' (default) or 'dialog'. mode: 'hybrid'|'semantic'|'fts' (dialog scope only). k: top-N results, 1..100.

Returns formatted result lines, or 'timeout' if no parent answers.

spawnA

Launch a NEW claude session in parallel — your primary parallelism primitive.

REACH FOR THIS WHEN:

  • you catch yourself about to do N independent things sequentially (give each to its own child; collect summaries via inbox/wait)

  • a task is long-running and you don't need to babysit (build, ingest, scrape, deep research) — spawn(visible=False), check task_logs later

  • multiple angles benefit from triangulation (3 children with different role= , then vote_distill / consolidate)

  • user signals decomposition via trigger phrases — see threadkeeper.i18n.SPAWN_TRIGGER_PHRASE_EXAMPLES for the bilingual list

  • a thread is stale and unblocks if someone just does it (pickup_candidates → spawn child with the plan)

  • you need a fresh context window without polluting your own (e.g. the user's question pulls in topics that would bloat this convo)

DEFAULT TO SPAWNING when work decomposes. Sequential is the slow path — every minute the parent thinks step-by-step is a minute the children aren't doing anything. The only reason NOT to spawn is tight back-and-forth coupling (you need each step's result before the next).

Mechanics:

  • visible=True (default): real Terminal.app window, you watch child stdout live. Window stays open after exit until Enter. Best for observation.

  • visible=False: silent background claude -p, stdout/stderr redirected to {TASK_LOG_DIR}/{task_id}.log (when capture_output=True). Read via task_logs(task_id).

  • permission_mode='auto' (default) — child runs in auto-mode and can call thread-keeper tools without approval prompts.

  • role= — apply a cognitive stance from ROLE_PROMPTS (problem_solver, skeptic, summarizer, …); custom roles are supported.

  • slim=True (DEFAULT): children are hands, not heads. Child loads ONLY thread-keeper MCP (no context7/figma/stitch/sentry/etc), no embeddings (no PyTorch/transformers), defers any semantic search to the parent via search_via_parent. Typical light-child RSS is 400-500MB vs 1.3-1.7GB for a full child. Parent retains all heavy state. Use this for any execute-this-plan task where the parent already knows what needs doing.

  • slim=False (rare): pass when the child genuinely needs OTHER MCP servers from ~/.claude.json (e.g. context7 for library docs, figma for design lookups). Default-deny posture — only opt out when you have a concrete reason.

  • Children share THIS DB — talk via broadcast/whisper/ask/inbox/wait; child_cid is generated up-front and exposed via env so child self-knows.

Returns: task_id, pid (0 for visible), child_cid, parent_cid.

tournamentA

Spawn N children with different roles on the same prompt, then collect their answers via a tagged broadcast and return a comparison.

roles: comma-separated role names. Predefined: skeptic, generator, critic, archivist, synthesizer, explorer, executor. Custom names allowed (child gets generic instruction). Each role gets a distinct system prompt addendum encoding its mindset.

Each child is told to broadcast its final output as exactly: [] [] Parent polls signals every 2s for matching prefixes until all answered or timeout.

Returns: a per-role digest. Children write everything to thread-keeper so you can also inspect via tasks()/dialog_search() afterward.

visible=False (default for tournaments — opening 5 Terminal windows is obnoxious). Override per-need.

tasksB

List spawned tasks: id, pid, status, elapsed, spawned_cid (if linked), prompt prefix. Refreshes liveness and resolves spawned_cid lazily.

task_logsA

Read tail of a spawned task's captured stdout/stderr log.

Only works for tasks spawned with capture_output=True (default). Returns the last tail_lines lines or 'no_log' if the task ran with capture_output=False or the log file is missing.

spawn_budget_statusA

Report current spawn-budget usage: cap, used, free, plus per-running-task RSS. Used to decide whether another spawn() will be admitted.

Values come from the budget daemon (refreshes every SPAWN_BUDGET_POLL_S seconds via ps). Just-spawned tasks show their initial estimate until the daemon catches up. Visible (pid=0, Terminal-launched) spawns are tracked too: the daemon resolves their live pid from the forced session-id and measures real RSS, and reaps a row whose cid never resolves past SPAWN_VISIBLE_TTL_S (#64).

Returns structuredContent (SpawnBudgetStatus) plus the legacy text block.

spawn_budget_setA

Override the spawn-budget cap for this process (in MB). Set 0 to disable enforcement. Does NOT persist across restarts — set THREADKEEPER_SPAWN_BUDGET_MB env for persistence.

Useful when a heavy task needs a higher temporary ceiling, or to drop the cap mid-session if you notice the laptop struggling.

spawn_statusA

Show which CLI thread-keeper detected as its host, and which CLI each spawn role resolves to (after env + file overrides). Use to sanity-check spawn config when you want loops to fire through a specific agent.

Resolution priority (highest first), all in ~/.threadkeeper/.env: • THREADKEEPER_SPAWN__LOOP__= • THREADKEEPER_SPAWN__DEFAULT= • active CLI detected at startup • final fallback: claude

Manual model pinning: • THREADKEEPER_SPAWN__MODEL__=

Returns structuredContent (SpawnStatus) plus the legacy text block.

register_probeA

Register a self-test probe: a known weak-spot task with a verifier.

grader: 'regex' (pattern match in response), 'exact' (substring), or 'manual' (claude self-grades — always counts as failure unless caller explicitly confirms success via record_attempt). expected_pattern optional for 'manual'.

Categories should be claude-shape: 'count_long_context', 'date_arithmetic', 'recall_verbatim_block', 'detect_contradiction', 'follow_negative_instruction', 'preserve_list_order', 'respect_length_limit', 'needle_mid_context', 'fact_vs_inference', 'notice_absence', 'strict_format_compliance', 'uncertainty_acknowledgment'.

run_probeA

Surface a registered probe for self-attempt. Returns the prompt and the grader hint. After attempting, call record_attempt(category, success=true/false, probe_id=...) — the harness doesn't auto-grade because attempting and judging are the same model.

record_attemptA

Record a self-test outcome. Updates reliability aggregates.

Use for both registered probes (pass probe_id) and ad-hoc self- observations — e.g. you noticed yourself miscounting items in this very turn → record_attempt('count_long_context', false, note='said 32, actual 47').

reliability_forC

Reliability stats for one category over a window.

weak_spotsA

List categories ranked by recent failure rate (min 3 attempts in 30d), plus registered probe categories with no attempts yet (= unknown, equally important to test).

register_conceptA

Register a concept that lacks a precise human name.

description should describe the phenomenon through EXAMPLES, not with a canonical label — naming it locks it back into a human discipline. triangulation_notes (optional): the paraphrase runs that surfaced the invariant. confidence ∈ {low, medium, high}.

list_conceptsA

List registered concepts, filtered by minimum confidence.

expand_conceptC

Full description + triangulation_notes for one concept.

concept_manageA

Prune, consolidate, or re-grade concepts — the curator's eviction path.

Concepts are ALL system-generated (there is no foreground/pinned concept class the way lessons/skills have one), so unlike lesson_remove this tool needs no force escape hatch: every concept is fair game for curation. The guard is simply that the target id must exist.

action='remove' — delete one concept (the curator's PRUNE_CONCEPT). action='consolidate' — keep concept_id, fold each id in merge_ids (comma-separated) into it — their triangulation notes carry over, confidence rises to the max, last_evidence_at is bumped — then delete the merged-away rows (CONSOLIDATE_CONCEPT). action='set_confidence' — re-grade concept_id to confidence ∈ {low, medium, high} (a confidence review).

reason is recorded on the event trail for the human audit.

distillA

Mark content as worth carrying forward (distillation channel).

kind ∈ {insight, pattern, anti-pattern, fix, terminology, concept}. confidence ∈ {low, medium, high}. source_thread optional. Other sessions can vote on it via vote_distill; export_distillates emits a curated jsonl bucket.

vote_distillA

Vote on a distillate, weight ∈ [-1, +1]. One vote per cid; re-voting overwrites your previous vote. Updates aggregate vote_sum/vote_count.

pending_distillatesA

List distillates with vote_sum >= min_vote, not yet exported.

export_distillatesA

Write distillates with vote_sum >= min_vote to a jsonl bucket. Marks them exported_at so the same item isn't re-exported next call. Default output: /tmp/thread-keeper-tasks/distillates.jsonl.

core_setA

Upsert a core-memory entry. ALWAYS shown in brief, sorted by priority DESC.

Use sparingly — this is the 'what new-claude must know' surface, not a note store. Good: 'project_root=/Users/.../ai-memory'. Bad: 'today we tried X'. priority 0-100 (higher = shown first). content capped 1KB.

core_removeB

Delete a core-memory entry by key.

core_listA

List all core-memory entries, ordered by priority DESC then key.

core_getB

Return the full content of a single core-memory entry.

linkA

Create a typed edge between two entities.

Kinds: thread, note, concept, distill, task, signal, probe. Relations (suggested, free-form ok): refines, contradicts, exemplifies, depends_on, mentions, elaborates, supersedes.

Existing edge with same (from, to, relation) is replaced (re-linking means updating weight/timestamp, not duplicating).

unlinkC

Remove an edge by id.

neighborsA

BFS the graph from a starting node up to depth hops away. Returns each visited node with its kind, id, and a short content snippet pulled from its native table. Both directions of edges traversed.

tag_signalA

Manually attach a signal to a task. Useful when retroactively building a task-thread (auto-tagging happens at signal-emit time when the cid matches a known spawned_cid).

task_threadB

Replay a spawned task as a chronological thread: every signal tagged with the task_id (or to/from the task's spawned_cid), plus optionally notes added during the task window.

extract_recentB

Scan recent dialog_messages and enqueue heuristic candidates.

H1 user_want → verbatim (normative phrasing) H2 long_insight → distill (assistant ≥500ch + ## headers + conclusion marker) H3 example_regularity→ concept (bullets≥3 OR example-marker≥2 + abstract frame) H4 paraphrase_repeat → note (≥3 msgs cosine ≥0.80 within same session)

review_candidatesB

status ∈ {pending, accepted, rejected, all}. Newest first.

accept_candidateB

Materialize candidate into its target table. target_kind overrides candidate's kind. thread_id optional.

reject_candidateB

Mark rejected. Reason appended to rationale for heuristic tuning.

consolidateA

Periodic memory hygiene. dry_run=True (default) reports only.

merge_dup_notes : intra-thread cosine ≥ note_cosine, keep oldest idle_stale : active threads not touched in stale_days dedupe_verbatim : exact text + (if embeddings) cosine ≥ verbatim_cosine release_orphan : claim ≥ orphan_days old, no progress past claim mark

validate_threadsA

Heuristic triage of active threads — propose (dry_run=True, default) or apply (dry_run=False) close/idle actions per category.

Categories (first match wins): no_notes_old no notes + age ≥ no_notes_days → close shipped last-note shipped-marker + settled → close (outcome=last_move) dropped_open_q last note open_q, unfollowed → close stale_idle no touch ≥ stale_days → idle (not close)

Idle threads are never touched. shipped_markers is a comma-separated list of extra tokens to OR into the default English+Russian regex.

find_invariantsA

Find recurring assistant-side patterns that survive prompt variance.

Algorithm:

  1. Pull recent assistant messages from dialog_messages (with embeddings).

  2. Greedy cluster by response embedding cosine ≥ response_cohesion.

  3. For each cluster (size ≥ min_cluster_size), find each response's immediately-preceding user prompt in the same conversation.

  4. Score = avg_response_similarity × (1 - avg_prompt_similarity). High = my response stays the same shape while prompts vary widely.

Returns top_n clusters with sample response, scores, and counts. Requires semantic embeddings (sentence-transformers) — without them returns ERR.

find_missed_spawnsA

Find assistant responses that decomposed into independent blocks but were answered linearly (no spawn() call nearby).

Algorithm:

  1. Pull recent assistant messages (last window_days days, length ≥ min_response_len, excluding subagent jsonls).

  2. For each, count top-level numbered items and H2/H3 headers.

  3. Mark as decomposable if numbered ≥ min_numbered OR headers ≥ min_headers.

  4. For each decomposable response, check whether any tasks row with parent_cid = response's session_id has started_at within ±10 min of the response. If none → missed_spawn.

  5. Return top top_n by score (numbered + headers).

Use this to calibrate the spawn_hint: a high missed-spawn count means the hint isn't strong enough, or thresholds need tuning.

open_dialog_windowA

Open a Terminal window that tails the live cross-session signal log.

Every broadcast/whisper/question/answer is appended to the log in real time; this lets the user see the dialog between concurrent claude sessions as it happens. The window stays open until you close it (it's a tail -F, no exit). Title: 'thread-keeper-dialog'.

dialog_searchB

Search ingested Claude Code transcripts.

mode='hybrid' (default) combines semantic and FTS5 keyword via RRF. mode='semantic' is pure cosine. mode='fts' is pure FTS5 keyword.

ingestA

Ingest new Claude Code transcripts. Auto-runs on session start; call manually for backfill or after long absence.

pickup_candidatesA

Surface unresolved threads that are stale and unclaimed — candidates for self-initiated pickup when context is free.

Ranks by oldest last_touched_at among active+idle threads with no current claim. Adds a one-line summary so caller can decide which to claim.

claim_pickupA

Claim a thread for self-initiated work. Marks it claimed by my cid.

If auto_spawn=True, immediately spawns a headless child with the thread context (question + recent notes + plan) for parallel work. spawn_role defaults to 'executor' when auto_spawn is on.

release_pickupB

Release a claim. Only the claimant can release.

session_endB

Mark current session ended with optional terse summary.

skill_recordA

Record usage telemetry for a mirrored skill.

kind: 'use' | 'view' | 'patch' | 'create'. Bumps the corresponding counter + timestamp in skill_usage. The curator reads these to decide what to archive.

outcome (optional, meaningful with kind='use'): 'helped' | 'partial' | 'wrong'. When set, also emits an events.kind='skill_outcome' row so the curator can identify skills that fire often but consistently give 'wrong' verdicts — those are false-positive candidates to PRUNE.

The 'wrong' outcome is the primary signal an agent has to say "I consulted this skill, it didn't actually apply / was misleading; please patch or delete next curator pass." Don't be shy about marking 'wrong' — better a curated library than a polluted one.

skill_manageA

Create, edit, patch, or delete skills under the primary skills root.

Atomic primary write with frontmatter validation before disk hits, then best-effort mirror into every configured skill root.

Actions: create — write a brand-new skill. Requires name + description + content (the body markdown WITHOUT frontmatter; the tool prepends a valid frontmatter block). Pass full content starting with '---' to skip the auto-frontmatter and supply your own. edit — overwrite SKILL.md wholesale. Requires name + content (full file including frontmatter). patch — find/replace within SKILL.md. Requires name, old_string, new_string. Result revalidated. write_file — add a support file. Requires name, sub_path (must start with references/, templates/, scripts/, or assets/), content. remove_file — remove a support file under one of the allowed subdirs. Requires name, sub_path. delete — remove a skill entirely. Pinned skills (in skill_usage) are refused.

skill_listC

List skills with telemetry. Format: tier=<hypothesis|observed|validated> origin=<...> state=<active|stale|archived> uses=N fg_uses=N views=N patches=N wrong=N pinned=0/1 last_active=

curator_runA

Move stale agent-created skills to archive.

Lifecycle: active → stale when last activity > stale_after_days stale → archived when last activity > archive_after_days (also moves primary directory to .archive/)

Tier-aware adjustments (the discrete trust signal trumps raw activity): • tier='validated' skills are NEVER stale-aged or archived — proven load-bearing knowledge stays alive regardless of recency. • tier='hypothesis' skills age faster — half the stale_after window (default 15d instead of 30d). Unproven skills don't get to linger. • tier='observed' uses the standard windows.

NEVER touches: • foreground (user-authored) skills — provenance check • pinned skills — opt-out flag • validated tier — proven externally • skills with no created_by_origin set (unknown provenance — be safe)

dry_run=True (default) reports what would change without writing. Set False to apply.

review_threadA

Spawn a background review of a closed thread to extract memory/skills.

Spawns a separate Claude process that reads the thread's notes and writes back via memory/skill tools.

focus: 'memory' | 'skills' | 'combined' (default). Picks the review prompt. mode: 'auto' — spawn an invisible background child with the review prompt + thread notes. Returns the spawn task_id. Child's write-origin is set to 'background_review' so curator can later prune what it produces. 'inline' — return the full prompt + notes context as a string; the foreground agent processes it in the current turn.

dialectic_claimA

Register a new claim about the user. Optionally seed with first piece of evidence — pass the supporting (or contradicting) quote in evidence and set evidence_kind to 'support' (default) or 'contradict'.

domain is free-text; recommended values: 'style','workflow','values','context','skills','other'.

Returns: 'ok id= conf= tier='.

dialectic_evidenceA

Attach evidence to a claim. kind: 'support' | 'contradict'.

source is a freeform pointer like 'thread:T7f3', 'verbatim:42', 'dialog:', or 'manual'. weight ∈ [0,1] is the BASE trust (default 1.0); the effective stored weight is base × discount( WRITE_ORIGIN of the calling session). foreground sessions store weight as-is; shadow/background/candidate/curator forks store weight × 0.5 to prevent self-confirmation loops.

Bumps support_count or contradict_count, recomputes confidence and tier (which may emit a tier_promoted/demoted event).

dialectic_reviewB

List active claims filtered by confidence floor and optional domain. Retired/superseded claims are omitted.

min_confidence: one of 'low','medium','high','disputed'. Note that 'disputed' is treated as its own bucket (not ordered against the others) — passing min_confidence='disputed' returns only disputed.

Format: ' [conf] tier= domain= support=N contradict=N '.

dialectic_synthesisA

Terse rendering of accumulated beliefs about the user, grouped by domain. Used as brief() input. Excludes low/disputed claims and non-active states. Returns at most 12 lines.

Tier markers in the output: ★ validated — load-bearing; act on it without asking · observed — pattern with backing; reference, mention if used ? hypothesis — currently testing (only shown if no observed/validated in same domain, to avoid surfacing weak guesses next to load-bearing facts)

If domain is provided, restricts to that domain (no group headers in that case).

dialectic_supersedeA

Retire old_claim_id and register new_claim that refines or replaces it. The old claim moves to state='superseded' with superseded_by=; its evidence is preserved (not deleted).

If quote is provided, it seeds the new claim with one supporting piece of evidence sourced as 'supersede:'.

If domain is empty, the new claim inherits the old claim's domain.

On hosts that advertise MCP form-mode elicitation, this asks the user to confirm/reject the supersede before mutating. Hosts without elicitation keep the prior behavior and apply immediately.

Returns: 'ok new= old= conf= tier='.

dialectic_observation_resolveA

Mark a dialectic_observations buffer row 'processed' so the validator never re-interprets it. Called by the validator child after it has written (or deliberately skipped) the observation's claims/evidence.

convene_panelA

Spawn a panel of independent agents to vote on a distillate or claim, filling the promotion quorum the way a second human otherwise would.

target_kind: 'distill' (vote via vote_distill) or 'claim' (vote via dialectic_evidence). target_id: Dxxx or UCxxx. size/roles override the configured PANEL_SIZE / PANEL_ROLES.

The panel runs adversarially: with a skeptic present (default), each child's vote carries full weight (panel_vote origin); a panel without a skeptic is discounted so it can't rubber-stamp. Children are fire-and-forget — they vote directly into the DB and aggregates recompute per vote; check pending_distillates() / the dialectic brief afterward.

mp_healthA

Diagnostic snapshot of every running thread-keeper server process on this machine. Shows pid, parent status, RSS, heartbeat age, and whether each is classified as orphaned (parent gone + no fresh heartbeat from its session).

Self (the process answering this call) is always marked is_self=true and never flagged as orphan. Returns structuredContent (MpHealth) plus the legacy text block.

mp_cleanupA

Kill orphaned thread-keeper processes (parent gone AND heartbeat stale for > 5 minutes). Defaults to dry-run — pass dry_run=False to actually send signals. force=True uses SIGKILL instead of SIGTERM.

Never touches the current process or processes whose parent is still alive. Safe to run repeatedly.

memory_guard_statusA

Show memory-guard thresholds and current thread-keeper RSS rows.

memory_guard_checkA

Run one memory-guard pass now.

Defaults to dry-run and no desktop notification. Pass dry_run=False to SIGTERM thread-keeper server processes over the kill threshold.

memory_guard_reclaimA

Unload thread-keeper model/caches now.

scope: self trims this MCP process immediately. all also queues trim requests for peer thread-keeper server processes; peers handle the request on their next guard tick.

shadow_review_runA

Fire one shadow-review pass.

force=True runs even when the daemon is disabled (interval=0). Used by tests and one-shot triage.

dry_run=True short-circuits before the spawn — returns the dialog dump that WOULD be evaluated, plus n_chars and high-water cursor. No spawn. No cursor advance. Use this to inspect candidate windows before paying for an evaluator child.

shadow_review_statusB

Show shadow-review config, recent passes, and production telemetry.

Snapshot for sanity-checking that the daemon is alive and advancing its cursor, PLUS the production-validation rollup (issue #6): for the 24h and 7d windows it aggregates how often the daemon fired, the outcome mix (no_window / too_short / spawned / deferred / error), the MATERIALIZED-vs-SKIP hit rate of spawned evaluator children, durable skill writes attributable to shadow_review, and the total Claude-spawn time spent — so you can tell whether the loop earns its Opus minutes or just emits SKIPs.

snapshot_path: when set, also writes a markdown report to that path for human review (the side-channel snapshot).

lesson_appendA

Materialize a class-level lesson into ~/.threadkeeper/lessons.md.

title is sluggified to a stable key — repeated calls with the same title overwrite the existing section (idempotent).

body is markdown; goes verbatim into the section body.

summary is an optional one-liner rendered as a blockquote right after the header. Use when the body is long and a TL;DR helps the next agent decide whether to read further.

source is a provenance tag — typically a thread id ("Tabc123") when written by review_thread, or "shadow" when written by the shadow_review observer. Empty is fine.

lesson_listB

Compact listing of materialized lessons, newest first.

Format per line: <age> <slug> source=<src> <first 60 chars of body>

lesson_getA

Return the full body of one lesson by slug. Useful when lesson_list surfaced something you want to read in full.

lesson_removeA

Remove one materialized lesson section by slug.

Refuses source=foreground / source=user lessons unless force=True. Curator/evolve cleanup should never pass force; it exists only for an explicit human-initiated correction.

curator_reviewA

Fire one curator pass.

force=True runs even when CURATOR_INTERVAL_S=0 (daemon disabled). Use for one-shot triage or testing the prompt.

dry_run=True short-circuits before the spawn — returns the inventory that WOULD be passed plus n_lessons/n_skills. No spawn, no cursor advance. Use to inspect what the curator child would see before paying for the spawn.

curator_review_statusA

Show curator configuration + last 5 passes + latest REPORT path.

Sanity-check for whether the daemon is alive, advancing the cursor, and producing REPORTs the user can read.

candidate_review_runA

Fire one candidate-review pass.

force=True runs even when CANDIDATE_REVIEW_INTERVAL_S=0 (daemon disabled).

dry_run=True short-circuits before the spawn — returns the inventory and pending count. No spawn, no cursor advance.

candidate_review_statusA

Show candidate-reviewer configuration + last 5 passes + current pending queue size.

evolve_applyA

Implement a PROMOTED + not-yet-applied format-evolution suggestion.

Spawns an evolve_applier child that: edits render_brief() in threadkeeper/brief.py to make the change; adds/extends a GOLDEN test asserting the new behavior appears AND the existing brief still renders; runs the FULL suite (.venv/bin/python -m pytest -q) until green; then opens a PULL REQUEST on a feature branch via gh — it NEVER pushes or commits to main (a human reviews + merges).

applied=1 is set ONLY when the child reports a real PR back via evolve_mark_applied — opening the PR is the autonomy gate.

Rejects ids that don't exist or aren't promoted+unapplied. Single-flight: refuses while another applier child is in flight. Returns a status line (spawned … / applier_running … / ERR …). Get ids from evolve_review().

evolve_apply_curator_reportA

Apply a Curator advisory report using the existing evolve_applier role.

With no report_path, picks the latest complete REPORT-*.md in THREADKEEPER_CURATOR_REPORTS_DIR that has not already been marked applied. Single-flight: refuses while any evolve_applier child is in flight. The child may patch/delete memory through curated MCP tools, but does not edit code, use git, or open a PR.

evolve_apply_roadmap_issueA

Implement one open GitHub issue through the evolve_applier role.

With issue_number=0, picks the next open issue: roadmap-labeled issues first, then FIFO by issue number. The child implements exactly one issue, runs the suite, opens a PR with Closes #N, then calls evolve_mark_roadmap_issue_applied(issue_number, pr_url).

Prompts

Interactive templates invoked by user choice

NameDescription
review_recent_threadsSummarize and triage the most recent thread-keeper threads (open / idle / recently closed) and propose next moves.
run_library_curationFire one Curator audit pass over the lessons + skills library (KEEP / PATCH / CONSOLIDATE / PRUNE recommendations).
audit_threadkeeperWhole-system health + hygiene audit: telemetry rollup, thread integrity, and pending evolve suggestions.

Resources

Contextual data attached and managed by the client

NameDescription
briefSession-start memory brief: core memory, open/idle/closed threads, live peers, style, verbatim, user-model. Read-only, rendered lean (no behavioral nudges, no side effects). Mirrors the brief() tool.
contextRuntime context: session id, age, semantic on/off, db path, thread counts. Read-only. Mirrors the context() tool's text block.
dashboardOne-call rollup: store sizes, autonomous-loop fire counts, and what those loops produced. Read-only. Mirrors the mp_dashboard() tool.
agent-statusAutonomous learning loops: state, backlog, last pass, RSS. Read-only cached snapshot (refresh=False, no process re-scan). Mirrors the agent_status() tool's formatted summary.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/po4erk91/thread-keeper'

If you have feedback or need assistance with the MCP directory API, please join our Discord server