Skip to main content
Glama

Self-Improving Memory MCP

by SuperPiTT
CHECKPOINT-TESTING.md10 kB
# Checkpoint System Testing Guide This guide explains how to verify that the automatic checkpoint system is working correctly. ## System Overview The checkpoint system prevents Claude's lossy autocompact by: 1. **Monitoring** context usage continuously 2. **Triggering** checkpoint at 80% usage (160k/200k tokens) 3. **Saving** complete session state to memory 4. **Recovering** state automatically in new conversations ## Test Scenarios ### Test 1: Token Threshold Detection **Goal**: Verify that checkpoint triggers at 80% context usage **Steps**: 1. Start conversation 2. Perform token-heavy operations (read many large files, long discussions) 3. Monitor `<budget:token_budget>` in responses 4. **Expected**: At ~160k tokens, Claude should: - Alert: "🚨 CONTEXT CHECKPOINT TRIGGERED" - Launch Pre-Compact Interceptor Agent - Save session state - Provide continuation summary **Success Criteria**: - ✅ Alert appears at 80%+ usage - ✅ Agent launches automatically (no user prompt needed) - ✅ Checkpoint entity created in memory - ✅ Continuation summary provided **Failure Indicators**: - ❌ No alert at 80%+ - ❌ Context reaches 95%+ without checkpoint - ❌ Autocompact happens (conversation summarized without checkpoint) --- ### Test 2: Message Count Threshold **Goal**: Verify that checkpoint triggers at 40+ message exchanges **Steps**: 1. Start conversation 2. Have back-and-forth exchanges (40+ messages) 3. **Expected**: At message 40, Claude should trigger checkpoint **Success Criteria**: - ✅ Checkpoint triggers at message count threshold - ✅ Works independently of token count --- ### Test 3: Manual Checkpoint **Goal**: Verify manual checkpoint command works **Steps**: 1. In any conversation, say: "save state" or "checkpoint" 2. **Expected**: Claude immediately launches Pre-Compact Interceptor **Success Criteria**: - ✅ Checkpoint activates on command - ✅ State saved even if usage < 80% --- ### Test 4: Context Recovery (Proactive) **Goal**: Verify automatic checkpoint recovery in new conversations **Steps**: 1. Complete Test 1 (create checkpoint) 2. Start NEW conversation 3. Send simple first message: "Hello" 4. **Expected**: Context Recovery Agent should: - Search for recent checkpoints - Find the checkpoint from Test 1 - Present recovery option: "💡 Found incomplete work from X hours ago..." **Success Criteria**: - ✅ Agent launches automatically (without user asking) - ✅ Checkpoint detected and presented - ✅ User can choose to resume or start fresh **Failure Indicators**: - ❌ Agent doesn't launch - ❌ Checkpoint not found - ❌ Claude asks "what were we working on?" --- ### Test 5: Context Recovery (Explicit) **Goal**: Verify checkpoint recovery from continuation summary **Steps**: 1. Complete Test 1 (create checkpoint and get continuation summary) 2. Start NEW conversation 3. Paste the continuation summary 4. **Expected**: Claude should: - Detect checkpoint format - Launch Context Recovery Agent - Load complete state - Resume work immediately **Success Criteria**: - ✅ Checkpoint recognized from summary - ✅ Complete context loaded - ✅ Work continues exactly where left off - ✅ No redundant questions asked --- ### Test 6: End-to-End Flow **Goal**: Test complete checkpoint → recovery → continuation cycle **Steps**: 1. Start working on multi-step task 2. Reach 80% context usage (trigger checkpoint) 3. Receive continuation summary 4. Start NEW conversation 5. Paste continuation summary 6. Verify work continues seamlessly **Success Criteria**: - ✅ All decisions remembered - ✅ All files/changes understood - ✅ User preferences applied - ✅ Next steps clear and correct - ✅ No information lost --- ## Verification Checklist After running tests, verify in knowledge base: ### Checkpoint Entities Created ```bash # Search for checkpoints memory-cli search "checkpoint" ``` **Expected**: - Entities with type: "session-snapshot" - Observations include: - CHECKPOINT METADATA (timestamp, token count, message count) - CURRENT WORK (goal, topic, progress) - COMPLETED (files, decisions, solutions) - PENDING (next steps, remaining tasks) - KEY CONTEXT (decisions, approaches, preferences) - RESUME INSTRUCTIONS ### Checkpoint Quality Check if checkpoint contains: - ✅ Clear goal/topic - ✅ Specific file paths - ✅ Exact next step - ✅ Key decisions with reasoning - ✅ User preferences ### Recovery Quality After recovery, verify: - ✅ No questions re-asking completed work - ✅ All context applied automatically - ✅ Work continues from exact point - ✅ Style/preferences maintained --- ## Common Issues & Fixes ### Issue 1: Checkpoint Not Triggering **Symptoms**: - Context reaches 90%+ without checkpoint - No alert at 80% threshold **Diagnosis**: 1. Check if CLAUDE.md is being read (it should be in system reminders) 2. Check token count manually in `<budget:token_budget>` 3. Verify Pre-Compact Interceptor agent exists **Fix**: - Ensure `.claude/CLAUDE.md` has the mandatory token check section - Verify agent file: `.claude/agents/pre-compact-interceptor.md` --- ### Issue 2: Context Recovery Not Working **Symptoms**: - New conversation doesn't detect checkpoint - Agent doesn't launch proactively **Diagnosis**: 1. Check if checkpoint was actually created (search memory) 2. Check if Context Recovery agent exists 3. Check checkpoint timestamp (must be < 24 hours) **Fix**: - Ensure `.claude/agents/context-recovery.md` exists - Check checkpoint status is "in-progress" not "completed" - Verify memory search is working: `memory-cli search "checkpoint"` --- ### Issue 3: Incomplete Checkpoint **Symptoms**: - Checkpoint created but missing information - Recovery doesn't have enough context **Diagnosis**: 1. Read the checkpoint entity 2. Check which sections are missing **Fix**: - Update Pre-Compact Interceptor agent instructions - Add missing sections to checkpoint format - Test with longer conversation to capture more context --- ### Issue 4: Checkpoint Too Late **Symptoms**: - Autocompact happens before checkpoint - Information already lost **Diagnosis**: - Check at what % checkpoint triggered - Was it >= 95%? **Fix**: - Lower threshold to 75% (from 80%) in CLAUDE.md - Add more aggressive monitoring - Trigger before large operations --- ## Performance Benchmarks ### Expected Behavior | Metric | Target | Acceptable | Poor | |--------|--------|------------|------| | Checkpoint trigger time | 160k tokens (80%) | 170k tokens (85%) | 190k+ tokens (95%+) | | Checkpoint save time | < 30 seconds | < 60 seconds | > 60 seconds | | Recovery detection time | First message | Second message | Not automatic | | Context completeness | 100% | 95%+ | < 95% | ### Measurement **Checkpoint Trigger**: ``` Note the token count when alert appears Expected: ~160,000 / 200,000 ``` **Checkpoint Completeness**: ``` Count sections in checkpoint entity: - METADATA: ✓ - CURRENT WORK: ✓ - COMPLETED: ✓ - PENDING: ✓ - KEY CONTEXT: ✓ - RESUME INSTRUCTIONS: ✓ Score: 6/6 = 100% ``` **Recovery Success**: ``` After recovery, can you answer these without asking user? - What was the goal? YES/NO - What was completed? YES/NO - What's next? YES/NO - What decisions were made? YES/NO - What are user preferences? YES/NO Score: 5/5 = 100% ``` --- ## Debugging Commands ### Check for Checkpoints ```bash # Search all checkpoints memory-cli search "checkpoint" # Get latest checkpoint memory-cli search "checkpoint" | head -n 1 # Check checkpoint status memory-cli search "session-snapshot" ``` ### Verify Checkpoint Content ```bash # Export checkpoint to see full content memory-cli export # Look for session-snapshot sections ``` ### Test Memory Search ```bash # Verify memory search is working memory-cli stats # Check if search returns results memory-cli search "test" ``` --- ## Success Metrics The checkpoint system is working correctly when: 1. **Prevention**: - ✅ Checkpoint triggers at 80% usage - ✅ BEFORE autocompact can discard information - ✅ Continuation summary is clear and complete 2. **Recovery**: - ✅ New conversations detect checkpoints automatically - ✅ Context loads without user prompting - ✅ Work continues seamlessly 3. **Quality**: - ✅ No information loss across conversations - ✅ No redundant questions - ✅ User doesn't need to re-explain anything 4. **Experience**: - ✅ User trusts the system - ✅ Process is transparent - ✅ Recovery feels natural --- ## Manual Testing Protocol ### Test Session 1: Create Checkpoint 1. Start conversation 2. Work on complex multi-file task 3. Continue until 160k+ tokens 4. Verify checkpoint triggers 5. Save continuation summary 6. Note checkpoint details ### Test Session 2: Verify Recovery 1. Wait 5 minutes (simulate context loss) 2. Start NEW conversation 3. First message: "Hello" 4. Verify automatic recovery offer 5. Accept recovery 6. Verify complete context loads 7. Continue work without re-explaining ### Test Session 3: Explicit Resume 1. Start NEW conversation 2. Paste continuation summary from Session 1 3. Verify immediate recognition 4. Verify context loads 5. Verify work continues correctly --- ## Reporting Issues If checkpoint system fails, report: 1. **What happened**: - Token count at failure - Message count - Error messages (if any) 2. **Expected behavior**: - What should have happened 3. **Context**: - Conversation topic - Files being worked on - Operations performed 4. **Logs**: - Checkpoint entity (if created) - Memory search results - System configuration --- ## Next Steps After testing: 1. ✅ Verify all tests pass 2. ✅ Document any issues found 3. ✅ Adjust thresholds if needed 4. ✅ Update agent instructions based on learnings 5. ✅ Add to knowledge base as pattern **The checkpoint system is mission-critical. It must work 100% of the time.**

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/SuperPiTT/self-improving-memory-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server