# MyAIGist MCP Testing Guide
Complete testing checklist for all 13 MCP tools with example conversations and expected behaviors.
## Pre-Testing Setup
1. **Verify Installation:**
```bash
cd /Users/mikeschwimmer/myaigist_mcp
python3 -m py_compile server.py
python3 -c "from mcp_agents.qa_agent import QAAgent; print('✅ Ready')"
```
2. **Check Environment:**
```bash
grep OPENAI_API_KEY .env
# Should show: OPENAI_API_KEY=sk-...
```
3. **Configure Claude Desktop:**
- Edit `~/Library/Application Support/Claude/claude_desktop_config.json`
- Add the myaigist MCP server configuration (see the sketch below)
- Restart Claude Desktop
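A minimal configuration sketch (the path and entry point follow the installation check in step 1; adjust `command` and `args` to your environment, as the exact values are an assumption):
```json
{
  "mcpServers": {
    "myaigist": {
      "command": "python3",
      "args": ["/Users/mikeschwimmer/myaigist_mcp/server.py"],
      "env": {
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}
```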
4. **Prepare Test Files:**
```bash
# Create test directory
mkdir -p ~/myaigist_test_files
# You'll need:
# - test.pdf (any PDF document)
# - test.txt (text file)
# - test.mp3 (audio file)
# - test.mp4 (video file)
```
## Test 1: Content Processing Tools
### Test 1.1: process_document (PDF)
**Test Case:** Process a PDF document with standard summary
**In Claude Desktop:**
```
Upload test.pdf and say:
"Process this document with a standard summary"
```
**Expected Behavior:**
- ✅ Document is processed
- ✅ Summary is generated (3-5 paragraphs for standard level)
- ✅ Audio URI is returned: `myaigist://audio/speech_*.mp3`
- ✅ Document ID (UUID) is returned
- ✅ Knowledge base shows 1 document
- ✅ Click-to-play audio button appears in Claude
**Success Criteria:**
- Summary is coherent and captures main points
- Audio plays when clicked
- Document is stored (verify with `list_documents`)
### Test 1.2: process_text
**Test Case:** Process raw text
**In Claude Desktop:**
```
"Process this text with a quick summary:
[paste a few paragraphs of text]"
```
**Expected Behavior:**
- ✅ Text is processed
- ✅ Quick summary is brief (1-2 paragraphs)
- ✅ Audio URI returned
- ✅ Document ID returned
- ✅ Knowledge base now shows 2 documents
**Success Criteria:**
- Quick summary is shorter than standard
- Text is searchable via Q&A
### Test 1.3: process_url
**Test Case:** Crawl and process a web page
**In Claude Desktop:**
```
"Process https://en.wikipedia.org/wiki/Artificial_intelligence
with a detailed summary"
```
**Expected Behavior:**
- ✅ URL is crawled successfully
- ✅ Content is extracted
- ✅ Detailed summary is comprehensive (5+ paragraphs)
- ✅ Audio URI returned
- ✅ Page title is captured
- ✅ Knowledge base shows 3 documents
**Success Criteria:**
- Summary covers main topics from page
- Links and navigation are filtered out
- Content is accurate
### Test 1.4: process_media (Audio)
**Test Case:** Transcribe audio file
**In Claude Desktop:**
```
Upload test.mp3 and say:
"Transcribe this audio file"
```
**Expected Behavior:**
- ✅ Audio is transcribed using Whisper
- ✅ Full transcript is returned
- ✅ Summary of transcript is generated
- ✅ Audio URI returned (for summary)
- ✅ Transcript is stored in knowledge base
**Success Criteria:**
- Transcript is accurate
- Summary captures key points from audio
- Transcript is searchable
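To spot-check the tool's transcript against a direct call, here is a minimal sketch assuming the pipeline uses OpenAI's hosted Whisper (the `whisper-1` model name and SDK usage are assumptions, not confirmed project internals):
```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Transcribe the same test file directly and compare against the tool's output
with open("test.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)
```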
### Test 1.5: process_media (Video)
**Test Case:** Transcribe video file
**In Claude Desktop:**
```
Upload test.mp4 and say:
"Transcribe this video"
```
**Expected Behavior:**
- ✅ Audio is extracted from video
- ✅ Audio is transcribed
- ✅ Transcript and summary returned
- ✅ Video format is handled correctly
**Success Criteria:**
- Works the same as audio transcription
- Handles various video formats (MP4, MOV, etc.)
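The "audio is extracted from video" step is commonly done with ffmpeg; a sketch of that extraction via subprocess (assumes ffmpeg is on your PATH; the output filename is arbitrary and this may not match the project's internal method):
```python
import subprocess

# Drop the video stream (-vn) and encode the audio track to mp3,
# yielding a file the transcription step can consume.
subprocess.run(
    ["ffmpeg", "-i", "test.mp4", "-vn", "-acodec", "libmp3lame", "-y", "extracted.mp3"],
    check=True,
)
```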
### Test 1.6: process_batch
**Test Case:** Process multiple files with unified summary
**In Claude Desktop:**
```
"Process these files together:
- /Users/[username]/test1.pdf
- /Users/[username]/test2.txt
- /Users/[username]/test3.pdf
Give me a unified summary."
```
**Expected Behavior:**
- ✅ All files processed individually
- ✅ Individual summaries generated
- ✅ Unified cross-document summary generated
- ✅ Audio URI for unified summary
- ✅ All documents stored separately
- ✅ Success/failure status for each file
**Success Criteria:**
- Unified summary synthesizes themes across documents
- Individual summaries are accurate
- Failed files don't break the batch
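The "failed files don't break the batch" criterion implies per-file error isolation; a minimal sketch of that pattern (`process_one` is a hypothetical stand-in, not the project's actual function):
```python
def process_one(path: str) -> str:
    """Hypothetical stand-in for the real per-file pipeline."""
    if not path.endswith((".pdf", ".txt")):
        raise ValueError(f"Unsupported format: {path}")
    return f"summary of {path}"

def process_batch(paths: list[str]) -> list[dict]:
    # Each file gets its own try/except, so one failure
    # cannot abort the rest of the batch.
    results = []
    for path in paths:
        try:
            results.append({"file": path, "status": "ok", "summary": process_one(path)})
        except Exception as exc:
            results.append({"file": path, "status": "failed", "error": str(exc)})
    return results

print(process_batch(["a.pdf", "b.txt", "c.jpg"]))  # c.jpg fails; a and b still succeed
```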
## Test 2: Q&A System
### Test 2.1: ask_question (Basic)
**Test Case:** Ask simple factual question
**Prerequisites:** Run Test 1.1 first (need at least one document)
**In Claude Desktop:**
```
"What is the main topic of the first document I uploaded?"
```
**Expected Behavior:**
- ✅ Question is processed
- ✅ Relevant context is retrieved from vector store
- ✅ Answer is accurate and specific
- ✅ Audio URI returned
- ✅ Audio answer plays correctly
**Success Criteria:**
- Answer directly addresses question
- Answer cites information from document
- Audio pronunciation is clear
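"Relevant context is retrieved from vector store" generally means embedding the question and ranking stored chunks by cosine similarity; an illustrative sketch with toy vectors (not the project's actual retrieval code):
```python
import numpy as np

def top_k_chunks(question_vec: np.ndarray, chunk_vecs: np.ndarray, k: int = 3):
    # Cosine similarity between the question and every stored chunk
    sims = chunk_vecs @ question_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(question_vec)
    )
    return np.argsort(sims)[::-1][:k]  # indices of the k most similar chunks

# Toy example with random 8-dimensional embeddings
rng = np.random.default_rng(0)
chunks = rng.normal(size=(10, 8))
question = rng.normal(size=8)
print(top_k_chunks(question, chunks))
```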
### Test 2.2: ask_question (Complex)
**Test Case:** Ask multi-document question
**Prerequisites:** Have 3+ documents in knowledge base
**In Claude Desktop:**
```
"What are the common themes across all my documents?"
```
**Expected Behavior:**
- ✅ Searches across all documents
- ✅ Synthesizes information
- ✅ Answer references multiple documents
- ✅ Audio response generated
**Success Criteria:**
- Answer demonstrates cross-document understanding
- Specific examples from different documents
- Coherent synthesis
### Test 2.3: ask_question_voice
**Test Case:** Voice question processing
**In Claude Desktop:**
```
Upload audio file with question like "What is the summary of document X?"
Then say:
"Answer the question in this audio file"
```
**Expected Behavior:**
- ✅ Voice question is transcribed
- ✅ Transcribed question is shown
- ✅ Answer is generated based on transcription
- ✅ Audio answer is returned
- ✅ Round-trip voice interaction works
**Success Criteria:**
- Question transcription is accurate
- Answer addresses transcribed question
- Audio response is clear
## Test 3: Document Management
### Test 3.1: list_documents
**Test Case:** List all stored documents
**Prerequisites:** Have 2+ documents in knowledge base
**In Claude Desktop:**
```
"Show me all my documents"
```
**Expected Behavior:**
- ✅ Returns list of all documents
- ✅ Each document shows:
- doc_id (UUID)
- title
- upload_time (ISO format)
- chunk_count
- ✅ Total document count matches actual
- ✅ Knowledge base stats included
**Success Criteria:**
- All previously uploaded documents are listed
- Metadata is accurate
- UUIDs are unique
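A quick way to verify the "UUIDs are unique" and metadata criteria against a copied response (the field names follow the expected output above; the values here are hypothetical):
```python
import uuid
from datetime import datetime

# Entries copied from a list_documents response (hypothetical values)
docs = [
    {"doc_id": "3f2b9c1e-8a4d-4e7b-9c2a-1f5e6d7a8b9c",
     "title": "test.pdf", "upload_time": "2024-05-01T12:00:00", "chunk_count": 12},
]

ids = [d["doc_id"] for d in docs]
assert len(ids) == len(set(ids)), "duplicate doc_ids"
for d in docs:
    uuid.UUID(d["doc_id"])                    # parses as a valid UUID
    datetime.fromisoformat(d["upload_time"])  # parses as ISO format
    assert d["chunk_count"] > 0
print(f"✅ {len(docs)} document(s) pass metadata checks")
```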
### Test 3.2: delete_document
**Test Case:** Delete specific document
**Prerequisites:** Have document with known doc_id
**In Claude Desktop:**
```
"Delete document with ID [paste doc_id from list_documents]"
```
**Expected Behavior:**
- ✅ Document is removed from knowledge base
- ✅ Vector store is updated
- ✅ Success confirmation returned
- ✅ Updated document count is shown
- ✅ Subsequent Q&A doesn't include deleted document
**Success Criteria:**
- Document is fully removed
- Other documents remain intact
- Vector store file is updated on disk
### Test 3.3: clear_all_documents
**Test Case:** Clear entire knowledge base
**In Claude Desktop:**
```
"Clear all my documents"
```
**Expected Behavior:**
- ✅ All documents removed
- ✅ Vector store cleared
- ✅ Document count = 0
- ✅ Chunk count = 0
- ✅ Subsequent Q&A says "No documents uploaded"
**Success Criteria:**
- Knowledge base is empty
- Fresh start possible
- No orphaned data
## Test 4: Utility Tools
### Test 4.1: generate_audio
**Test Case:** Generate TTS audio from text
**In Claude Desktop:**
```
"Generate audio with the nova voice for this text:
Hello, this is a test of the MyAIGist text to speech system."
```
**Expected Behavior:**
- ✅ Audio is generated
- ✅ Audio URI returned
- ✅ Specified voice is used (nova)
- ✅ Audio file created in audio/ directory
- ✅ Click-to-play button appears
**Success Criteria:**
- Audio quality is good
- Voice matches requested voice
- Pronunciation is clear
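To hear the same text outside the MCP flow, a direct TTS sketch (assumes the server uses OpenAI's text-to-speech endpoint; the `tts-1` model name is an assumption):
```python
from openai import OpenAI

client = OpenAI()

# Generate speech with the nova voice, mirroring the test above
response = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Hello, this is a test of the MyAIGist text to speech system.",
)
with open("tts_check.mp3", "wb") as f:
    f.write(response.content)
```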
### Test 4.2: generate_audio (All Voices)
**Test Case:** Test all available voices
**In Claude Desktop:**
```
For each voice (alloy, echo, fable, onyx, nova, shimmer):
"Generate audio with the [voice] voice: This is a voice test"
```
**Expected Behavior:**
- ✅ All 6 voices work
- ✅ Each voice sounds distinct
- ✅ Audio URIs returned for each
**Success Criteria:**
- No errors for any voice
- Clear differences between voices
- All audio files playable
### Test 4.3: cleanup_audio
**Test Case:** Clean up old audio files
**In Claude Desktop:**
```
"Clean up audio files older than 1 hour"
```
**Expected Behavior:**
- ✅ Scans audio/ directory
- ✅ Deletes files older than specified age
- ✅ Returns count of cleaned files
- ✅ Returns space freed (MB)
- ✅ Recent files are preserved
**Success Criteria:**
- Old files are removed
- Recent files remain
- Space is freed
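The behavior described above (delete by age, report the count of cleaned files and MB freed) can be sketched like this; the real tool's directory layout and thresholds may differ:
```python
import time
from pathlib import Path

def cleanup_audio(directory: str = "audio", max_age_hours: float = 1.0) -> dict:
    cutoff = time.time() - max_age_hours * 3600
    cleaned, freed_bytes = 0, 0
    for f in Path(directory).glob("*.mp3"):
        if f.stat().st_mtime < cutoff:  # older than the threshold
            freed_bytes += f.stat().st_size
            f.unlink()
            cleaned += 1
    return {"cleaned_files": cleaned, "space_freed_mb": round(freed_bytes / 1e6, 2)}

print(cleanup_audio())
```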
### Test 4.4: get_status
**Test Case:** Get system status
**In Claude Desktop:**
```
"What's my system status?"
```
**Expected Behavior:**
- ✅ Knowledge base statistics:
- documents_count
- chunks_count
- vectors_ready (true/false)
- ready_for_questions (true/false)
- embedding_dimension
- memory_usage_mb
- ✅ Audio files statistics:
- count
- total_size_mb
- ✅ Supported formats listed
- ✅ Available voices listed
**Success Criteria:**
- All stats are accurate
- Reflects current system state
- Numbers match actual files
## Test 5: Integration Tests
### Test 5.1: Full Document Workflow
**Complete conversation:**
```
User: "Process ~/Documents/research.pdf"
Claude: [Uses process_document] ✅ Document processed, here's the summary...
User: "What is the main conclusion?"
Claude: [Uses ask_question] "The main conclusion is..."
User: "Show me all my documents"
Claude: [Uses list_documents] You have 1 document...
User: "Delete that document"
Claude: [Uses delete_document] ✅ Deleted successfully
```
**Success Criteria:**
- Entire workflow works smoothly
- Context is maintained across turns
- All tools work together
### Test 5.2: Multi-Document Research
**Complete conversation:**
```
User: "Process these 3 papers: paper1.pdf, paper2.pdf, paper3.pdf
Give me a unified summary."
Claude: [Uses process_batch] ✅ Processed all 3...
User: "What are the common methodologies?"
Claude: [Uses ask_question] "The common methodologies are..."
User: "Compare the results across all three"
Claude: [Uses ask_question] "Paper 1 found X, Paper 2 found Y..."
User: "Which paper had the highest sample size?"
Claude: [Uses ask_question] "Paper 2 had 500 participants..."
```
**Success Criteria:**
- Cross-document queries work
- Comparisons are accurate
- Specific facts can be retrieved
### Test 5.3: Media Pipeline
**Complete conversation:**
```
User: "Transcribe ~/Videos/interview.mp4"
Claude: [Uses process_media] ✅ Transcribed: [full transcript]
User: "Summarize the key points"
Claude: [Uses ask_question] "The key points are..."
User: "Who was interviewed?"
Claude: [Uses ask_question] "Based on the transcript..."
```
**Success Criteria:**
- Video transcription accurate
- Q&A works on transcript
## Test 6: Error Handling
### Test 6.1: Missing File
**In Claude Desktop:**
```
"Process /nonexistent/file.pdf"
```
**Expected Behavior:**
- ❌ Error: File not found at /nonexistent/file.pdf
- ✅ Graceful error message
- ✅ No crash
### Test 6.2: Invalid Format
**In Claude Desktop:**
```
"Process ~/Documents/image.jpg"
```
**Expected Behavior:**
- ❌ Error: Unsupported format
- ✅ Lists supported formats
- ✅ No crash
### Test 6.3: Empty Document
**Create empty file:**
```bash
touch ~/empty.txt
```
**In Claude Desktop:**
```
"Process ~/empty.txt"
```
**Expected Behavior:**
- ❌ Error: Document appears to be empty
- ✅ Graceful error message
### Test 6.4: Question Without Documents
**Prerequisites:** Empty knowledge base
**In Claude Desktop:**
```
"What is the capital of France?"
```
**Expected Behavior:**
- ❌ No documents have been uploaded yet
- ✅ Prompt to upload documents first
## Test 7: Performance Tests
### Test 7.1: Large Document
**Test Case:** Process 50+ page PDF
**Expected:**
- ✅ Completes within 2-3 minutes
- ✅ Summary is coherent
- ✅ Chunking works correctly
- ✅ Q&A is responsive
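"Chunking works correctly" refers to splitting long text into overlapping pieces before embedding; a simple illustration (the chunk size and overlap here are assumptions, not the project's actual settings):
```python
def chunk_text(text: str, chunk_words: int = 500, overlap: int = 50) -> list[str]:
    # Overlap keeps sentences that straddle a boundary retrievable
    # from at least one chunk.
    words = text.split()
    step = chunk_words - overlap
    return [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), step)]

chunks = chunk_text("word " * 1200)
print(len(chunks), "chunks")  # a 1200-word text yields 3 overlapping chunks
```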
### Test 7.2: Many Documents
**Test Case:** 20+ documents in knowledge base
**Expected:**
- ✅ list_documents returns all
- ✅ Q&A still works
- ✅ Search across all documents
- ✅ Reasonable response time (<10s)
### Test 7.3: Long Audio
**Test Case:** 1+ hour audio/video file
**Expected:**
- ✅ Transcription completes
- ✅ Transcript is accurate throughout
- ✅ Summary captures full content
- ✅ No truncation
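Note that OpenAI's transcription endpoint caps uploads at 25 MB, so hour-long files typically have to be split before transcription; one common approach (assumes ffmpeg is installed; the filename and 10-minute segment length are placeholders):
```python
import subprocess
from pathlib import Path

# Split into 10-minute segments without re-encoding, then transcribe each part
subprocess.run(
    ["ffmpeg", "-i", "long_interview.mp3", "-f", "segment",
     "-segment_time", "600", "-c", "copy", "part_%03d.mp3"],
    check=True,
)
segments = sorted(Path(".").glob("part_*.mp3"))
print(f"{len(segments)} segments ready for transcription")
```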
## Test 8: Persistence Tests
### Test 8.1: Restart Persistence
**Test Steps:**
1. Upload document
2. Restart Claude Desktop
3. Ask question about document
**Expected:**
- ✅ Document is still in knowledge base
- ✅ Q&A works after restart
- ✅ Vector store loaded correctly
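To confirm the store really round-trips through disk, you can try deserializing it directly (run from the project root so any classes the pickle references are importable; this only proves the file loads, since its internal structure isn't documented here):
```python
import pickle

with open("data/vector_store.pkl", "rb") as f:
    store = pickle.load(f)
print(type(store))  # should deserialize without error after a restart
```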
## Success Checklist
- [ ] All 13 tools work individually
- [ ] Vector storage persists across restarts
- [ ] Multi-document Q&A works
- [ ] Error handling is graceful
- [ ] Performance meets the Test 7 targets
- [ ] Documentation is accurate
- [ ] All test workflows pass
## Reporting Issues
If tests fail:
1. **Check logs:**
```bash
tail -f ~/Library/Logs/Claude/mcp*.log
```
2. **Verify environment:**
```bash
# Prints the key prefix, or MISSING if the variable is unset in this shell
python3 -c "import os; key = os.getenv('OPENAI_API_KEY'); print(key[:10] if key else 'MISSING')"
```
3. **Test imports:**
```bash
python3 -c "from mcp_agents.qa_agent import QAAgent; print('OK')"
```
4. **Check vector store:**
```bash
ls -lh data/vector_store.pkl
```
5. **Check audio directory:**
```bash
ls -lh audio/
```
---
**Testing Completed:** [Date]
**All Tests Pass:** ✅/❌
**Notes:** [Add any observations or issues]