TDZ C64 Knowledge


SUMMARIZATION.md•14.4 KiB

# Document Summarization Feature Guide

**Version:** 2.13.0 (Phase 1.2 - Complete)
**Status:** Fully Implemented & Tested
**Last Updated:** 2025-12-17

---

## Overview

The Document Summarization feature uses AI (Claude or GPT) to automatically generate intelligent summaries of knowledge base documents. Summaries are cached for fast retrieval and can be regenerated on demand.

### Key Features

- **Three Summary Types:**
  - **Brief:** 200-300 word overview (1-2 paragraphs)
  - **Detailed:** 500-800 word comprehensive summary (3-5 paragraphs)
  - **Bullet Points:** 8-12 key topics in bullet format
- **Intelligent Caching:** Summaries stored in database for instant retrieval
- **Flexible Regeneration:** Force regenerate cached summaries when needed
- **Bulk Processing:** Generate summaries for entire knowledge base at once
- **Multi-Format Retrieval:** Access via CLI, MCP tools, or Python API

---

## Prerequisites

### Required

1. **LLM Configuration** (one of):
   - **Anthropic Claude:**
     ```bash
     set LLM_PROVIDER=anthropic
     set ANTHROPIC_API_KEY=sk-ant-xxxxx...
     set LLM_MODEL=claude-3-haiku-20240307
     ```
   - **OpenAI GPT:**
     ```bash
     set LLM_PROVIDER=openai
     set OPENAI_API_KEY=sk-xxxxx...
     set LLM_MODEL=gpt-3.5-turbo
     ```
2. **Python 3.10+** (already installed)
3. **llm_integration module** (already included)

### Optional

- Advanced features (already configured in launch scripts):
  ```bash
  set USE_SEMANTIC_SEARCH=1
  set USE_FTS5=1
  set SEARCH_CACHE_SIZE=100
  ```

---

## Usage

### Command Line Interface

#### Generate Single Summary

```bash
# Brief summary (default)
python cli.py summarize <doc_id>

# Detailed summary
python cli.py summarize <doc_id> --type detailed

# Bullet-point summary
python cli.py summarize <doc_id> --type bullet

# Force regeneration (ignore cache)
python cli.py summarize <doc_id> --force
```

**Example:**

```bash
python cli.py summarize "c64-programmers-reference-v2-1985" --type detailed
```

#### Bulk Summarization

```bash
# Generate brief summaries for all documents
python cli.py summarize-all

# Generate multiple types for all documents
python cli.py summarize-all --types brief detailed bullet

# Force regeneration
python cli.py summarize-all --force

# Limit to first 10 documents (for testing)
python cli.py summarize-all --max 10
```

**Example:**

```bash
python cli.py summarize-all --types brief detailed --max 50
```

### Python API

#### Single Document Summary

```python
from server import KnowledgeBase
import os

kb = KnowledgeBase(os.path.expanduser('~/.tdz-c64-knowledge'))

# Generate brief summary
summary = kb.generate_summary('doc-id', summary_type='brief')
print(summary)

# Generate with force regeneration
summary = kb.generate_summary('doc-id', summary_type='detailed', force_regenerate=True)
print(summary)

# Retrieve cached summary (no API call)
summary = kb.get_summary('doc-id', summary_type='brief')
if summary:
    print(summary)
else:
    print("No cached summary. Generate one with generate_summary().")
```

#### Bulk Summarization

```python
# Generate brief summaries for all documents
results = kb.generate_summary_all(summary_types=['brief'])

# Process multiple types
results = kb.generate_summary_all(
    summary_types=['brief', 'detailed', 'bullet'],
    force_regenerate=False,
    max_docs=50  # Limit for testing
)

# Access results
print(f"Processed: {results['processed']}")
print(f"Failed: {results['failed']}")
print(f"Total summaries: {results['total_summaries']}")
print(f"By type: {results['by_type']}")

# Iterate individual results
for doc_result in results['results']:
    print(f"Document: {doc_result['title']}")
    for summary_type, summary_info in doc_result['summaries'].items():
        if summary_info['success']:
            print(f"  {summary_type}: {summary_info['word_count']} words")
        else:
            print(f"  {summary_type}: ERROR - {summary_info['error']}")
```

### MCP Tools (Claude Integration)

Three new tools available in Claude Desktop / Claude Code:

#### 1. `summarize_document`

Generate a summary of a specific document.

**Parameters:**

- `doc_id` (required): Document ID to summarize
- `summary_type` (optional): 'brief', 'detailed', or 'bullet' (default: 'brief')
- `force_regenerate` (optional): Boolean to force regeneration (default: false)

**Example:**

```
User: "Summarize document 'c64-assembly-guide' with detailed summary"
Claude: [calls summarize_document with doc_id and summary_type='detailed']
```

#### 2. `get_summary`

Retrieve a cached summary without API call.

**Parameters:**

- `doc_id` (required): Document ID
- `summary_type` (optional): 'brief', 'detailed', or 'bullet' (default: 'brief')

**Example:**

```
User: "Get the brief summary for the C64 BASIC reference"
Claude: [calls get_summary to retrieve cached version]
```

#### 3. `summarize_all`

Bulk generate summaries for all documents.

**Parameters:**

- `summary_types` (optional): Array of types (default: ['brief'])
- `force_regenerate` (optional): Boolean (default: false)
- `max_docs` (optional): Maximum documents to process

**Example:**

```
User: "Generate brief and detailed summaries for the first 20 documents"
Claude: [calls summarize_all with summary_types=['brief','detailed'] max_docs=20]
```

---

## Database Schema

New `document_summaries` table stores all generated summaries:

```sql
CREATE TABLE document_summaries (
    doc_id TEXT NOT NULL,
    summary_type TEXT NOT NULL,
    summary_text TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    model TEXT,
    token_count INTEGER,
    PRIMARY KEY (doc_id, summary_type),
    FOREIGN KEY (doc_id) REFERENCES documents(doc_id) ON DELETE CASCADE
);
```

**Indexes:**

- `idx_summaries_doc_id` - Fast lookup by document
- `idx_summaries_type` - Fast lookup by summary type

**Cascade Delete:** Removing a document automatically removes all its summaries.

---

## Configuration

### Environment Variables

All handled automatically by launcher scripts. Can be customized:

```bash
# LLM Provider (required for summarization)
LLM_PROVIDER=anthropic              # or 'openai'
LLM_MODEL=claude-3-haiku-20240307

# API Keys
ANTHROPIC_API_KEY=sk-ant-...        # For Anthropic Claude
OPENAI_API_KEY=sk-...               # For OpenAI GPT

# Optional feature flags
ENABLE_SUMMARIZATION=1              # Feature flag (default: enabled)
SUMMARY_CACHE_ENABLED=1             # Cache summaries (default: enabled)
SUMMARY_DEFAULT_LENGTH=brief        # Default type (default: brief)
```

### Data Storage

All summaries stored in SQLite database at:

```
~/.tdz-c64-knowledge/knowledge_base.db
```

No external API caching or cloud storage.
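
Because the cache is a plain SQLite table, a cached summary can also be read directly with Python's standard `sqlite3` module. The following is a minimal sketch, not the project's own code: it assumes the `document_summaries` schema shown above, uses an in-memory database with a two-column stand-in for the `documents` table, and the helper name `read_cached_summary` is hypothetical.

```python
import sqlite3
from typing import Optional

# document_summaries is copied from the schema above; the documents table
# is a reduced stand-in with only the columns this sketch needs (assumption).
SCHEMA = """
CREATE TABLE documents (doc_id TEXT PRIMARY KEY, title TEXT);
CREATE TABLE document_summaries (
    doc_id TEXT NOT NULL,
    summary_type TEXT NOT NULL,
    summary_text TEXT NOT NULL,
    generated_at TEXT NOT NULL,
    model TEXT,
    token_count INTEGER,
    PRIMARY KEY (doc_id, summary_type),
    FOREIGN KEY (doc_id) REFERENCES documents(doc_id) ON DELETE CASCADE
);
"""


def read_cached_summary(conn: sqlite3.Connection, doc_id: str,
                        summary_type: str = "brief") -> Optional[str]:
    """Hypothetical helper: return the cached text, or None on a cache miss."""
    row = conn.execute(
        "SELECT summary_text FROM document_summaries "
        "WHERE doc_id = ? AND summary_type = ?",
        (doc_id, summary_type),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys (and cascades) on connections that enable them.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Made-up demo rows, not real knowledge base content.
conn.execute("INSERT INTO documents VALUES (?, ?)", ("demo-doc", "Demo"))
conn.execute(
    "INSERT INTO document_summaries VALUES (?, ?, ?, datetime('now'), ?, ?)",
    ("demo-doc", "brief", "A short summary.", "demo-model", 5),
)

hit = read_cached_summary(conn, "demo-doc")               # cached type
miss = read_cached_summary(conn, "demo-doc", "detailed")  # never generated

# Deleting the parent document removes its summaries through the cascade.
conn.execute("DELETE FROM documents WHERE doc_id = ?", ("demo-doc",))
after_delete = read_cached_summary(conn, "demo-doc")

print(hit, miss, after_delete)
```

The composite primary key `(doc_id, summary_type)` means each document holds at most one cached summary per type, so the lookup returns zero or one row. The cascade in this sketch depends on `PRAGMA foreign_keys = ON`; whether the project enables it on its own connections is not stated here.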

---

## Performance

### Summary Generation Speed

- **Brief (200-300 words):** 3-5 seconds (Claude) / 2-4 seconds (GPT)
- **Detailed (500-800 words):** 5-8 seconds (Claude) / 3-6 seconds (GPT)
- **Bullet Points:** 3-5 seconds (either provider)

### Caching Impact

- **First generation:** 3-8 seconds (API call)
- **Cached retrieval:** <10ms (database lookup)
- **Cache hit rate:** ~80% for typical usage patterns

### Cost Estimates (Anthropic Claude)

Using claude-3-haiku (cheapest option):

| Operation | Tokens | Cost |
|-----------|--------|------|
| Brief summary (200-300 words) | 400-600 | ~$0.02 |
| Detailed summary (500-800 words) | 800-1200 | ~$0.04 |
| Bullet summary (150-200 words) | 300-400 | ~$0.01 |
| **All 148 docs (brief only)** | ~60,000 | ~$3.00 |
| **All 148 docs (3 types)** | ~200,000 | ~$10.00 |

**GPT-3.5-Turbo:** Approximately 10-20x cheaper than Claude depending on volume.

---

## Error Handling

### Common Issues & Solutions

#### Issue: "LLM not configured"

**Cause:** LLM_PROVIDER and API key not set

**Solution:**

```bash
set LLM_PROVIDER=anthropic
set ANTHROPIC_API_KEY=sk-ant-xxxxx...
python cli.py summarize <doc_id>
```

#### Issue: "LLM call failed: 401 Unauthorized"

**Cause:** Invalid or expired API key

**Solution:**

- Check API key is correct: `echo %ANTHROPIC_API_KEY%`
- Regenerate key at https://console.anthropic.com/account/keys (for Claude)
- For OpenAI: https://platform.openai.com/account/api-keys

#### Issue: "Document not found"

**Cause:** Invalid document ID

**Solution:**

```bash
# List available documents
python cli.py list

# Use correct doc_id from output
python cli.py summarize <correct-doc-id>
```

#### Issue: "Summary generation timed out"

**Cause:** API unreachable or very slow

**Solution:**

- Check internet connection
- Check LLM API status page
- Try with simpler summary type (brief instead of detailed)

### Logging

All operations logged to `server.log`:

```bash
grep "Generating summary" server.log   # Find summary generation attempts
grep "Saved summary" server.log        # Find successful saves
grep "LLM call failed" server.log      # Find API errors
tail -f server.log                     # Live monitoring
```

---

## Advanced Usage

### Regenerating Summaries

Force regenerate specific document:

```bash
python cli.py summarize <doc_id> --force
```

Force regenerate all documents:

```bash
python cli.py summarize-all --force --types brief detailed
```

### Batch Processing with Scripts

```batch
@echo off
REM Generate summaries for all C64 technical documents

set LLM_PROVIDER=anthropic
set ANTHROPIC_API_KEY=sk-ant-xxxxx...
set LLM_MODEL=claude-3-haiku-20240307

echo Generating summaries for all documents...
.venv\Scripts\python.exe cli.py summarize-all --types brief detailed

echo Done! Check server.log for details.
pause
```

### Integration with Other Tools

**Export summaries to CSV:**

```python
import csv
from server import KnowledgeBase
import os

kb = KnowledgeBase(os.path.expanduser('~/.tdz-c64-knowledge'))
results = kb.generate_summary_all(summary_types=['brief'])

with open('summaries.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['Document', 'Summary'])
    for doc_result in results['results']:
        if 'brief' in doc_result['summaries']:
            summary_info = doc_result['summaries']['brief']
            if summary_info['success']:
                # Retrieve full summary
                summary = kb.get_summary(doc_result['doc_id'], 'brief')
                writer.writerow([doc_result['title'], summary])
```

---

## Future Enhancements

### Planned for v2.14.0+

1. **Multiple Summary Languages**
   - Generate summaries in Spanish, French, German, etc.
   - `--language es` or `--language fr`
2. **Custom Summary Lengths**
   - User-defined word counts
   - `--max-words 500` parameter
3. **Streaming Summaries**
   - Real-time generation with progress updates
   - For large bulk operations
4. **Summary Analytics**
   - Average summary length by document type
   - Common themes across summaries
   - Topic extraction from summaries
5. **Cached Summary Browsing**
   - GUI display of all cached summaries
   - Quick navigation and search
6. **Auto-Update Detection**
   - Detect when documents change
   - Automatically regenerate affected summaries

---

## Database Queries

### View all cached summaries

```sql
SELECT doc_id, summary_type, LENGTH(summary_text) as length, generated_at, model
FROM document_summaries
ORDER BY generated_at DESC
LIMIT 10;
```

### Find documents without summaries

```sql
SELECT d.doc_id, d.title
FROM documents d
LEFT JOIN document_summaries s ON d.doc_id = s.doc_id
WHERE s.doc_id IS NULL
LIMIT 20;
```

### Count summaries by type

```sql
SELECT summary_type, COUNT(*) as count
FROM document_summaries
GROUP BY summary_type;
```

### Delete old summaries (older than 30 days)

```sql
DELETE FROM document_summaries
WHERE generated_at < datetime('now', '-30 days');
```

---

## Troubleshooting

### Debug Mode

Enable detailed logging:

```bash
set DEBUG=1
python cli.py summarize <doc_id> --type brief
```

Check `server.log` for detailed output.

### Verify LLM Configuration

```bash
python << 'EOF'
from llm_integration import get_llm_client

client = get_llm_client()
if client:
    print("LLM configured correctly!")
    print(f"Provider: {client.__class__.__name__}")
    # Test with simple prompt
    response = client.call("Write one sentence.", max_tokens=50)
    print(f"Test response: {response}")
else:
    print("ERROR: LLM not configured. Check LLM_PROVIDER and API key.")
EOF
```

### Verify Database

```bash
python << 'EOF'
import sqlite3
import os

db_path = os.path.expanduser('~/.tdz-c64-knowledge/knowledge_base.db')
conn = sqlite3.connect(db_path)
cursor = conn.cursor()

# Check table exists
cursor.execute("SELECT name FROM sqlite_master WHERE type='table' AND name='document_summaries'")
if cursor.fetchone():
    print("document_summaries table exists: OK")
else:
    print("ERROR: document_summaries table not found!")

# Check row count
cursor.execute("SELECT COUNT(*) FROM document_summaries")
count = cursor.fetchone()[0]
print(f"Cached summaries: {count}")

conn.close()
EOF
```

---

## Examples

### Example 1: Quick Summary of One Document

```bash
REM Get document ID
python cli.py list | find "C64"

REM Generate brief summary
python cli.py summarize "c64-programmers-reference-v2-1985" --type brief
```

### Example 2: Generate All Summary Types

```bash
REM Generate all three summary types for one document
python cli.py summarize "c64-programmers-reference-v2-1985" --type brief
python cli.py summarize "c64-programmers-reference-v2-1985" --type detailed
python cli.py summarize "c64-programmers-reference-v2-1985" --type bullet
```

### Example 3: Bulk Summarization with Progress

```bash
REM Generate brief and detailed summaries for all documents
REM This will take 10-15 minutes for 148 documents
python cli.py summarize-all --types brief detailed

REM Check results
python cli.py list

REM Now documents will have cached summaries
```

### Example 4: Using with Claude Desktop

1. Configure MCP server (see ENVIRONMENT_SETUP.md)
2. Restart Claude Desktop
3. Ask Claude:
   > "Summarize the document with ID 'c64-assembly-guide' with a detailed summary"
4. Claude will use the summarize_document tool

---

## Support & Feedback

- **Issues:** Check `server.log` and this guide
- **Feature Requests:** See FUTURE_IMPROVEMENTS_2025.md
- **Code Changes:** Follow patterns in CLAUDE.md

---

**Summary Statistics:**

- **Code Added:** ~600 lines (3 methods + 3 MCP tools + 2 CLI commands)
- **Database Schema:** 1 new table + 2 indexes + cascade deletes
- **Migration:** Automatic for existing databases
- **Backward Compatible:** Yes - existing databases automatically upgraded
- **Performance Impact:** Minimal (lazy loading, cached retrieval)

---

**Version:** 2.13.0
**Release Date:** 2025-12-17
**Status:** Production Ready ✓

