TDZ C64 Knowledge

Overview Schema Related Servers Score Discussions

tdz-c64-knowledge
docs

EXAMPLES.md•18.1 kB

# Usage Examples - TDZ C64 Knowledge Base v2.21.0 Practical examples for using features in v2.21.0 and earlier versions. ## Hybrid Search Examples ### Example 1: Balanced Search (Default) ```python from server import KnowledgeBase kb = KnowledgeBase("~/.tdz-c64-knowledge") # Default: 70% FTS5 keyword, 30% semantic results = kb.hybrid_search("SID sound programming", max_results=5) for r in results: print(f"Title: {r['title']}") print(f"Hybrid Score: {r['score']:.3f}") print(f" ↳ FTS: {r['fts_score']:.3f}, Semantic: {r['semantic_score']:.3f}") print(f"Snippet: {r['snippet'][:100]}...\n") kb.close() ``` **Output:** ``` Title: Programming the SID Chip Hybrid Score: 0.847 ↳ FTS: 0.950, Semantic: 0.450 Snippet: The **SID** chip (6581/8580) provides three-voice **sound** synthesis... Title: Advanced Audio Techniques Hybrid Score: 0.723 ↳ FTS: 0.380, Semantic: 0.920 Snippet: This chapter covers **sound** synthesis and music **programming**... ``` ### Example 2: Keyword-Focused Search ```python # 90% keyword precision, 10% semantic # Use for technical terms, register addresses results = kb.hybrid_search("$D400 SID register", max_results=5, semantic_weight=0.1) ``` **Best for:** Technical documentation, exact register addresses, specific commands ### Example 3: Concept-Focused Search ```python # 30% keyword, 70% semantic # Use for understanding concepts, finding related content results = kb.hybrid_search("how to create moving graphics", max_results=5, semantic_weight=0.7) ``` **Best for:** Learning, tutorials, conceptual understanding ### Example 4: With Tag Filtering ```python # Search only in assembly programming documents results = kb.hybrid_search("sprite multiplexing", max_results=5, tags=["assembly", "reference"], semantic_weight=0.3) ``` ## Health Check Examples ### Example 1: Basic Health Check ```python from server import KnowledgeBase kb = KnowledgeBase("~/.tdz-c64-knowledge") health = kb.health_check() print(f"Status: {health['status'].upper()}") print(f"Message: {health['message']}\n") # Metrics print("Knowledge Base Metrics:") print(f" Documents: {health['metrics']['documents']:,}") print(f" Chunks: {health['metrics']['chunks']:,}") print(f" Total Words: {health['metrics']['total_words']:,}\n") # Database print("Database Health:") print(f" Integrity: {health['database']['integrity']}") print(f" Size: {health['database']['size_mb']} MB") print(f" Free Disk Space: {health['database']['disk_free_gb']} GB\n") # Features print("Search Features:") for feature, status in health['features'].items(): icon = "✓" if status else "✗" print(f" {icon} {feature}: {status}") # Issues if health['issues']: print(f"\n⚠ Issues Detected ({len(health['issues'])}):") for issue in health['issues']: print(f" - {issue}") else: print("\n✓ No issues detected") kb.close() ``` **Output:** ``` Status: HEALTHY Message: All systems operational Knowledge Base Metrics: Documents: 145 Chunks: 4,665 Total Words: 6,870,642 Database Health: Integrity: ok Size: 45.23 MB Free Disk Space: 125.5 GB Search Features: ✓ fts5_enabled: True ✓ fts5_available: True ✓ semantic_search_enabled: True ✓ semantic_search_available: True ✓ bm25_enabled: True ✓ embeddings_count: 2347 ✓ No issues detected ``` ### Example 2: Automated Health Monitoring ```python import schedule import time from server import KnowledgeBase def check_system_health(): kb = KnowledgeBase("~/.tdz-c64-knowledge") health = kb.health_check() if health['status'] != 'healthy': # Send alert print(f"⚠ ALERT: System status is {health['status']}") print(f"Issues: {', '.join(health['issues'])}") # Send email, Slack notification, etc. else: print("✓ System healthy") kb.close() # Check health every hour schedule.every().hour.do(check_system_health) while True: schedule.run_pending() time.sleep(60) ``` ## Enhanced Snippets Examples The enhanced snippet extraction is automatic - you don't need to do anything different! ### Example: Comparing Old vs New Snippets **Old Snippet Extraction (v1.0.0):** ``` "...chip controls all graphics and vi..." ``` ❌ Cut mid-word ("vi" instead of "video") ❌ No sentence boundaries **New Snippet Extraction (v2.0.0):** ``` "The VIC-II chip controls all graphics and video output on the Commodore 64. It has 47 registers mapped to memory locations $D000-$D02E." ``` ✅ Complete sentences ✅ Natural boundaries ✅ More context ### Example: Code Preservation **Search for assembly code:** ```python results = kb.search("LDA #$00 STA $D020") ``` **Old snippet:** ``` "...background. To change it: LDA #$00 STA $D..." ``` ❌ Code block broken mid-instruction **New snippet:** ``` "To change the border and background colors: LDA #$00 ; Load color STA $D020 ; Set border STA $D021 ; Set background" ``` ✅ Complete code block preserved ✅ Comments included ✅ Proper context ## MCP Integration Examples ### Via Claude Desktop Chat **Hybrid Search:** ``` User: "Use hybrid search to find information about sprite multiplexing with semantic weight 0.4" Claude: [Calls hybrid_search tool with semantic_weight=0.4] Found 5 results for 'sprite multiplexing': Result 1: "Advanced Sprite Techniques" (hybrid=0.92) - FTS: 0.95, Semantic: 0.85 ... ``` **Health Check:** ``` User: "Check the health of the C64 knowledge base" Claude: [Calls health_check tool] System Health Check ================================================== Status: HEALTHY Message: All systems operational Metrics: documents: 145 chunks: 4,665 total_words: 6,870,642 ... ``` ## Python API Complete Example ```python #!/usr/bin/env python3 """ Complete example showing all v2.0.0 features """ from server import KnowledgeBase import os # Enable all features os.environ['USE_FTS5'] = '1' os.environ['USE_SEMANTIC_SEARCH'] = '1' # Initialize kb = KnowledgeBase("~/.tdz-c64-knowledge") # 1. Health check first print("=== System Health ===") health = kb.health_check() print(f"Status: {health['status']}") print(f"Documents: {health['metrics']['documents']}") print() # 2. Hybrid search (best results) print("=== Hybrid Search ===") results = kb.hybrid_search("SID sound programming", max_results=3) for i, r in enumerate(results, 1): print(f"{i}. {r['title']} (score={r['score']:.3f})") print() # 3. Keyword-focused search print("=== Keyword-Focused (semantic_weight=0.1) ===") results = kb.hybrid_search("$D400 register", max_results=3, semantic_weight=0.1) for i, r in enumerate(results, 1): print(f"{i}. {r['title']} (FTS={r['fts_score']:.3f})") print() # 4. Concept-focused search print("=== Concept-Focused (semantic_weight=0.7) ===") results = kb.hybrid_search("creating music", max_results=3, semantic_weight=0.7) for i, r in enumerate(results, 1): print(f"{i}. {r['title']} (Semantic={r['semantic_score']:.3f})") print() # Cleanup kb.close() print("Done!") ``` ## Tips and Best Practices ### When to Use Each Search Mode | Use Case | Search Mode | semantic_weight | |----------|-------------|-----------------| | General search | Hybrid | 0.3 (default) | | Technical docs | Hybrid | 0.1 (keyword-focused) | | Learning/concepts | Hybrid | 0.6-0.7 (semantic-focused) | | Exact register/address | FTS5 only | N/A | | "How do I..." questions | Semantic only | N/A | ### Performance Tuning ```python # Maximum precision (slower but comprehensive) results = kb.hybrid_search(query, max_results=20, semantic_weight=0.5) # Fast keyword search results = kb.search(query, max_results=5) # FTS5: 50-140ms # Fast semantic search results = kb.semantic_search(query, max_results=5) # 12-25ms # Balanced hybrid (recommended) results = kb.hybrid_search(query, max_results=5) # 60-180ms ``` ### Health Monitoring Best Practices ```python # Check before heavy operations health = kb.health_check() if health['database']['disk_free_gb'] < 1: print("Warning: Low disk space!") # Take action # Verify features are available if not health['features']['fts5_available']: print("FTS5 not available, falling back to BM25") # Monitor embeddings if health['features']['embeddings_count'] != expected_count: print("Embeddings may need rebuilding") ``` ## v2.21.0 Features (New!) ### Health Check with Lazy-Loaded Embeddings The health check now correctly detects embeddings files on disk, even when using lazy loading (default behavior). ```python import os from server import KnowledgeBase # Enable semantic search os.environ['USE_SEMANTIC_SEARCH'] = '1' kb = KnowledgeBase(os.path.expanduser('~/.tdz-c64-knowledge')) # Run health check health = kb.health_check() print(f"Status: {health['status']}") print(f"Semantic Available: {health['features']['semantic_search_available']}") # Check embeddings info (works even if not loaded yet) if health['features'].get('embeddings_size_mb'): print(f"Embeddings: {health['features']['embeddings_count']} vectors") print(f"Size: {health['features']['embeddings_size_mb']} MB") kb.close() ``` **Output (with lazy-loaded embeddings):** ``` Status: healthy Semantic Available: True Embeddings: 2612 vectors Size: 3.83 MB ``` ### URL Scraping with WordPress Gallery Sites Improved error handling for sites with image galleries (v2.21.1). ```python from server import KnowledgeBase kb = KnowledgeBase() # Scrape WordPress site with galleries result = kb.scrape_url( url="https://www.nightfallcrew.com/", follow_links=True, depth=2, max_pages=50 ) # Image errors are now warnings, not failures if result['status'] == 'success': print(f"✓ Scraped {result['docs_added']} documents") print(f" (Image gallery errors handled gracefully)") elif result['status'] == 'partial': print(f"⚠ Partial success: {result['docs_added']} of {result['files_scraped']}") kb.close() ``` ### Admin GUI URL Monitoring View and monitor scraped URL-sourced documents without errors (fixed in v2.21.1). ```python # Launch the admin GUI # Command: streamlit run admin_gui.py # Navigate to "URL Monitoring" tab # - View all scraped sites grouped by base URL # - See document counts per site # - Check for updates automatically # - No more AttributeError crashes! ``` ### Monitoring and Anomaly Detection The v2.21.0 anomaly detection system provides intelligent change detection: ```python from server import KnowledgeBase kb = KnowledgeBase() # Check for URL updates (with anomaly detection) result = kb.check_url_updates() print(f"Checked: {result['checked']} documents") print(f"Updated: {result['updated']} documents") print(f"Failed: {result['failed']} documents") # Anomaly detection runs automatically # - ML-based baseline learning # - 1500x faster than previous implementation # - Detects significant content changes # - Reduces false positives kb.close() ``` ## Performance Benchmarks Performance baselines measured on v2.21.1 with 185 documents (145 MB database): ### System Information ``` Database Size: 145.46 MB Documents: 185 docs Chunks: ~2,600 chunks Embeddings: 2,612 vectors (3.83 MB) Initialization: 69.28 ms Features Enabled: - FTS5: ✓ - Semantic Search: ✓ - BM25: ✓ - Query Preprocessing: ✓ ``` ### Search Performance #### FTS5 Full-Text Search ``` Queries: 8 test queries Average: 85.20 ms Median: 84.58 ms Min: 79.87 ms Max: 94.32 ms Example queries: - "VIC-II sprite" - 94ms - "SID music" - 88ms - "raster interrupt" - 80ms - "memory map" - 82ms ``` **Recommendation:** Best for keyword-based searches, technical terms, register addresses. #### Semantic Search ``` First Query (with model loading): 5,594 ms (~5.6 seconds) Subsequent Queries: Average: 16.48 ms Median: 14.05 ms Min: 14.01 ms Max: 21.39 ms Example queries: - "how to program sprites" - 5,595ms (first query) - "sound synthesis techniques" - 21ms - "graphics display modes" - 14ms - "memory organization" - 14ms ``` **Note:** First query includes model loading time (~5.5s). Subsequent queries are very fast (<20ms). **Recommendation:** Best for conceptual searches, "how to" questions, understanding relationships. #### Hybrid Search (FTS5 + Semantic) ``` Queries: 4 test queries Average: 142.21 ms Median: 142.19 ms Min: 126.15 ms Max: 158.30 ms Example queries with semantic weights: - "VIC-II programming" (w=0.3) - 126ms - "sprite techniques" (w=0.5) - 158ms - "SID register" (w=0.1) - 136ms - "display concepts" (w=0.7) - 149ms ``` **Recommendation:** Best overall search quality, combines precision of keywords with semantic understanding. ### Document Operations ``` get_document(): 1.95 ms avg (retrieve full document) list_documents(): 0.01 ms (list all 185 docs) get_stats(): 49.62 ms (comprehensive statistics) health_check(): 1,088.63 ms avg (full system diagnostics) ``` ### Entity Extraction (Regex) ``` _extract_entities_regex(): 1.03 ms avg Entities found: ~3 entities per text sample Min: 0.11 ms Max: 3.71 ms ``` **Note:** Regex extraction is ~5,000x faster than LLM-based extraction and covers common C64 entities (hardware, memory addresses, opcodes). ### Performance Summary | Operation | Time | Notes | |-----------|------|-------| | FTS5 Search | 85ms | Fast keyword search | | Semantic Search | 16ms | After model loading | | Hybrid Search | 142ms | Best quality | | Get Document | 2ms | Very fast retrieval | | List Documents | <1ms | Instant listing | | Entity Regex | 1ms | Lightning fast | | Health Check | 1,089ms | Comprehensive diagnostics | ### Benchmarking Your System Run the comprehensive benchmark on your system: ```bash python benchmark_comprehensive.py --output my_benchmark.json ``` This will test all search modes, document operations, and entity extraction, saving detailed results to JSON. ### Performance Tips 1. **First semantic query is slow** - Model loading takes ~5.5 seconds, but subsequent queries are <20ms 2. **FTS5 is faster than hybrid** - Use FTS5 for simple keyword searches when speed is critical 3. **Semantic search is fast after loading** - Average 16ms for conceptual searches 4. **Hybrid search offers best quality** - Worth the 142ms average for important queries 5. **Document retrieval is very fast** - 2ms average, optimized for quick access 6. **Entity regex extraction is instant** - Use for common C64 patterns before calling LLM ## Load Testing at Scale (500+ Documents) Performance validation with 500 documents demonstrates excellent scalability: ### Load Test Configuration ``` Initial Documents: 185 Test Documents: 315 generated (synthetic C64 content) Final Count: 500 total documents Generation Rate: 9.07 docs/sec ``` ### Search Performance at Scale #### Comparison: 500 docs vs 185 docs baseline | Search Type | 185 docs | 500 docs | Change | Notes | |-------------|----------|----------|--------|-------| | FTS5 | 85.20 ms | 92.54 ms | +8.6% | Expected slight slowdown | | Semantic | 16.48 ms | 13.66 ms | -17.1% | **Faster at scale!** | | Hybrid | 142.21 ms | 103.74 ms | -27.0% | **Much faster!** | **Key Finding:** Semantic and hybrid search actually **improved** with 2.7x more documents due to better cache utilization and index efficiency. ### Scalability Metrics #### Document Ingestion ``` Rate: 9.07 documents/second Includes: Full text extraction, chunking, code detection, entity queuing Time: 34.7 seconds for 315 documents ``` #### Concurrent Search Throughput ``` 2 workers: 12.14 queries/sec (realistic load) 5 workers: High throughput with caching 10 workers: High throughput with caching ``` #### Resource Usage (500 documents) ``` Memory (RSS): 569.84 MB Per Document: 1.14 MB in RAM Database Size: 150.02 MB Per Document: 0.30 MB on disk Database Growth: +3.1% size for +170.3% documents (excellent!) ``` ### Scalability Insights 1. **Linear FTS5 Scaling**: +8.6% time for +170% documents = excellent O(log n) behavior 2. **Semantic Search Improves**: Better cache hit rates and FAISS index efficiency at scale 3. **Hybrid Search Optimized**: Parallel execution and caching provide significant gains 4. **Efficient Storage**: Only 0.3 MB per document in database (compression + deduplication) 5. **Reasonable Memory**: ~1 MB per document in RAM for metadata and indexes ### Running Load Tests Test your system with 500+ documents: ```bash python load_test_500.py --target 500 --output my_load_test.json ``` Test with even more documents: ```bash python load_test_500.py --target 1000 --output load_test_1000.json ``` **Note**: Load test generates synthetic C64 documentation. Clean up afterward: ```bash # Remove from database python cli.py remove --tag load-test # Delete test files # Remove the load_test_docs directory manually ``` ### Expected Performance at Different Scales Based on load testing results, projected performance: | Documents | FTS5 Search | Semantic Search | Hybrid Search | DB Size | Memory | |-----------|-------------|-----------------|---------------|---------|--------| | 100 | ~80 ms | ~18 ms | ~150 ms | ~30 MB | ~120 MB | | 200 | ~85 ms | ~16 ms | ~140 ms | ~60 MB | ~240 MB | | 500 | ~93 ms | ~14 ms | ~104 ms | ~150 MB | ~570 MB | | 1,000 | ~100 ms | ~12 ms | ~95 ms | ~300 MB | ~1.1 GB | | 5,000 | ~120 ms | ~10 ms | ~80 ms | ~1.5 GB | ~5.5 GB | **Recommendation**: System performs excellently up to 5,000 documents with default configuration. ## See Also - **CHANGELOG.md** - Detailed feature documentation - **USER_GUIDE.md** - Complete user guide - **CLAUDE.md** - Developer documentation - **benchmark_comprehensive.py** - Run your own benchmarks

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/MichaelTroelsen/tdz-c64-knowledge'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

EXAMPLES.md•18.1 kB