TDZ C64 Knowledge
MCP server for managing and searching Commodore 64 documentation. Ingest PDFs, text, Markdown, HTML, Excel, and web pages into a searchable knowledge base accessible via Claude Code or other MCP clients.
🚀 Quick Start
See QUICKSTART.md for detailed setup.
Features
Search & Retrieval
FTS5 full-text search - 480x faster queries (50ms vs 24s)
Semantic search - Find by meaning, not keywords (e.g., "movable objects" → "sprites")
RAG question answering - Answer questions by synthesizing docs with citations
Fuzzy search - Typo tolerance ("VIC2" → "VIC-II", "asembly" → "assembly")
Progressive refinement - Search within results to narrow down
Hybrid search - Combines keyword + semantic with configurable weighting
Similarity search - Discover related documentation automatically
Query preprocessing - NLTK stemming and stopword removal
Smart tagging - AI-powered tag suggestions by category
Table/code search - Search extracted tables and code blocks
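The "configurable weighting" in hybrid search can be pictured as a weighted sum of normalized keyword and semantic scores. The following is a minimal sketch under that assumption; `normalize`, `hybrid_rank`, and `semantic_weight` are hypothetical names, and the server's actual scoring formula is not described in this README.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as an equally good match.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, semantic_scores, semantic_weight=0.3):
    """Blend two score maps; semantic_weight is a hypothetical tuning knob."""
    kw = normalize(keyword_scores)
    sem = normalize(semantic_scores)
    combined = {}
    for doc in set(kw) | set(sem):
        # A document missing from one result set contributes 0 for that side.
        combined[doc] = ((1 - semantic_weight) * kw.get(doc, 0.0)
                         + semantic_weight * sem.get(doc, 0.0))
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


keyword = {"a": 10, "b": 5, "c": 0}
semantic = {"a": 0.1, "b": 0.9, "c": 0.5}
keyword_heavy = [doc for doc, _ in hybrid_rank(keyword, semantic, 0.3)]
semantic_heavy = [doc for doc, _ in hybrid_rank(keyword, semantic, 0.8)]
```

Raising the weight moves documents that match by meaning ahead of documents that only match by keyword.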
Document Management
Multi-format - PDF, TXT, MD, HTML, Excel, web scraping
Duplicate detection - Content-based deduplication
Chunked retrieval - Get specific sections without loading entire docs
Metadata extraction - Author, subject, page numbers
Persistent index - Documents stay indexed between sessions
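Content-based deduplication is commonly done by hashing the document text and refusing a second document with the same digest. This sketch assumes that approach; `content_hash` and `DuplicateIndex` are hypothetical names, and normalizing whitespace and case before hashing is an illustrative choice rather than documented behavior.

```python
import hashlib


def content_hash(text):
    """Hash the text after collapsing whitespace and lowercasing (an assumption)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class DuplicateIndex:
    """Remembers which document first contributed each content hash."""

    def __init__(self):
        self._seen = {}

    def add(self, doc_id, text):
        """Return the id of an existing duplicate, or None if the text is new."""
        digest = content_hash(text)
        existing = self._seen.get(digest)
        if existing is not None:
            return existing
        self._seen[digest] = doc_id
        return None


index = DuplicateIndex()
first = index.add("doc-1", "The SID chip has three voices.")
second = index.add("doc-2", "The  SID chip has\nthree voices.")
third = index.add("doc-3", "The VIC-II has eight sprites.")
```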
AI-Powered Features
Entity extraction - Extract hardware, memory addresses, instructions, concepts (5000x faster with C64 regex patterns)
Relationship mapping - Co-occurrence analysis with distance-based strength scoring
Document comparison - Side-by-side analysis with similarity scores
Natural language query translation - Parse queries into structured search parameters
Anomaly detection - ML-based baseline learning for URL-sourced content (3400+ docs/second)
Temporal analysis - Event detection, timeline construction, historical context (5 event types, 8 date formats)
Advanced visualizations - 3D knowledge graphs, hierarchical bundling, Sankey flow diagrams
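Regex-based entity extraction, as mentioned for hardware, memory addresses, and instructions, might look roughly like the sketch below. The patterns and the `extract_c64_entities` helper are hypothetical and far smaller than whatever pattern set the project ships.

```python
import re

# Hypothetical patterns: a handful of examples per category, not the project's list.
PATTERNS = {
    "memory_address": re.compile(r"\$[0-9A-Fa-f]{4}\b"),
    "hardware": re.compile(r"\b(?:VIC-II|SID|CIA|6510)\b"),
    "instruction": re.compile(r"\b(?:LDA|STA|LDX|STX|JMP|JSR|RTS)\b"),
}


def extract_c64_entities(text):
    """Return the sorted unique matches for each entity category."""
    return {kind: sorted(set(pattern.findall(text)))
            for kind, pattern in PATTERNS.items()}


sample = "LDA #$01 then STA $D020 to set the VIC-II border; the SID starts at $D400."
entities = extract_c64_entities(sample)
```

The two-digit immediate value `$01` is not reported as a memory address because the pattern requires four hex digits.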
Wiki Export (NEW in v2.23.15)
Static HTML wiki - Export entire knowledge base to browsable website
Document similarity map - 2D visualization using UMAP/t-SNE dimensionality reduction
Interactive timeline - Horizontal scrollable timeline with zoom levels and event filters
Knowledge graph - D3.js force-directed graph (178 entities, 20 relationships)
Enhanced UI - Explanation boxes, prominent ASK AI button, file type detection
Clickable clusters - Browse k-means clusters with linked documents
No server required - Pure client-side JavaScript, works offline
Full-text search - Fuse.js powered search across all content
See WIKI_EXPORT_GUIDE.md for usage
REST API (Optional)
27 endpoints - Full CRUD, search, analytics, export
OpenAPI/Swagger docs - Interactive API at /api/docs
API authentication - Secure via X-API-Key header
See docs/REST_API.md for details
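A client would attach the key through the X-API-Key header on each request. The sketch below only builds such a request with the standard library. The host, port, and key value are assumptions, and whether /api/docs itself requires the key is not stated here; see docs/REST_API.md for the real endpoints.

```python
import urllib.request


def build_request(base_url, path, api_key):
    """Create a GET request that carries the API key header."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


# Host, port, and key are placeholders for illustration only.
request = build_request("http://localhost:8000", "/api/docs", "example-key")

# Sending is left out so the sketch runs without a server:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

Note that `urllib` stores header names with normalized capitalization; HTTP header names are case-insensitive, so the server still sees the same header.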
Performance
Scalability - Tested to 5,000+ documents
Concurrent throughput - 5,712 queries/sec (10 workers)
Lazy loading - 100k+ document support
Search caching - 50-100x speedup for repeated queries
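The speedup for repeated queries comes from returning a stored result instead of searching again. A minimal sketch of a cache with a size limit and a time-to-live follows; `SearchCache` and its parameters are hypothetical, and least-recently-used eviction is an assumption about the policy.

```python
import time
from collections import OrderedDict


class SearchCache:
    """Small LRU cache whose entries expire after ttl_seconds."""

    def __init__(self, max_size=100, ttl_seconds=300, clock=time.monotonic):
        self._entries = OrderedDict()
        self._max_size = max_size
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)  # evict the least recently used


now = [0.0]
cache = SearchCache(max_size=2, ttl_seconds=10, clock=lambda: now[0])
cache.put("sid registers", ["doc-1"])
cache.put("sprites", ["doc-2"])
hit = cache.get("sid registers")
cache.put("raster", ["doc-3"])       # evicts "sprites", the least recently used
evicted = cache.get("sprites")
now[0] = 11.0                         # move past the TTL
expired = cache.get("sid registers")
```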
Installation (Windows)
Prerequisites
Python 3.10+ - https://python.org (check "Add Python to PATH")
uv (recommended) or pip:
pip install uv
Setup
Configuration
Claude Code
Or add to .claude/settings.json:
Claude Desktop
Add to %APPDATA%\Claude\claude_desktop_config.json:
Environment Variables
| Variable | Description | Default |
|---|---|---|
| TDZ_DATA_DIR | Database directory | ~/.tdz-c64-knowledge |
| USE_FTS5 | Enable FTS5 search (recommended) | |
| USE_SEMANTIC_SEARCH | Enable semantic search | |
| | Sentence-transformers model | |
| USE_BM25 | Enable BM25 fallback | |
| | Enable NLTK preprocessing | |
| USE_FUZZY_SEARCH | Enable fuzzy search | |
| | Fuzzy similarity (0-100) | 80 |
| USE_OCR | Enable OCR for scanned PDFs | |
| | Max cached results | |
| | Cache TTL (seconds) | |
| | Document directory whitelist | None |
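Settings like these are typically read once at startup. The sketch below covers only the variable names that appear elsewhere in this README (TDZ_DATA_DIR and the USE_* flags). `env_flag` and `load_settings` are hypothetical helpers, and the `False` fallback for unset flags is a placeholder, since the real defaults are not shown here.

```python
import os
from pathlib import Path

# Flag names taken from this README; other variables are omitted on purpose.
FLAG_NAMES = ("USE_FTS5", "USE_SEMANTIC_SEARCH", "USE_BM25",
              "USE_FUZZY_SEARCH", "USE_OCR")


def env_flag(environ, name, default=False):
    """Treat "1" as enabled and any other set value as disabled."""
    raw = environ.get(name)
    if raw is None:
        return default  # placeholder default, not the project's documented one
    return raw.strip() == "1"


def load_settings(environ=os.environ):
    """Collect the data directory and feature flags into a dict."""
    data_dir = environ.get("TDZ_DATA_DIR") or str(Path.home() / ".tdz-c64-knowledge")
    settings = {"data_dir": Path(data_dir)}
    for name in FLAG_NAMES:
        settings[name.lower()] = env_flag(environ, name)
    return settings


settings = load_settings({"TDZ_DATA_DIR": "/tmp/kb", "USE_FTS5": "1", "USE_BM25": "0"})
```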
Search Features
FTS5 Full-Text Search (Recommended)
Enable with USE_FTS5=1 for maximum performance:
480x faster than BM25
Native SQLite BM25 ranking
Porter stemming tokenizer
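To make the three points above concrete, here is a small standalone sketch of an SQLite FTS5 table with the Porter tokenizer and BM25 ranking. The table name `chunks_fts` and the sample rows are made up for illustration, the project's real schema may differ, and the Python build must include FTS5 (most do).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The porter tokenizer stems terms, so "sprites" and "sprite" index identically.
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(content, tokenize='porter')")
conn.executemany(
    "INSERT INTO chunks_fts(content) VALUES (?)",
    [
        ("Sprites are movable objects drawn by the VIC-II.",),
        ("The SID chip has three voices.",),
        ("Each sprite has an X and Y position register.",),
    ],
)


def search(query, limit=5):
    """Return (content, score) rows, best match first."""
    # bm25() yields lower (more negative) values for better matches,
    # so ascending order puts the best match first.
    return conn.execute(
        "SELECT content, bm25(chunks_fts) AS score FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?",
        (query, limit),
    ).fetchall()


sprite_hits = search("sprite")
voice_hits = search("voice")
```

Thanks to stemming, the singular query "sprite" finds both the "Sprites" row and the "sprite" row.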
Semantic Search
Enable with USE_SEMANTIC_SEARCH=1:
Meaning-based search (e.g., "movable objects" finds "sprites")
FAISS vector similarity with sentence-transformers
~7-16ms per query after embeddings built
Pre-build embeddings:
pip install sentence-transformers faiss-cpu
Phrase Search
Use double quotes for exact phrases:
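The difference between a quoted phrase and loose terms can be seen directly in SQLite FTS5. This sketch assumes the quoted query is handed to an FTS5 MATCH expression, which this README does not confirm; the table name and rows are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs_fts USING fts5(content)")
db.executemany(
    "INSERT INTO docs_fts(content) VALUES (?)",
    [
        ("Sprite multiplexing reuses the eight hardware sprites.",),
        ("Multiplexing tricks aside, each sprite has its own registers.",),
    ],
)


def match(query):
    """Return the content of every row matching an FTS5 query string."""
    rows = db.execute(
        "SELECT content FROM docs_fts WHERE docs_fts MATCH ?", (query,)
    ).fetchall()
    return [content for (content,) in rows]


# Double quotes require the words to appear next to each other, in order.
phrase_hits = match('"sprite multiplexing"')
# Without quotes, FTS5 only requires both words to occur somewhere in the row.
loose_hits = match("sprite multiplexing")
```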
Fuzzy Search
Handles typos automatically with USE_FUZZY_SEARCH=1:
"VIC-I" → "VIC-II" (83% similarity)
"grafics" → "graphics" (88% similarity)
Configurable threshold (default: 80%)
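Threshold-based typo correction can be sketched with a plain edit-distance similarity. This is a stand-in written with the standard library only; it is not the project's implementation, and its percentages will not necessarily match the figures quoted above. `levenshtein`, `similarity`, and `best_match` are hypothetical helpers.

```python
def levenshtein(a, b):
    """Count the single-character edits needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Case-insensitive similarity as a percentage from 0 to 100."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a.lower(), b.lower()) / longest)


def best_match(term, vocabulary, threshold=80):
    """Return the closest known term, or None if nothing reaches the threshold."""
    if not vocabulary:
        return None
    score, word = max((similarity(term, candidate), candidate)
                      for candidate in vocabulary)
    return word if score >= threshold else None


corrected = best_match("VIC-I", ["VIC-II", "SID", "CIA"])
```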
OCR for Scanned PDFs
Automatic with USE_OCR=1:
Detects scanned PDFs (< 100 chars extracted)
Uses Tesseract OCR
Install: pip install pytesseract pdf2image Pillow + Tesseract binary
~1-2 seconds per page
Temporal Analysis & Visualizations
Extract events, construct timelines, and visualize knowledge graphs.
Event Detection
Automatically detect significant events in documents:
5 Event Types - Product releases, company milestones, technical innovations, cultural events, version updates
8 Date Formats - Full dates, month-year, year ranges, decades, parenthetical dates
Confidence Scoring - Pattern matching with proximity-based confidence (0.0-1.0)
Entity Association - Automatically link entities to events
Timeline Construction
Build chronological timelines with flexible querying:
Automatic Timeline Building - Chronologically sorted by date (YYYYMMDD integer sort)
Category Organization - Group by decade-type combinations (e.g., "1980s-release")
Importance Levels - 1-5 scale based on confidence
Date Range Filtering - Query events by year range, type, importance
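The YYYYMMDD integer sort and the decade-type categories can be illustrated in a few lines. In this sketch `date_key` and `category` are hypothetical helpers, and encoding an unknown month or day as 0 is an assumption about how partial dates are handled.

```python
def date_key(year, month=0, day=0):
    """Pack a date into a YYYYMMDD integer; unknown parts stay 0 (an assumption)."""
    return year * 10000 + month * 100 + day


def category(year, event_type):
    """Build a decade-type label such as "1980s-release"."""
    return f"{year // 10 * 10}s-{event_type}"


# Sample events for illustration only.
events = [
    {"title": "Commodore 128 shown at CES", "year": 1985, "month": 1, "type": "release"},
    {"title": "SID chip designed", "year": 1981, "type": "innovation"},
    {"title": "Commodore 64 shown at CES", "year": 1982, "month": 1, "type": "release"},
]

timeline = sorted(
    events,
    key=lambda e: date_key(e["year"], e.get("month", 0), e.get("day", 0)),
)
titles = [e["title"] for e in timeline]
labels = [category(e["year"], e["type"]) for e in timeline]
```

Because every date becomes a single integer, a year-range filter reduces to a numeric comparison such as `date_key(1982) <= key < date_key(1986)`.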
Interactive Visualizations
Generate interactive HTML visualizations with Plotly and NetworkX:
Timeline Visualizations:
Interactive Timeline - Horizontal timeline with zoom/pan, color-coded by event type
Event Network - Spring layout showing event relationships
Trend Charts - Multi-subplot dashboard (bar chart, stacked area, cumulative line)
Advanced Graph Visualizations:
3D Knowledge Graph - Interactive 3D entity-relationship graph with rotation controls
Hierarchical Bundling - Circular layout with curved edges bundled through center
Sankey Diagrams - Topic flow over time (decade or year grouping)
MCP Tools for Timeline
4 timeline-specific MCP tools:
extract_document_events - Extract and store events from documents
get_timeline - Query chronological timeline with filters
search_events_by_date - Search events by date range and type
get_historical_context - Get events around a specific year
See PHASE3_TEMPORAL_ANALYSIS.md for complete documentation.
Tools
62 MCP tools organized by category. Key tools listed below.
Search Tools
search_docs - Full-text search
semantic_search - Meaning-based search
hybrid_search - Combined keyword + semantic
answer_question - RAG-based Q&A with citations
fuzzy_search - Typo-tolerant search
search_within_results - Progressive refinement
find_similar - Find related documents
Document Management
add_document - Add a file
add_documents_bulk - Bulk import
list_docs - List all documents
get_chunk - Get specific chunk
remove_document - Remove a document
remove_documents_bulk - Bulk remove by IDs or tags
check_updates - Check for file changes
URL Scraping
scrape_url - Scrape documentation website
rescrape_document - Re-scrape for updates
check_url_updates - Check all scraped docs
AI & Analytics
extract_entities - Extract named entities
search_entities - Search across entities
get_entity_analytics - Comprehensive entity statistics
extract_entity_relationships - Extract co-occurrences
search_entity_pair - Find docs with entity pair
compare_documents - Side-by-side comparison
suggest_tags - AI-powered tag suggestions
get_tags_by_category - Browse tags by category
translate_query - Parse natural language queries
Export Tools
export_entities - Export to CSV/JSON
export_relationships - Export relationships
System
kb_stats - Knowledge base statistics
health_check - System diagnostics
Data Storage
SQLite database with 12+ tables:
documents - Document metadata
chunks - Chunked content (1500 words, 200 overlap)
document_tables - Extracted PDF tables
document_code_blocks - Detected code blocks
document_entities - Extracted entities
entity_relationships - Co-occurrence tracking
Plus: summaries, extraction_jobs, monitoring_history, etc.
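The chunk sizes listed for the chunks table (1500 words with a 200-word overlap) imply a sliding window over the document's words. A minimal sketch of such a splitter follows; `chunk_words` is a hypothetical helper, and splitting on whitespace is an assumption about how words are counted.

```python
def chunk_words(text, chunk_size=1500, overlap=200):
    """Split text into word chunks where neighbours share `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the document
    return chunks


document = " ".join(f"w{i}" for i in range(3000))
chunks = chunk_words(document)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.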
Benefits:
Lazy loading (metadata at startup, chunks on-demand)
ACID transactions
Scalable to 100k+ documents
FTS5 full-text indexes
Default location: ~/.tdz-c64-knowledge (override with TDZ_DATA_DIR)
Usage Examples
Ask Claude Code:
"Search the C64 docs for SID voice registers"
"What does the memory map say about $D400?"
"Find information about sprite multiplexing"
"Add C:/docs/mapping_the_c64.pdf with tags memory-map, reference"
"How do I program raster interrupts on the VIC-II?" (uses RAG)
Suggested Tags
Organize docs with consistent tags:
reference, memory-map, basic, assembly
sid, vic-ii, cia, kernal
hardware, disk, graphics, sound
Troubleshooting
"pypdf not installed" - Run: pip install pypdf rank-bm25
"mcp module not found" - Run: pip install mcp
Server not responding - Use Python from virtual environment, not system Python
PDF extraction issues - Use OCR or add plain text version
BM25 issues - Check logs in TDZ_DATA_DIR/server.log, try USE_BM25=0
Development
Testing
Test Coverage:
test_server.py - Core server functionality (search, entities, RAG, etc.)
test_wiki_export.py - Wiki generation features (16 tests):
Document coordinate export (UMAP/t-SNE)
File type detection (HTML/MD)
Cluster document export
HTML generation with explanation boxes
JavaScript generation for interactive features
CI/CD
A GitHub Actions workflow runs the tests on Python 3.10/3.11/3.12 across Windows/Linux/macOS and applies Ruff code quality checks.
Documentation
Core Documentation
README.md (this file) - Installation, features, tools, usage
QUICKSTART.md - Fast setup guide (5 minutes)
ARCHITECTURE.md - Technical deep dive, database schema, algorithms
CONTEXT.md - Project status, quick stats, version history
CLAUDE.md - Quick reference for Claude Code integration
CHANGELOG.md - Complete version history
Feature Documentation
Browse docs/ for detailed guides on specific features:
API & Integration:
REST API - FastAPI REST server (27 endpoints)
AI-Powered Features:
Entity Extraction - Extract hardware, memory addresses, instructions
Anomaly Detection - ML-based URL content monitoring
Summarization - AI-powered document summarization
Data Sources:
Web Scraping - Scrape documentation websites
Web Monitoring - Track URL-sourced content changes
Setup & Deployment:
Deployment Guide - Production deployment
Docker Setup - Docker configuration
Environment Setup - Environment variables
Poppler Setup - Poppler installation for PDFs
User Interfaces:
GUI Guide - Streamlit web interface
Development:
Testing Guide - Test suite and CI/CD
Examples - Usage examples and performance analysis
Monitoring Setup - Scheduled monitoring configuration
Roadmap - Future improvements and features
Version History
v2.23.0 - RAG Question Answering & Advanced Search (Phase 2 Complete)
RAG-based answer_question with citations
Fuzzy search with rapidfuzz
Progressive search refinement
Smart tagging system
v2.22.0 - Search Improvements (Phase 1 Complete)
Enhanced entity analytics
C64-specific regex patterns (5000x faster)
Performance optimizations
v2.21.0 - Anomaly Detection
ML-based baseline learning
1500x performance improvement
v2.18.0 - REST API & Background Processing
FastAPI REST server (27 endpoints)
Background entity extraction
v2.15.0+ - Entity Intelligence
Entity extraction, relationships, analytics
See CONTEXT.md for complete version history.
License
MIT License - Use freely for your retro computing projects!