TDZ C64 Knowledge

Python 3.10+ | License: MIT | Code style: ruff

MCP server for managing and searching Commodore 64 documentation. Ingest PDFs, text, Markdown, HTML, Excel, and web pages into a searchable knowledge base accessible via Claude Code or other MCP clients.

🚀 Quick Start

# 1. Install
python -m venv .venv
.venv\Scripts\activate
pip install -e .

# 2. Configure Claude Code
claude mcp add tdz-c64-knowledge -- .venv\Scripts\python.exe server.py

# 3. Add documents
.venv\Scripts\python.exe cli.py add-folder "C:\c64docs" --tags reference --recursive

# 4. Search via Claude Code
# Ask: "Search the C64 docs for VIC-II sprite registers"

See QUICKSTART.md for detailed setup.

Features

Search & Retrieval

  • FTS5 full-text search - 480x faster queries than BM25 (50ms vs 24s)

  • Semantic search - Find by meaning, not keywords (e.g., "movable objects" → "sprites")

  • RAG question answering - Answer questions by synthesizing docs with citations

  • Fuzzy search - Typo tolerance ("VIC2" → "VIC-II", "asembly" → "assembly")

  • Progressive refinement - Search within results to narrow down

  • Hybrid search - Combines keyword + semantic with configurable weighting

  • Similarity search - Discover related documentation automatically

  • Query preprocessing - NLTK stemming and stopword removal

  • Smart tagging - AI-powered tag suggestions by category

  • Table/code search - Search extracted tables and code blocks

Document Management

  • Multi-format - PDF, TXT, MD, HTML, Excel, web scraping

  • Duplicate detection - Content-based deduplication

  • Chunked retrieval - Get specific sections without loading entire docs

  • Metadata extraction - Author, subject, page numbers

  • Persistent index - Documents stay indexed between sessions
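Content-based deduplication can be pictured as hashing a normalized copy of each document's text. The sketch below is only an illustration of that idea: the normalization rules (collapse whitespace, lowercase) and the function names are assumptions, not the server's actual implementation.

```python
import hashlib


def content_fingerprint(text: str) -> str:
    """Hash whitespace-normalized, lowercased text (assumed normalization)."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(docs: dict[str, str]) -> dict[str, str]:
    """Map each duplicate document name to the first document with the same content."""
    seen: dict[str, str] = {}        # fingerprint -> first document name
    duplicates: dict[str, str] = {}  # duplicate name -> original name
    for name, text in docs.items():
        fingerprint = content_fingerprint(text)
        if fingerprint in seen:
            duplicates[name] = seen[fingerprint]
        else:
            seen[fingerprint] = name
    return duplicates
```

Two files that differ only in spacing or letter case hash to the same fingerprint, so the second one is reported as a duplicate of the first.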

AI-Powered Features

  • Entity extraction - Extract hardware, memory addresses, instructions, concepts (5000x faster with C64 regex patterns)

  • Relationship mapping - Co-occurrence analysis with distance-based strength scoring

  • Document comparison - Side-by-side analysis with similarity scores

  • Natural language query translation - Parse queries into structured search parameters

  • Anomaly detection - ML-based baseline learning for URL-sourced content (3400+ docs/second)

  • Temporal analysis - Event detection, timeline construction, historical context (5 event types, 8 date formats)

  • Advanced visualizations - 3D knowledge graphs, hierarchical bundling, Sankey flow diagrams
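To give a feel for regex-based entity extraction, here is a minimal sketch. The patterns, the short hardware and instruction lists, and the function name extract_entities_sketch are illustrative assumptions; the project's real pattern set is not shown in this README.

```python
import re

# Illustrative patterns only (assumed, not the project's actual pattern set).
ADDRESS_RE = re.compile(r"\$[0-9A-Fa-f]{4}\b")            # e.g. $D400
HARDWARE_RE = re.compile(r"\b(VIC-II|SID|CIA|KERNAL)\b")  # a few chip names
INSTRUCTION_RE = re.compile(r"\b(LDA|STA|JMP|JSR|RTS)\b")  # a few 6502 mnemonics


def extract_entities_sketch(text: str) -> dict[str, list[str]]:
    """Return unique, sorted matches grouped by entity type."""
    return {
        "memory_address": sorted({m.upper() for m in ADDRESS_RE.findall(text)}),
        "hardware": sorted(set(HARDWARE_RE.findall(text))),
        "instruction": sorted(set(INSTRUCTION_RE.findall(text))),
    }
```

Compiled patterns like these need no model call per document, which is the usual reason a regex pass is much faster than an LLM-based one.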

Wiki Export (NEW in v2.23.15)

  • Static HTML wiki - Export entire knowledge base to browsable website

  • Document similarity map - 2D visualization using UMAP/t-SNE dimensionality reduction

  • Interactive timeline - Horizontal scrollable timeline with zoom levels and event filters

  • Knowledge graph - D3.js force-directed graph (178 entities, 20 relationships)

  • Enhanced UI - Explanation boxes, prominent ASK AI button, file type detection

  • Clickable clusters - Browse k-means clusters with linked documents

  • No server required - Pure client-side JavaScript, works offline

  • Full-text search - Fuse.js powered search across all content

  • See WIKI_EXPORT_GUIDE.md for usage

REST API (Optional)

  • 27 endpoints - Full CRUD, search, analytics, export

  • OpenAPI/Swagger docs - Interactive API at /api/docs

  • API authentication - Secure via X-API-Key header

  • See docs/REST_API.md for details
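A request to the REST API carries the key in the X-API-Key header. The sketch below only builds such a request with the standard library; the base URL (http://localhost:8000) and the helper name are assumptions, so check docs/REST_API.md for the real host, port, and endpoint paths.

```python
import urllib.request


def build_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that sends the API key in the X-API-Key header."""
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, headers={"X-API-Key": api_key}, method="GET")


# Hypothetical base URL; /api/docs is the documentation path mentioned above.
request = build_request("http://localhost:8000", "/api/docs", "your-api-key")
# To actually send it: urllib.request.urlopen(request)
```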

Performance

  • Scalability - Tested to 5,000+ documents

  • Concurrent throughput - 5,712 queries/sec (10 workers)

  • Lazy loading - 100k+ document support

  • Search caching - 50-100x speedup for repeated queries

Installation (Windows)

Prerequisites

  • Python 3.10+ - https://python.org (check "Add Python to PATH")

  • uv (recommended; install with pip install uv) or pip

Setup

cd C:\Users\YourName\mcp-servers\tdz-c64-knowledge

# Using uv (faster)
uv venv
.venv\Scripts\activate
uv pip install mcp pypdf rank-bm25 nltk

# Or using pip
python -m venv .venv
.venv\Scripts\activate
pip install mcp pypdf rank-bm25 nltk

# Test
python server.py  # Press Ctrl+C to stop

Configuration

Claude Code

claude mcp add tdz-c64-knowledge -- C:\path\.venv\Scripts\python.exe C:\path\server.py

Or add to .claude/settings.json:

{
  "mcpServers": {
    "tdz-c64-knowledge": {
      "command": "C:\\path\\.venv\\Scripts\\python.exe",
      "args": ["C:\\path\\server.py"],
      "env": {
        "TDZ_DATA_DIR": "C:\\c64-knowledge-data"
      }
    }
  }
}

Claude Desktop

Add to %APPDATA%\Claude\claude_desktop_config.json:

{
  "mcpServers": {
    "tdz-c64-knowledge": {
      "command": "C:\\path\\.venv\\Scripts\\python.exe",
      "args": ["C:\\path\\server.py"],
      "env": {
        "TDZ_DATA_DIR": "C:\\c64-knowledge-data"
      }
    }
  }
}

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| TDZ_DATA_DIR | Database directory | ~/.tdz-c64-knowledge |
| USE_FTS5 | Enable FTS5 search (recommended) | 0 |
| USE_SEMANTIC_SEARCH | Enable semantic search | 0 |
| SEMANTIC_MODEL | Sentence-transformers model | all-MiniLM-L6-v2 |
| USE_BM25 | Enable BM25 fallback | 1 |
| USE_QUERY_PREPROCESSING | Enable NLTK preprocessing | 1 |
| USE_FUZZY_SEARCH | Enable fuzzy search | 1 |
| FUZZY_THRESHOLD | Fuzzy similarity (0-100) | 80 |
| USE_OCR | Enable OCR for scanned PDFs | 1 |
| SEARCH_CACHE_SIZE | Max cached results | 100 |
| SEARCH_CACHE_TTL | Cache TTL (seconds) | 300 |
| ALLOWED_DOCS_DIRS | Document directory whitelist | None |
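SEARCH_CACHE_SIZE and SEARCH_CACHE_TTL describe a bounded cache whose entries expire. A minimal sketch of that behaviour follows; the class name, the least-recently-used eviction policy, and the injectable clock are assumptions made for illustration, not the server's code.

```python
import time
from collections import OrderedDict


class SearchCache:
    """Bounded cache with per-entry expiry and LRU eviction (assumed policy)."""

    def __init__(self, max_size=100, ttl=300.0, clock=time.monotonic):
        self.max_size = max_size
        self.ttl = ttl
        self._clock = clock
        self._entries = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # entry is stale, drop it
            return None
        self._entries.move_to_end(key)  # mark as recently used
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self.ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

The defaults mirror the table above: at most 100 cached results, each valid for 300 seconds.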

Search Features

Enable FTS5 full-text search with USE_FTS5=1 for maximum performance:

  • 480x faster than BM25

  • Native SQLite BM25 ranking

  • Porter stemming tokenizer
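For readers unfamiliar with FTS5, the sketch below shows the SQLite features named above (the porter tokenizer and the built-in bm25() ranking function) on an in-memory database. The table name chunks_fts and its columns are assumptions, not the server's schema, and it requires a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index with Porter stemming and load (doc_id, content) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE chunks_fts "
        "USING fts5(doc_id UNINDEXED, content, tokenize='porter unicode61')"
    )
    conn.executemany("INSERT INTO chunks_fts (doc_id, content) VALUES (?, ?)", rows)
    return conn


def search(conn, query, max_results=5):
    """Return doc_ids ranked by bm25(); lower scores are better matches in FTS5."""
    sql = (
        "SELECT doc_id, bm25(chunks_fts) AS score FROM chunks_fts "
        "WHERE chunks_fts MATCH ? ORDER BY score LIMIT ?"
    )
    return [row[0] for row in conn.execute(sql, (query, max_results))]
```

A double-quoted phrase in the query must match as a phrase, while bare terms are stemmed, so "register" also finds "registers".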

Enable semantic search with USE_SEMANTIC_SEARCH=1:

  • Meaning-based search (e.g., "movable objects" finds "sprites")

  • FAISS vector similarity with sentence-transformers

  • ~7-16ms per query after embeddings built

  • Install dependencies before building embeddings: pip install sentence-transformers faiss-cpu

Use double quotes for exact phrases:

search_docs(query='"VIC-II chip" registers')

Fuzzy search handles typos automatically with USE_FUZZY_SEARCH=1:

  • "VIC-I" → "VIC-II" (83% similarity)

  • "grafics" → "graphics" (88% similarity)

  • Configurable threshold (default: 80%)
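Threshold-based typo correction can be sketched with the standard library. Note the stand-in: this uses difflib.SequenceMatcher rather than the fuzzy-matching library the server uses, so its similarity scores will differ from the percentages quoted above. The vocabulary list and function names are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity on a 0-100 scale (difflib ratio, a stand-in metric)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_correct(term: str, vocabulary: list[str], threshold: float = 80.0):
    """Return the closest vocabulary term, or None if nothing reaches the threshold."""
    best_term, best_score = None, 0.0
    for candidate in vocabulary:
        score = similarity(term, candidate)
        if score > best_score:
            best_term, best_score = candidate, score
    return best_term if best_score >= threshold else None
```

Raising the threshold makes matching stricter; lowering it accepts more distant spellings at the cost of false matches.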

OCR for Scanned PDFs

Automatic with USE_OCR=1:

  • Detects scanned PDFs (< 100 chars extracted)

  • Uses Tesseract OCR

  • Install: pip install pytesseract pdf2image Pillow, plus the Tesseract binary

  • ~1-2 seconds per page
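The scanned-PDF check above amounts to a length test on the extracted text, followed by an OCR fallback. A minimal sketch, assuming whitespace is not counted toward the 100-character limit and with the two extractor callables passed in as hypothetical placeholders:

```python
def needs_ocr(extracted_text: str, min_chars: int = 100) -> bool:
    """Treat a PDF as scanned when extraction yields fewer than min_chars visible characters."""
    visible = "".join(extracted_text.split())  # assumption: ignore whitespace
    return len(visible) < min_chars


def extract_text(path, text_extractor, ocr_extractor, min_chars: int = 100) -> str:
    """Try normal text extraction first; fall back to OCR for near-empty results."""
    text = text_extractor(path)
    if needs_ocr(text, min_chars):
        return ocr_extractor(path)
    return text
```

Running OCR only on near-empty extractions keeps the 1-2 seconds per page cost away from PDFs that already contain a text layer.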

Temporal Analysis & Visualizations

Extract events, construct timelines, and visualize knowledge graphs.

Event Detection

Automatically detect significant events in documents:

  • 5 Event Types - Product releases, company milestones, technical innovations, cultural events, version updates

  • 8 Date Formats - Full dates, month-year, year ranges, decades, parenthetical dates

  • Confidence Scoring - Pattern matching with proximity-based confidence (0.0-1.0)

  • Entity Association - Automatically link entities to events

# Extract events from a document
result = kb.extract_document_events('doc_id', min_confidence=0.7)
# Returns: event_count, filtered_count, stored_count, events list

Timeline Construction

Build chronological timelines with flexible querying:

  • Automatic Timeline Building - Chronologically sorted by date (YYYYMMDD integer sort)

  • Category Organization - Group by decade-type combinations (e.g., "1980s-release")

  • Importance Levels - 1-5 scale based on confidence

  • Date Range Filtering - Query events by year range, type, importance

# Build timeline from events
timeline_result = kb.build_timeline(min_confidence=0.5)

# Query timeline
timeline = kb.get_timeline(start_year=1980, end_year=1989, min_importance=3)

# Get historical context
context = kb.get_historical_context(year=1982, context_years=2)
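The YYYYMMDD integer sort and the decade-type categories mentioned above can be sketched as follows. The helper names, the event dictionary shape, and the choice to treat a missing month or day as 0 are assumptions for illustration.

```python
def date_key(year: int, month: int = 0, day: int = 0) -> int:
    """Pack a date into a sortable YYYYMMDD integer; unknown month/day become 0 (assumed)."""
    return year * 10000 + month * 100 + day


def category(year: int, event_type: str) -> str:
    """Build a decade-type label such as '1980s-release'."""
    return f"{year // 10 * 10}s-{event_type}"


def sort_events(events: list[dict]) -> list[dict]:
    """Order events chronologically by their integer date key."""
    return sorted(
        events,
        key=lambda e: date_key(e["year"], e.get("month", 0), e.get("day", 0)),
    )
```

Because the key is a plain integer, year-only events sort ahead of dated events from the same year without any date parsing at query time.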

Interactive Visualizations

Generate interactive HTML visualizations with Plotly and NetworkX:

Timeline Visualizations:

  • Interactive Timeline - Horizontal timeline with zoom/pan, color-coded by event type

  • Event Network - Spring layout showing event relationships

  • Trend Charts - Multi-subplot dashboard (bar chart, stacked area, cumulative line)

Advanced Graph Visualizations:

  • 3D Knowledge Graph - Interactive 3D entity-relationship graph with rotation controls

  • Hierarchical Bundling - Circular layout with curved edges bundled through center

  • Sankey Diagrams - Topic flow over time (decade or year grouping)

# Generate visualizations
kb.visualize_timeline(start_year=1980, end_year=1990, output_path="timeline.html")
kb.visualize_knowledge_graph_3d(max_entities=50, output_path="graph_3d.html")
kb.visualize_hierarchical_bundling(max_entities=30, output_path="bundling.html")
kb.visualize_topic_flow_sankey(time_period='decade', output_path="flow.html")

MCP Tools for Timeline

4 timeline-specific MCP tools:

  • extract_document_events - Extract and store events from documents

  • get_timeline - Query chronological timeline with filters

  • search_events_by_date - Search events by date range and type

  • get_historical_context - Get events around a specific year

See PHASE3_TEMPORAL_ANALYSIS.md for complete documentation.

Tools

62 MCP tools organized by category. Key tools listed below.

Search Tools

search_docs - Full-text search

search_docs(query="SID register", max_results=5, tags=["sid"])

semantic_search - Meaning-based search

semantic_search(query="How do sprites work?", max_results=5)

hybrid_search - Combined keyword + semantic

hybrid_search(query="SID chip", semantic_weight=0.7, max_results=10)
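One common way to combine keyword and semantic results with a weight like semantic_weight is a normalized weighted sum. The sketch below illustrates that approach only; the min-max normalization and the function name are assumptions, and the server may combine scores differently.

```python
def hybrid_scores(keyword_scores, semantic_scores, semantic_weight=0.7):
    """Blend two {doc_id: score} maps into a ranked list of (doc_id, score) pairs."""

    def normalize(scores):
        # Min-max scale to [0, 1] so the two score ranges are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = hi - lo
        return {k: (v - lo) / span if span else 1.0 for k, v in scores.items()}

    kw = normalize(keyword_scores)
    sem = normalize(semantic_scores)
    combined = {
        doc: (1 - semantic_weight) * kw.get(doc, 0.0) + semantic_weight * sem.get(doc, 0.0)
        for doc in set(kw) | set(sem)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)
```

With semantic_weight=0.7, a document that only the semantic index ranks highly can outrank a strong keyword-only hit; setting it to 0 reduces to pure keyword ranking.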

answer_question - RAG-based Q&A with citations

answer_question(
  question="How do I program sprites on the VIC-II?",
  max_sources=5,
  search_mode="auto"
)

fuzzy_search - Typo-tolerant search

fuzzy_search(query="VIC2 asembly", similarity_threshold=80)

search_within_results - Progressive refinement

# Broad search, then refine
results = search_docs(query="VIC-II", max_results=50)
refined = search_within_results(results, "sprite collision", max_results=5)

find_similar - Find related documents

find_similar(doc_id="abc123", max_results=5)

Document Management

add_document - Add a file

add_document(
  filepath="C:/docs/c64_ref.pdf",
  title="C64 Programmer's Reference",
  tags=["reference", "memory-map"]
)

add_documents_bulk - Bulk import

add_documents_bulk(
  directory="C:/c64docs",
  pattern="**/*.{pdf,txt}",
  tags=["reference"],
  recursive=True
)

list_docs - List all documents

get_chunk - Get specific chunk

get_chunk(doc_id="abc123", chunk_id=5)

remove_document - Remove a document

remove_documents_bulk - Bulk remove by IDs or tags

remove_documents_bulk(tags=["outdated"])

check_updates - Check for file changes

check_updates(auto_update=False)

URL Scraping

scrape_url - Scrape documentation website

scrape_url(
  url="https://www.c64-wiki.com/wiki/VIC",
  tags=["wiki"],
  depth=2,
  threads=5
)

rescrape_document - Re-scrape for updates

rescrape_document(doc_id="abc123", force=False)

check_url_updates - Check all scraped docs

check_url_updates(auto_rescrape=False, check_structure=True)

AI & Analytics

extract_entities - Extract named entities

extract_entities(doc_id="abc123", confidence_threshold=0.6)

search_entities - Search across entities

search_entities(query="VIC-II", entity_types=["hardware"])

get_entity_analytics - Comprehensive entity statistics

extract_entity_relationships - Extract co-occurrences

extract_entity_relationships(doc_id="abc123", min_strength=0.3)
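Distance-based strength scoring for co-occurring entities can be illustrated with token positions: the closer two entities appear, the stronger the link. The linear formula, the window size, and the helper names below are hypothetical and chosen only to show the idea behind min_strength.

```python
def token_positions(tokens: list[str], entity: str) -> list[int]:
    """Indices at which an entity appears in a token list."""
    return [i for i, tok in enumerate(tokens) if tok == entity]


def cooccurrence_strength(positions_a, positions_b, window: int = 50) -> float:
    """Strength in [0, 1]: 1 - distance/window for the closest pair, 0 beyond the window."""
    best = 0.0
    for a in positions_a:
        for b in positions_b:
            distance = abs(a - b)
            if distance < window:
                best = max(best, 1.0 - distance / window)
    return best
```

A relationship would then be kept only when its strength is at least min_strength, for example 0.3 as in the call above.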

search_entity_pair - Find docs with entity pair

search_entity_pair(entity1="VIC-II", entity2="sprite")

compare_documents - Side-by-side comparison

compare_documents(doc_id_1="abc", doc_id_2="def", comparison_type="full")

suggest_tags - AI-powered tag suggestions

suggest_tags(doc_id="abc123", confidence_threshold=0.6)

get_tags_by_category - Browse tags by category

translate_query - Parse natural language queries

translate_query(query="find sprites on VIC-II chip")

Export Tools

export_entities - Export to CSV/JSON

export_entities(format="csv", output_path="entities.csv", min_confidence=0.7)

export_relationships - Export relationships

export_relationships(format="json", output_path="rels.json", min_strength=0.5)

System

kb_stats - Knowledge base statistics

health_check - System diagnostics

Data Storage

SQLite database with 12+ tables:

  • documents - Document metadata

  • chunks - Chunked content (1500 words, 200 overlap)

  • document_tables - Extracted PDF tables

  • document_code_blocks - Detected code blocks

  • document_entities - Extracted entities

  • entity_relationships - Co-occurrence tracking

  • Plus: summaries, extraction_jobs, monitoring_history, etc.
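The chunk sizes listed for the chunks table (1500 words with a 200-word overlap) correspond to a sliding window over the document's words. A minimal sketch, assuming whitespace tokenization and a hypothetical function name:

```python
def chunk_words(text: str, chunk_size: int = 1500, overlap: int = 200) -> list[str]:
    """Split text into word chunks where each chunk repeats the last `overlap` words of the previous one."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval quality.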

Benefits:

  • Lazy loading (metadata at startup, chunks on-demand)

  • ACID transactions

  • Scalable to 100k+ documents

  • FTS5 full-text indexes

Default location: ~/.tdz-c64-knowledge (override with TDZ_DATA_DIR)

Usage Examples

Ask Claude Code:

  • "Search the C64 docs for SID voice registers"

  • "What does the memory map say about $D400?"

  • "Find information about sprite multiplexing"

  • "Add C:/docs/mapping_the_c64.pdf with tags memory-map, reference"

  • "How do I program raster interrupts on the VIC-II?" (uses RAG)

Suggested Tags

Organize docs with consistent tags:

  • reference, memory-map, basic, assembly

  • sid, vic-ii, cia, kernal

  • hardware, disk, graphics, sound

Troubleshooting

"pypdf not installed" - Run: pip install pypdf rank-bm25

"mcp module not found" - Run: pip install mcp

Server not responding - Use Python from virtual environment, not system Python

PDF extraction issues - Use OCR or add plain text version

BM25 issues - Check logs in TDZ_DATA_DIR/server.log, try USE_BM25=0

Development

Testing

pip install -e ".[dev]"

# Run all tests
pytest test_server.py test_wiki_export.py -v

# With coverage
pytest test_server.py -v --cov=server --cov-report=term

# Wiki export tests only
pytest test_wiki_export.py -v

Test Coverage:

  • test_server.py - Core server functionality (search, entities, RAG, etc.)

  • test_wiki_export.py - Wiki generation features (16 tests):

    • Document coordinate export (UMAP/t-SNE)

    • File type detection (HTML/MD)

    • Cluster document export

    • HTML generation with explanation boxes

    • JavaScript generation for interactive features

CI/CD

GitHub Actions workflow tests on Python 3.10/3.11/3.12 across Windows/Linux/macOS with Ruff code quality checks.

Documentation

Feature Documentation

Browse docs/ for detailed guides on specific features:

API & Integration:

  • REST API - FastAPI REST server (27 endpoints)

Version History

v2.23.0 - RAG Question Answering & Advanced Search (Phase 2 Complete)

  • RAG-based answer_question with citations

  • Fuzzy search with rapidfuzz

  • Progressive search refinement

  • Smart tagging system

v2.22.0 - Search Improvements (Phase 1 Complete)

  • Enhanced entity analytics

  • C64-specific regex patterns (5000x faster)

  • Performance optimizations

v2.21.0 - Anomaly Detection

  • ML-based baseline learning

  • 1500x performance improvement

v2.18.0 - REST API & Background Processing

  • FastAPI REST server (27 endpoints)

  • Background entity extraction

v2.15.0+ - Entity Intelligence

  • Entity extraction, relationships, analytics

See CONTEXT.md for complete version history.

License

MIT License - Use freely for your retro computing projects!
