ToGMAL MCP Server
Taxonomy of Generative Model Apparent Limitations
A Model Context Protocol (MCP) server that provides real-time, privacy-preserving analysis of LLM interactions to detect out-of-distribution behaviors and recommend safety interventions.
Overview
ToGMAL helps prevent common LLM pitfalls by detecting:
🔬 Math/Physics Speculation: Ungrounded "theories of everything" and invented physics
🏥 Medical Advice Issues: Health recommendations without proper sources or disclaimers
💾 Dangerous File Operations: Mass deletions, recursive operations without safeguards
💻 Vibe Coding Overreach: Overly ambitious projects without proper scoping
📊 Unsupported Claims: Strong assertions without evidence or hedging
Key Features
Privacy-Preserving: All analysis is deterministic and local (no external API calls)
Low Latency: Heuristic-based detection for real-time analysis
Intervention Recommendations: Suggests step breakdown, human-in-the-loop, or web search
Taxonomy Building: Crowdsourced evidence collection for improving detection
Extensible: Easy to add new detection patterns and categories
Installation
Prerequisites
Python 3.10 or higher
pip package manager
Install Dependencies
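The README does not pin exact package names; assuming the MCP Python SDK and Pydantic (prefer the repository's requirements.txt if one is provided):

```shell
# Assumed dependency set: the MCP Python SDK and Pydantic.
pip install mcp pydantic
```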
Install the Server
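A minimal sketch, assuming an editable install from a local checkout (the repository URL and package layout are placeholders):

```shell
# Repository URL is a placeholder; substitute the real one.
git clone https://github.com/<your-org>/togmal-mcp.git
cd togmal-mcp
pip install -e .
```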
Usage
Available Tools
1. togmal_analyze_prompt
Analyze a user prompt before the LLM processes it.
Parameters:
prompt (str): The user prompt to analyze
response_format (str): Output format - "markdown" or "json"
Example:
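A sketch of a tool-call payload (the wrapper shape depends on your MCP client; the argument names follow the parameter list above):

```python
# Illustrative tools/call arguments for togmal_analyze_prompt.
# The outer wrapper varies by MCP client; argument names match the docs above.
request = {
    "name": "togmal_analyze_prompt",
    "arguments": {
        "prompt": "I have derived a new theory of everything that unifies gravity and quantum mechanics.",
        "response_format": "markdown",
    },
}
```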
Use Cases:
Detect speculative physics theories before generating responses
Flag overly ambitious coding requests
Identify requests for medical advice that need disclaimers
2. togmal_analyze_response
Analyze an LLM response for potential issues.
Parameters:
response (str): The LLM response to analyze
context (str, optional): Original prompt for better analysis
response_format (str): Output format - "markdown" or "json"
Example:
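An illustrative payload (wrapper shape may vary by client; the context field is optional):

```python
# Illustrative tools/call arguments for togmal_analyze_response.
request = {
    "name": "togmal_analyze_response",
    "arguments": {
        "response": "Take 800mg of ibuprofen every 4 hours; you definitely have a tension headache.",
        "context": "I have a headache, what should I do?",  # optional
        "response_format": "markdown",
    },
}
```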
Use Cases:
Check for ungrounded medical advice
Detect dangerous file operation instructions
Flag unsupported statistical claims
3. togmal_submit_evidence
Submit evidence of LLM limitations to improve the taxonomy.
Parameters:
category (str): Type of limitation - "math_physics_speculation", "ungrounded_medical_advice", etc.
prompt (str): The prompt that triggered the issue
response (str): The problematic response
description (str): Why this is problematic
severity (str): Severity level - "low", "moderate", "high", or "critical"
Example:
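A sketch of a submission (category and severity values follow the parameter list above; the example content is invented for illustration):

```python
# Illustrative tools/call arguments for togmal_submit_evidence.
request = {
    "name": "togmal_submit_evidence",
    "arguments": {
        "category": "ungrounded_medical_advice",
        "prompt": "What dose of ibuprofen should I take?",
        "response": "Take 800mg every 2 hours.",
        "description": "Specific dosage advice with no source and an unsafe interval.",
        "severity": "high",
    },
}
```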
Features:
Human-in-the-loop confirmation before submission
Generates unique entry ID for tracking
Contributes to improving detection heuristics
4. togmal_get_taxonomy
Retrieve entries from the taxonomy database.
Parameters:
category (str, optional): Filter by category
min_severity (str, optional): Minimum severity to include
limit (int): Maximum entries to return (1-100, default 20)
offset (int): Pagination offset (default 0)
response_format (str): Output format
Example:
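An illustrative query combining the optional filters with pagination:

```python
# Illustrative tools/call arguments for togmal_get_taxonomy.
request = {
    "name": "togmal_get_taxonomy",
    "arguments": {
        "category": "dangerous_file_operations",  # optional filter
        "min_severity": "moderate",               # optional floor
        "limit": 10,
        "offset": 0,
        "response_format": "json",
    },
}
```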
Use Cases:
Research common LLM failure patterns
Train improved detection models
Generate safety guidelines
5. togmal_get_statistics
Get statistical overview of the taxonomy database.
Parameters:
response_format (str): Output format
Returns:
Total entries by category
Severity distribution
Database capacity status
Detection Heuristics
Math/Physics Speculation
Detects:
"Theory of everything" claims
Unified field theory proposals
Invented equations or particles
Modifications to fundamental constants
Patterns:
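Illustrative regexes in the spirit of this category (not the server's actual pattern list):

```python
import re

# Illustrative patterns for math/physics speculation; the server's own list may differ.
MATH_PHYSICS_PATTERNS = [
    re.compile(r"\btheory of everything\b", re.IGNORECASE),
    re.compile(r"\bunified field theory\b", re.IGNORECASE),
    re.compile(r"\b(?:discovered|invented) a new (?:particle|equation|force)\b", re.IGNORECASE),
    re.compile(r"\b(?:modify|change) the (?:speed of light|planck constant)\b", re.IGNORECASE),
]
```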
Ungrounded Medical Advice
Detects:
Diagnoses without qualifications
Treatment recommendations without sources
Specific drug dosages
Dismissive responses to symptoms
Patterns:
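Illustrative regexes for this category (the server's actual patterns are not reproduced here):

```python
import re

# Illustrative patterns for ungrounded medical advice; actual server patterns may differ.
MEDICAL_PATTERNS = [
    re.compile(r"\byou (?:probably |likely )?have\b.*\b(?:cancer|diabetes|infection)\b", re.IGNORECASE),
    re.compile(r"\btake \d+\s?(?:mg|mcg|ml)\b", re.IGNORECASE),
    re.compile(r"\b(?:stop|start) taking\b", re.IGNORECASE),
    re.compile(r"\b(?:it'?s|this is) (?:probably )?nothing\b", re.IGNORECASE),
]
```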
Dangerous File Operations
Detects:
Mass deletion commands
Recursive operations without safeguards
Operations on test files without confirmation
No human-in-the-loop for destructive actions
Patterns:
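Illustrative regexes for this category (not the server's actual list):

```python
import re

# Illustrative patterns for dangerous file operations; actual server patterns may differ.
FILE_OP_PATTERNS = [
    re.compile(r"\brm\s+-rf?\b"),
    re.compile(r"\bdel(?:ete)? (?:all|every)\b", re.IGNORECASE),
    re.compile(r"\bfind\b.*\s-delete\b"),
    re.compile(r"\bshutil\.rmtree\b"),
]
```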
Vibe Coding Overreach
Detects:
Requests for complete applications
Massive line count targets (1000+ lines)
Unrealistic timeframes
Scope without proper planning
Patterns:
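Illustrative regexes for this category (invented for illustration; the server's own patterns may differ):

```python
import re

# Illustrative patterns for vibe-coding overreach; actual server patterns may differ.
VIBE_CODING_PATTERNS = [
    re.compile(r"\b(?:build|create|write) (?:me )?(?:a |an )?(?:complete|entire|full)\b", re.IGNORECASE),
    re.compile(r"\b\d{4,}\+? lines\b", re.IGNORECASE),
    re.compile(r"\bin (?:one|a single) (?:prompt|shot|go)\b", re.IGNORECASE),
    re.compile(r"\bby (?:tonight|tomorrow|end of day)\b", re.IGNORECASE),
]
```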
Unsupported Claims
Detects:
Absolute statements without hedging
Statistical claims without sources
Over-confident predictions
Missing citations
Patterns:
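Illustrative regexes for this category (not the server's actual list):

```python
import re

# Illustrative patterns for unsupported claims; actual server patterns may differ.
UNSUPPORTED_CLAIM_PATTERNS = [
    re.compile(r"\b(?:always|never|definitely|certainly|guaranteed)\b", re.IGNORECASE),
    re.compile(r"\b\d{1,3}(?:\.\d+)?% of\b"),
    re.compile(r"\bstudies (?:show|prove)\b", re.IGNORECASE),
    re.compile(r"\bit is (?:a )?(?:proven )?fact\b", re.IGNORECASE),
]
```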
Risk Levels
Calculated based on weighted confidence scores:
LOW: Minor issues, no immediate intervention needed
MODERATE: Worth noting, consider additional verification
HIGH: Significant concern, interventions recommended
CRITICAL: Serious risk, multiple interventions strongly advised
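The mapping above can be sketched as a simple threshold function (the cutoff values are illustrative, not the server's actual ones):

```python
def risk_level(score: float) -> str:
    """Map a weighted confidence score in [0, 1] to a risk level.

    Thresholds are illustrative; the server's actual cutoffs may differ.
    """
    if score >= 0.8:
        return "CRITICAL"
    if score >= 0.6:
        return "HIGH"
    if score >= 0.3:
        return "MODERATE"
    return "LOW"
```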
Intervention Types
Step Breakdown
Complex tasks should be broken into verifiable components.
Recommended for:
Math/physics speculation
Large coding projects
Dangerous file operations
Human-in-the-Loop
Critical decisions require human oversight.
Recommended for:
Medical advice
Destructive file operations
High-severity issues
Web Search
Claims should be verified against authoritative sources.
Recommended for:
Medical recommendations
Physics/math theories
Unsupported factual claims
Simplified Scope
Overly ambitious projects need realistic scoping.
Recommended for:
Vibe coding requests
Complex system designs
Feature-heavy applications
Configuration
Character Limit
Default: 25,000 characters per response
Taxonomy Capacity
Default: 1,000 evidence entries
Detection Sensitivity
Adjust pattern matching and confidence thresholds in detection functions:
Integration Examples
Claude Desktop App
Add to your claude_desktop_config.json:
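A typical entry (the command and script path are placeholders for your local install):

```json
{
  "mcpServers": {
    "togmal": {
      "command": "python",
      "args": ["/path/to/togmal_server.py"]
    }
  }
}
```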
CLI Testing
Programmatic Usage
Architecture
Design Principles
Privacy First: No external API calls, all processing local
Deterministic: Heuristic-based detection for reproducibility
Low Latency: Fast pattern matching for real-time use
Extensible: Easy to add new patterns and categories
Human-Centered: Always allows human override and judgment
Future Enhancements
The system is designed for progressive enhancement:
Phase 1 (Current): Heuristic pattern matching
Phase 2 (Planned): Traditional ML models (clustering, anomaly detection)
Phase 3 (Future): Federated learning from submitted evidence
Phase 4 (Advanced): Custom fine-tuned models for specific domains
Data Flow
Contributing
Adding New Detection Patterns
Create a new detection function:
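A sketch of what a new detector might look like (the category name, field names, and weights are illustrative; align them with the existing detection functions):

```python
import re

# Illustrative new detector; category and return shape should match existing detectors.
LEGAL_ADVICE_PATTERNS = [
    re.compile(r"\byou should sue\b", re.IGNORECASE),
    re.compile(r"\bthis is (?:definitely )?(?:legal|illegal)\b", re.IGNORECASE),
]

def detect_legal_advice(text: str) -> dict:
    """Return a confidence score and the patterns that matched."""
    matched = [p.pattern for p in LEGAL_ADVICE_PATTERNS if p.search(text)]
    confidence = min(1.0, 0.4 * len(matched))
    return {
        "category": "ungrounded_legal_advice",  # would also be added to CategoryType
        "confidence": confidence,
        "matched_patterns": matched,
    }
```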
Add to CategoryType enum
Update analysis functions to include new detector
Add intervention recommendations if needed
Submitting Evidence
Use the togmal_submit_evidence tool to contribute examples of problematic LLM behavior. This helps improve detection for everyone.
Limitations
Current Constraints
Heuristic-Based: May have false positives/negatives
English-Only: Patterns optimized for English text
Context-Free: Doesn't understand full conversation history
No Learning: Detection rules are static until updated
Not a Replacement For
Professional judgment in critical domains (medicine, law, etc.)
Comprehensive code review
Security auditing
Safety testing in production systems
License
MIT License - See LICENSE file for details
Support
For issues, questions, or contributions:
Open an issue on GitHub
Submit evidence through the MCP tool
Contact: [Your contact information]
Citation
If you use ToGMAL in your research or product, please cite:
Acknowledgments
Built using the Model Context Protocol (MCP) Python SDK and Pydantic.
Inspired by the need for safer, more grounded AI interactions.