x0base

mcp-security-toolkit

owasp_llm_classify

Map security findings or observations to OWASP LLM Top 10 (2025) categories with rule-based keyword and regex matching, returning top matches with evidence snippets and confidence scores.

Instructions

Map a finding or observation to OWASP LLM Top 10 (2025) categories.

Pure rule-based: keyword and regex patterns with weights per category. Returns the top `top_n` matching categories with the matched evidence snippets and a confidence score.

Args:
    observation: Free-form text describing a finding, scan result, bug report, threat model entry, or security observation.
    top_n: Number of matches to return (default 3).

Returns:
    ClassifyReport with ranked matches. If nothing matches, `unmatched` is True and `matches` is empty.
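
The weighted matching described above can be illustrated with a small worked example. This is a minimal sketch, not the tool itself: it reuses two patterns from the LLM01 list, and the names `PATTERNS` and `score_observation` as well as the sample text are made up for illustration. The weighted sum computed here appears to correspond to the `score` field in the implementation reference.

```python
import re

# Two (pattern, weight) pairs copied from the LLM01 list; everything else
# in this sketch (names, sample text) is illustrative.
PATTERNS = [
    (r"\bprompt injection\b", 5),
    (r"\bjailbreak\b", 3),
]


def score_observation(text: str) -> tuple[int, list[str]]:
    """Sum weight * hit count over all patterns and collect context snippets."""
    score = 0
    evidence = []
    for pattern, weight in PATTERNS:
        hits = list(re.finditer(pattern, text, re.IGNORECASE))
        score += weight * len(hits)
        # At most two snippets per pattern, 20 characters of context each side.
        for m in hits[:2]:
            snippet = text[max(0, m.start() - 20):m.end() + 20].strip()
            evidence.append(re.sub(r"\s+", " ", snippet))
    return score, evidence


text = "Prompt injection via uploaded PDF; a second prompt injection led to a jailbreak."
score, evidence = score_observation(text)
print(score)     # 5 * 2 + 3 * 1 = 13
print(evidence)  # three snippets: two for the first pattern, one for the second
```

Because every hit adds the full pattern weight, a long report that repeats one phrase can outscore a short report that mentions several distinct issues once.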

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| observation | Yes | | |
| top_n | No | | |
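
As a rough illustration of how these two parameters are passed, the sketch below builds a JSON-RPC 2.0 request for the MCP `tools/call` method. The `id` value and the observation text are placeholders, and how the payload is sent (stdio, HTTP, or otherwise) depends on the client and is not shown.

```python
import json

# JSON-RPC 2.0 request for the MCP `tools/call` method.
# The id and the observation text are placeholder values.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "owasp_llm_classify",
        "arguments": {
            "observation": "Agent auto executes shell commands without approval.",
            "top_n": 2,  # optional; the tool function defaults to 3
        },
    },
}

payload = json.dumps(request)
print(payload)
```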

Implementation Reference

  • The main tool function `owasp_llm_classify(observation, top_n=3)` that classifies a security observation against OWASP LLM Top 10 (2025) categories. Uses rule-based keyword/regex matching with weighted patterns per category, returns ranked matches with evidence snippets.
    import re


    def owasp_llm_classify(observation: str, top_n: int = 3) -> dict:
        """Map a finding or observation to OWASP LLM Top 10 (2025) categories.
    
        Pure rule-based: keyword and regex patterns with weights per category.
        Returns the top `top_n` matching categories with the matched evidence
        snippets and a confidence score.
    
        Args:
            observation: Free-form text describing a finding, scan result, bug
                report, threat model entry, or security observation.
            top_n: Number of matches to return (default 3).
    
        Returns:
            ClassifyReport with ranked matches. If nothing matches, `unmatched`
            is True and `matches` is empty.
        """
        if not isinstance(observation, str):
            return {"error": "observation must be a string"}
    
        report = ClassifyReport(input_chars=len(observation), top_match=None)
        raw_scores: list[Match] = []
    
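        for cat_id, cat in CATEGORIES.items():
            score = 0
            evidence: list[str] = []
            for pattern, weight in cat["patterns"]:
                # Case-insensitive scan; every hit adds the pattern's weight.
                matches = list(re.finditer(pattern, observation, re.IGNORECASE))
                if matches:
                    score += weight * len(matches)
                    # Keep at most two snippets per pattern, each with up to
                    # 20 characters of context on either side of the hit.
                    for m in matches[:2]:
                        snippet = observation[max(0, m.start() - 20):m.end() + 20].strip()
                        # Collapse whitespace runs so each snippet is one line.
                        snippet = re.sub(r"\s+", " ", snippet)
                        evidence.append(snippet)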
            if score > 0:
                raw_scores.append(Match(
                    category=cat_id,
                    title=cat["title"],
                    score=score,
                    severity=_severity_from_score(score),
                    evidence=evidence,
                    description=cat["description"],
                ))
    
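        # Highest score first. `max(1, top_n)` means a `top_n` of zero or less
        # still returns one match whenever anything scored.
        raw_scores.sort(key=lambda m: m.score, reverse=True)
        report.matches = raw_scores[:max(1, top_n)]
        if not raw_scores:
            report.unmatched = True
        else:
            report.top_match = raw_scores[0].category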
    
        return report.model_dump()
  • The CATEGORIES dictionary defining all 10 OWASP LLM categories (LLM01-LLM10) with title, description, and weighted regex patterns for rule-based classification.
    CATEGORIES: dict[str, dict] = {
        "LLM01": {
            "title": "Prompt Injection",
            "description": "Adversarial input that manipulates an LLM into unintended actions or output.",
            "patterns": [
                (r"\bprompt injection\b", 5),
                (r"\bindirect prompt injection\b", 5),
                (r"\b(ignore|disregard) (the )?(previous|prior|above|all) instructions?\b", 4),
                (r"\bjailbreak\b", 3),
                (r"\bsystem prompt (override|bypass)\b", 4),
                (r"\binstruction[- ]override\b", 3),
                (r"\buntrusted (content|input|document)\b", 2),
                (r"\bmalicious prompt\b", 3),
                (r"\bmissing delimiter\b", 2),
                (r"\btrust[- ]boundary (violation|crossing)\b", 3),
            ],
        },
        "LLM02": {
            "title": "Sensitive Information Disclosure",
            "description": "Model exposes sensitive data: PII, secrets, internal documents, training data leakage.",
            "patterns": [
                (r"\bPII\b", 4),
                (r"\bpersonally identifiable\b", 4),
                (r"\b(api[- ]?key|api token|access token|bearer token)\b", 4),
                (r"\b(secret|password|credential)s?\b leaked", 4),
                (r"\b(secret|password|credential)s?\b (in|inside) (response|output|completion)", 4),
                (r"\btraining data (leak|extraction|memoriz)", 4),
                (r"\bdata exfiltration\b", 3),
                (r"\bredact(ion|ed)?\b", 2),
                (r"\b(SSN|social security)\b", 4),
                (r"\b(credit card|PAN|PCI)\b", 3),
            ],
        },
        "LLM03": {
            "title": "Supply Chain",
            "description": "Compromise of components in the LLM supply chain: models, datasets, plugins, third-party libs.",
            "patterns": [
                (r"\b(malicious|compromised|backdoored) (model|package|dependency|plugin)\b", 5),
                (r"\bmodel (origin|provenance)\b", 3),
                (r"\btypo[- ]?squatt", 4),
                (r"\bdependency confusion\b", 4),
                (r"\b(unsigned|untrusted) (model|weights|checkpoint)\b", 4),
                (r"\bhuggingface (repo|model) (pinned|unpinned|hijack)", 3),
                (r"\bsbom\b", 2),
            ],
        },
        "LLM04": {
            "title": "Data and Model Poisoning",
            "description": "Adversarial manipulation of training/fine-tune data or model weights.",
            "patterns": [
                (r"\b(data|training|dataset) poison", 5),
                (r"\bmodel poison", 5),
                (r"\bbackdoor(ed)? (model|weights)\b", 4),
                (r"\btrigger phrase\b", 3),
                (r"\bfine[- ]?tune.*poison", 4),
                (r"\bRAG poisoning\b", 4),
                (r"\bcorpus contamination\b", 3),
            ],
        },
        "LLM05": {
            "title": "Improper Output Handling",
            "description": "Downstream system blindly trusts LLM output (XSS, SSRF, SQLi, SSTI, command injection from generation).",
            "patterns": [
                (r"\bXSS\b", 4),
                (r"\bcross[- ]site scripting\b", 4),
                (r"\bSSRF\b", 4),
                (r"\bSQL injection\b", 4),
                (r"\bSSTI\b", 4),
                (r"\b(command|code) injection (from|via) (llm|model|generation|output)\b", 4),
                (r"\brendered (unsafely|without (sanitization|escaping))\b", 3),
                (r"\b(eval|exec)\(.*(llm|model|completion|response)", 4),
                (r"\bunescaped (html|markdown|sql)\b", 3),
                (r"\boutput (sanitization|validation|filtering) missing\b", 3),
            ],
        },
        "LLM06": {
            "title": "Excessive Agency",
            "description": "Agent given too much functionality, permission, or autonomy — can take harmful actions.",
            "patterns": [
                (r"\bover[- ]broad (param|permission|scope|tool)\b", 4),
                (r"\b(shell|command) execution (from|via) (agent|tool)\b", 5),
                (r"\bagent (deletes?|writes?|sends?|posts?) (without|with no) (confirmation|approval|human)\b", 4),
                (r"\bunbounded (tool|action) (set|capability)\b", 4),
                (r"\b(arbitrary|unrestricted) (filesystem|network|database) access\b", 4),
                (r"\bagent (auto|automatically) (executes?|runs?)\b", 3),
                (r"\bdisabled (verification|safeguard|confirm)\b", 3),
                (r"\bexcessive (privileges?|permissions?|agency)\b", 5),
            ],
        },
        "LLM07": {
            "title": "System Prompt Leakage",
            "description": "System prompt or its contents (rules, secrets, tool names) revealed to the user.",
            "patterns": [
                (r"\bsystem prompt (leak|disclosure|extraction|exposure)\b", 5),
                (r"\b(print|reveal|repeat|dump|show) (your |the )?system prompt\b", 4),
                (r"\bprompt extraction\b", 4),
                (r"\binternal instructions revealed\b", 3),
                (r"\bsecret in system prompt\b", 4),
            ],
        },
        "LLM08": {
            "title": "Vector and Embedding Weaknesses",
            "description": "Weak access controls, poisoning, or inversion in vector stores / embeddings used by RAG.",
            "patterns": [
                (r"\b(vector|embedding) (store|database)\b", 2),
                (r"\bembedding (inversion|reconstruction)\b", 5),
                (r"\bvector store poison", 5),
                (r"\bRAG (poisoning|context injection)\b", 4),
                (r"\bcross[- ]tenant (vector|embedding|context)\b", 4),
                (r"\bweak (access control|isolation).*vector\b", 3),
            ],
        },
        "LLM09": {
            "title": "Misinformation",
            "description": "Hallucinated or biased model output presented as fact, leading to downstream errors.",
            "patterns": [
                (r"\bhallucinat(ion|ed|es)\b", 4),
                (r"\bconfabulat", 3),
                (r"\bfactually (incorrect|wrong)\b", 3),
                (r"\bfabricated (citation|source|reference|fact)\b", 4),
                (r"\b(bias|biased) (output|generation|completion)\b", 3),
                (r"\bunsourced claim\b", 2),
                (r"\bmodel (confidently )?(wrong|incorrect)\b", 3),
            ],
        },
        "LLM10": {
            "title": "Unbounded Consumption",
            "description": "Lack of rate limits, token caps, or cost controls — DoS or unbounded billing.",
            "patterns": [
                (r"\bDoS\b", 3),
                (r"\bdenial[- ]of[- ]service\b", 3),
                (r"\b(no|missing|absent) (rate limit|token limit|cost cap|budget)\b", 4),
                (r"\bunbounded (token|cost|generation|prompt)\b", 4),
                (r"\b(prompt|input) length unbounded\b", 3),
                (r"\b(infinite|runaway) loop\b", 3),
                (r"\b(resource|cost) exhaustion\b", 4),
                (r"\bbilling abuse\b", 3),
            ],
        },
    }
  • Pydantic models: Match (category, title, score, severity, evidence, description) and ClassifyReport (input_chars, top_match, matches, unmatched) used for typed output.
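    from typing import Literal

    from pydantic import BaseModel, Field

    # Assumed alias: the excerpt does not show how Severity is defined. These
    # four values are the ones `_severity_from_score` returns.
    Severity = Literal["high", "medium", "low", "info"]


    class Match(BaseModel):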
        category: str
        title: str
        score: int
        severity: Severity
        evidence: list[str] = Field(default_factory=list)
        description: str
    
    
    class ClassifyReport(BaseModel):
        input_chars: int
        top_match: str | None
        matches: list[Match] = Field(default_factory=list)
        unmatched: bool = False
  • Helper function `_severity_from_score(score)` that maps numeric scores to severity levels (high >= 8, medium >= 4, low >= 2, info otherwise).
    def _severity_from_score(score: int) -> Severity:
        if score >= 8:
            return "high"
        if score >= 4:
            return "medium"
        if score >= 2:
            return "low"
        return "info"
  • Import of owasp_llm_classify module in the MCP server.
    owasp_llm_classify,
  • Registration of `owasp_llm_classify.owasp_llm_classify` as an MCP tool via `mcp.tool()` decorator.
    mcp.tool()(owasp_llm_classify.owasp_llm_classify)
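
To make the severity thresholds and the `top_n` slicing concrete, here is a small standalone sketch. It copies the thresholds from `_severity_from_score` and the `max(1, top_n)` slice from the tool function; the `ranked` list and the helper name `take_top` are hypothetical and exist only for this example.

```python
def severity_from_score(score: int) -> str:
    """Same thresholds as `_severity_from_score` in the listing above."""
    if score >= 8:
        return "high"
    if score >= 4:
        return "medium"
    if score >= 2:
        return "low"
    return "info"


def take_top(ranked: list[tuple[str, int]], top_n: int) -> list[tuple[str, int]]:
    """Mirror the tool's slice: at least one entry is kept if any exist."""
    return ranked[:max(1, top_n)]


# Hypothetical (category, score) pairs, already sorted best first.
ranked = [("LLM01", 13), ("LLM07", 5), ("LLM02", 4), ("LLM10", 3)]

for category, score in take_top(ranked, 3):
    print(category, score, severity_from_score(score))

print(take_top(ranked, 0))  # still one entry, because of max(1, top_n)
print(take_top([], 3))      # empty list: the tool sets `unmatched` in this case
```

One consequence of the listed weights: the smallest weight in `CATEGORIES` is 2 and a category is only reported when its score is above zero, so with the patterns shown every reported match is at least "low", and "info" is not reachable unless a weight of 1 is added.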
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the method as 'pure rule-based: keyword and regex patterns with weights per category' and explains the return structure (top_n matches, unmatched flag). No annotations exist, so the description fully handles the burden. Lacks potential caveats like sensitivity to phrasing, but overall strong.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
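
The phrasing sensitivity mentioned in this review can be shown with a single pattern from the `CATEGORIES` table. The sketch below is illustrative only: the sample sentences are invented, and the check runs against one LLM01 pattern rather than the full table.

```python
import re

# One LLM01 pattern from the CATEGORIES table; the sentences are made up.
PATTERN = r"\bprompt injection\b"

phrasings = [
    "Found a prompt injection in the chat widget.",
    "Found a PROMPT INJECTION in the chat widget.",
    "Found a prompt-injection flaw in the chat widget.",
    "An attacker injected instructions into the prompt.",
]

# Same flags as the tool: case-insensitive regex search.
results = {p: bool(re.search(PATTERN, p, re.IGNORECASE)) for p in phrasings}

for phrase, matched in results.items():
    print(matched, "|", phrase)
```

The first two sentences match because the search is case-insensitive. The hyphenated and reworded variants do not match this pattern, since it requires the two words separated by a single space.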

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise with no wasted words: purpose, method, parameter descriptions, and return value are all covered in a few sentences. The structure is logical and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description fully explains the return value (ClassifyReport with ranked matches or unmatched flag). The tool is simple (2 parameters) and the description covers all necessary aspects for an agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description adds meaning: 'observation' is detailed as free-form text for various security artifacts, and 'top_n' is explained as number of matches (default 3). While not exhaustive, it compensates for the schema gap effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool maps findings to OWASP LLM Top 10 categories, specifying the verb (map) and resource (OWASP LLM Top 10). It distinguishes from siblings like agent_tool_risk_audit or graphql_introspect by focusing on LLM security categorization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit examples of valid inputs (finding, scan result, bug report, etc.) but does not mention when not to use or offer alternative tools. Given sibling tools are mostly other security utilities, the context is clear enough, but lacking exclusions prevents a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
