# detect_bias
Analyze AI-generated text to identify demographic bias, stereotyping, and unfair language patterns for compliance auditing.
## Instructions
Analyze text for demographic bias patterns, stereotyping, and unfair language.
Args:
- model_output: The AI-generated text to analyze for bias.
- protected_attributes: Comma-separated list of attributes to check (e.g. "race,gender,age"). Leave empty for auto-detection.
- api_key: Optional MEOK API key for pro tier.
Behavior: This tool generates structured output without modifying external systems. Output is deterministic for identical inputs. No side effects. Free tier: 10/day rate limit. Pro tier: unlimited. No authentication required for basic usage.
When to use: Use this tool when you need to assess, audit, or verify compliance requirements. Ideal for gap analysis, readiness checks, and generating compliance documentation.
When NOT to use: Do not use as a substitute for qualified legal counsel. This tool provides technical compliance guidance, not legal advice.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model_output | Yes | The AI-generated text to analyze for bias. | — |
| protected_attributes | No | Comma-separated list of attributes to check (e.g. "race,gender,age"). Leave empty for auto-detection. | `""` |
| api_key | No | Optional MEOK API key for pro tier. | `""` |
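To make the schema concrete, here is a minimal client-side sketch. It assumes the `fastmcp` client package and a server launched over stdio from server.py; neither the client choice nor the sample text comes from this repository, so adapt both to your setup.

```python
# Illustrative only: assumes the `fastmcp` client package and a local
# stdio server spawned from server.py. Adapt to your MCP client.
import asyncio
from fastmcp import Client

async def main() -> None:
    async with Client("server.py") as client:  # spawns the server over stdio
        result = await client.call_tool("detect_bias", {
            "model_output": "Older applicants usually can't keep up with new tools.",
            "protected_attributes": "age",
        })
        print(result)

asyncio.run(main())
```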
## Implementation Reference
- server.py:435-543 (handler): The main tool handler for detect_bias. Accepts model_output, protected_attributes, and api_key. Scores the text for bias patterns, detects protected attributes, identifies bias types, performs sentence-level analysis, and returns a structured result with the overall risk level, flagged sentences, and recommendations.
```python
@mcp.tool()
def detect_bias(
    model_output: str,
    protected_attributes: str = "",
    api_key: str = "",
) -> dict:
    """Analyze text for demographic bias patterns, stereotyping, and unfair language.

    Args:
        model_output: The AI-generated text to analyze for bias.
        protected_attributes: Comma-separated list of attributes to check
            (e.g. "race,gender,age"). Leave empty for auto-detection.
        api_key: Optional MEOK API key for pro tier.

    Behavior: This tool generates structured output without modifying external
    systems. Output is deterministic for identical inputs. No side effects.
    Free tier: 10/day rate limit. Pro tier: unlimited. No authentication
    required for basic usage.

    When to use: Use this tool when you need to assess, audit, or verify
    compliance requirements. Ideal for gap analysis, readiness checks, and
    generating compliance documentation.

    When NOT to use: Do not use as a substitute for qualified legal counsel.
    This tool provides technical compliance guidance, not legal advice.
    """
    allowed, msg, tier = check_access(api_key)
    if not allowed:
        return {"error": msg, "upgrade_url": "https://meok.ai/pricing"}
    limit_err = _check_rate_limit("detect_bias", tier)
    if limit_err:
        return {"error": "rate_limited", "message": limit_err}

    # Score for bias patterns
    bias_score, pattern_matches = _score_bias_risk(model_output)

    # Detect protected attributes mentioned
    auto_detected = _detect_protected_attributes(model_output)

    # If user specified attributes, filter/augment
    requested_attrs = []  # type: List[str]
    if protected_attributes:
        requested_attrs = [a.strip().lower() for a in protected_attributes.split(",")]

    # Identify specific bias types present
    detected_bias_types = []  # type: List[Dict[str, str]]
    for btype, binfo in BIAS_TYPES.items():
        matched = _match_keywords(model_output, binfo["indicators"])
        if matched:
            detected_bias_types.append({
                "type": binfo["name"],
                "severity": binfo["severity"],
                "matched_indicators": matched,
                "eu_article": binfo["eu_article"],
            })

    # Classify overall risk
    if bias_score >= 0.7:
        overall_risk = "high"
        recommendation = (
            "CRITICAL: High bias detected. This output should not be used for decisions "
            "affecting individuals without significant human review and debiasing."
        )
    elif bias_score >= 0.4:
        overall_risk = "moderate"
        recommendation = (
            "WARNING: Moderate bias patterns detected. Review flagged patterns and consider "
            "rephrasing or adding qualifying context before deployment."
        )
    elif bias_score >= 0.15:
        overall_risk = "low"
        recommendation = (
            "Minor bias indicators detected. Generally acceptable but review flagged "
            "patterns for context appropriateness."
        )
    else:
        overall_risk = "minimal"
        recommendation = (
            "No significant bias patterns detected in this text. Continue monitoring "
            "outputs for emergent patterns."
        )

    # Sentence-level analysis
    sentences = [s.strip() for s in re.split(r'[.!?]+', model_output) if s.strip()]
    flagged_sentences = []  # type: List[Dict[str, object]]
    for sentence in sentences:
        s_score, s_matches = _score_bias_risk(sentence)
        if s_score > 0.15:
            flagged_sentences.append({
                "sentence": sentence,
                "bias_score": round(s_score, 2),
                "patterns": [m["category"] for m in s_matches],
            })

    return {
        "overall_bias_risk": overall_risk,
        "bias_score": round(bias_score, 2),
        "pattern_matches": pattern_matches,
        "protected_attributes_mentioned": auto_detected,
        "bias_types_detected": detected_bias_types,
        "flagged_sentences": flagged_sentences[:10],
        "recommendation": recommendation,
        "total_sentences_analyzed": len(sentences),
        "sentences_flagged": len(flagged_sentences),
        "next_step": "Use mitigation_recommendations for remediation or fairness_metrics for quantitative assessment",
        "meok_labs": "https://meok.ai",
    }
```
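Regardless of tier or risk level, the handler returns the same top-level keys. A response for a mildly problematic input might look roughly like the sketch below; every value is illustrative, not the output of a real run.

```python
# Illustrative shape only: keys mirror the return statement above,
# values are invented for demonstration.
{
    "overall_bias_risk": "moderate",
    "bias_score": 0.45,
    "pattern_matches": [{"pattern": "...", "category": "...", "weight": "0.5"}],
    "protected_attributes_mentioned": [
        {"attribute": "age", "matched_terms": ["older"], "eu_reference": "..."},
    ],
    "bias_types_detected": [
        {"type": "...", "severity": "...", "matched_indicators": ["..."], "eu_article": "..."},
    ],
    "flagged_sentences": [
        {"sentence": "...", "bias_score": 0.5, "patterns": ["..."]},
    ],
    "recommendation": "WARNING: Moderate bias patterns detected. ...",
    "total_sentences_analyzed": 3,
    "sentences_flagged": 1,
    "next_step": "Use mitigation_recommendations for remediation or fairness_metrics for quantitative assessment",
    "meok_labs": "https://meok.ai",
}
```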
- server.py:435-435 (registration): Registration of detect_bias as an MCP tool via the @mcp.tool() decorator on the FastMCP instance 'mcp'.

```python
@mcp.tool()
```

- server.py:237-269 (helper): Helper function that scores text for bias patterns using MANIPULATION_PATTERNS regex patterns and protected-attribute keyword detection. Returns a score normalized to 0-1 and the list of matched patterns.
```python
def _score_bias_risk(text):
    # type: (str) -> Tuple[float, List[Dict[str, str]]]
    """Score text for bias patterns. Returns (score 0-1, matched_patterns)."""
    text_lower = text.lower()
    total_weight = 0.0
    matches = []  # type: List[Dict[str, str]]
    seen_categories = set()  # type: set
    for pat in MANIPULATION_PATTERNS:
        if re.search(pat["pattern"], text_lower):
            total_weight += pat["weight"]
            if pat["category"] not in seen_categories:
                seen_categories.add(pat["category"])
                matches.append({
                    "pattern": pat["pattern"],
                    "category": pat["category"],
                    "weight": str(pat["weight"]),
                })

    # Check for protected attribute mentions without fairness context
    fairness_terms = ["fair", "equit", "bias", "discriminat", "parity", "equal"]
    has_fairness_context = any(ft in text_lower for ft in fairness_terms)
    protected_mentioned = []  # type: List[str]
    for attr, info in PROTECTED_ATTRIBUTES_DB.items():
        if _match_keywords(text, info["keywords"]):
            protected_mentioned.append(attr)
            if not has_fairness_context:
                total_weight += 0.15

    # Normalise to 0-1
    score = min(1.0, total_weight / 2.5)
    return score, matches
```
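The scorer reads three fields from each MANIPULATION_PATTERNS entry: a regex `pattern`, a `category`, and a numeric `weight`. The list itself is defined elsewhere in server.py, so the entries below are hypothetical placeholders that merely match that shape.

```python
# Hypothetical placeholder entries: the real regexes, category names, and
# weights are defined elsewhere in server.py. Weights must be numeric
# (they are summed) and are stringified in the match records.
MANIPULATION_PATTERNS = [
    {"pattern": r"\ball (women|men|immigrants) are\b", "category": "stereotyping", "weight": 0.6},
    {"pattern": r"\bthose people\b", "category": "othering", "weight": 0.4},
]
```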
- server.py:272-284 (helper): Helper function that detects protected attributes (race, gender, age, disability, religion, etc.) mentioned in text by matching keywords from PROTECTED_ATTRIBUTES_DB.

```python
def _detect_protected_attributes(text):
    # type: (str) -> List[Dict[str, object]]
    """Detect mentions of protected attributes in text."""
    found = []  # type: List[Dict[str, object]]
    for attr, info in PROTECTED_ATTRIBUTES_DB.items():
        matched = _match_keywords(text, info["keywords"])
        if matched:
            found.append({
                "attribute": attr,
                "matched_terms": matched,
                "eu_reference": info["eu_ref"],
            })
    return found
```
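The same caveat applies to PROTECTED_ATTRIBUTES_DB: only the keys `keywords` and `eu_ref` are visible from these helpers, so an entry presumably looks something like this hypothetical one.

```python
# Hypothetical entry: actual keyword lists and legal references live
# elsewhere in server.py; only the key names are taken from the code above.
PROTECTED_ATTRIBUTES_DB = {
    "age": {
        "keywords": ["elderly", "older workers", "young people"],
        "eu_ref": "Charter of Fundamental Rights, Art. 21 (illustrative)",
    },
}
```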
- server.py:230-234 (helper): Helper that performs case-insensitive keyword matching used by both _score_bias_risk and _detect_protected_attributes.

```python
def _match_keywords(text, keywords):
    # type: (str, List[str]) -> List[str]
    """Return matched keywords found in text (case-insensitive)."""
    text_lower = text.lower()
    return [kw for kw in keywords if kw.lower() in text_lower]
```
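Since the matcher is a plain case-insensitive substring check rather than a word-boundary match, short keywords can fire inside longer words; a quick sanity check makes that behaviour concrete.

```python
# Case-insensitive substring matching: "older" matches "Older", and a
# short keyword like "age" also fires inside "repackage".
assert _match_keywords("Older staff avoid new tools", ["older", "young"]) == ["older"]
assert _match_keywords("Please repackage the results", ["age"]) == ["age"]
```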