MCP Chat Support System

HALLUCINATION_FIX_SUMMARY.md•5.7 KiB

# Hallucination Prevention Fixes - Implementation Summary

## Problem Identified

The "Hallucination Refusal (Out of Scope)" test was incorrectly passing. The system was generating answers with citations for out-of-scope queries (e.g., "How to integrate ClientSphere with Shopify?") even though the knowledge base doesn't contain integration information. This is dangerous as it creates "citation-backed hallucinations."

## Solution Implemented

### 1. Intent Detection Module (`app/rag/intent.py`)

**NEW FILE CREATED**

- Detects user intent from queries (integration, billing, account, password_reset, pricing, general)
- Provides keyword-based intent matching
- Implements `check_direct_match()` to verify retrieved chunks contain intent-relevant keywords

**Key Features:**

- Intent keywords mapping for common query types
- Word boundary matching for accuracy
- Direct match checking for integration/API questions

### 2. Configuration Updates (`app/config.py`)

**CHANGES:**

- Added `SIMILARITY_THRESHOLD_STRICT = 0.45` - Strict threshold for answer generation
- Added `REQUIRE_VERIFIER = True` - Makes verifier mandatory
- Reduced `MAX_CONTEXT_TOKENS` from 3000 to 2500 for better focus
- Added `JWT_SECRET` field for authentication

### 3. Retrieval Gating (`app/rag/retrieval.py`)

**CHANGES:**

- Added direct match gating: Checks if retrieved chunks contain intent keywords
- For integration/API questions: Requires strict direct match
- For other questions: More lenient (allows high confidence to bypass)
- Only considers results "relevant" if they pass both similarity threshold AND direct match

**Key Logic:**

```python
# For integration/API: strict direct match required
# For others: direct match OR high confidence (>0.40)
has_relevant = len(filtered_results) > 0 and (has_direct_match or avg_confidence > 0.40)
```

### 4. Answer Generation Gating (`app/rag/answer.py`)

**MAJOR CHANGES:**

**Gate 1: No Relevant Results**

- If `has_relevant_results = False` → REFUSE immediately

**Gate 2: Strict Confidence Threshold**

- If `confidence < 0.45` → REFUSE (prevents low-quality answers)

**Gate 3: Intent-Based Gating**

- For integration/API questions: Requires `confidence >= 0.50`
- Otherwise → REFUSE

**Gate 4: Mandatory Verifier**

- Verifier is now ALWAYS used (no optional mode)
- Draft → Verify → Final flow
- If verifier fails → REFUSE with explanation

**Response Structure:**

- Added `refused` flag to response
- Added `refusal_reason` for debugging
- Added `verifier_passed` flag

### 5. Prompt Improvements (`app/rag/prompts.py`)

**CHANGES:**

- Enhanced `DRAFT_PROMPT_SYSTEM` with 7 strict anti-hallucination rules
- Emphasizes: "If context doesn't contain answer → MUST say 'I couldn't find this information'"
- Explicitly forbids general knowledge usage

### 6. Test Script Fixes (`scripts/validate_rag.py`)

**CHANGES:**

- Updated `test_chat()` to check `metadata.refused` flag
- Multiple refusal checks:
  1. If citations exist → FAIL (should have refused)
  2. If confidence >= 0.30 and answer exists → FAIL
  3. If `refused=False` and no refusal keywords → FAIL
- Prints full answer on failure for debugging

**Test Query:**

- "How to integrate ClientSphere with Shopify?" → Should REFUSE

## Files Changed

1. ✅ **NEW:** `app/rag/intent.py` - Intent detection module
2. ✅ `app/config.py` - Added strict thresholds and verifier requirement
3. ✅ `app/rag/retrieval.py` - Added direct match gating
4. ✅ `app/rag/answer.py` - Implemented 4-layer gating + mandatory verifier
5. ✅ `app/rag/prompts.py` - Enhanced draft prompt
6. ✅ `app/main.py` - Updated to pass refusal metadata
7. ✅ `scripts/validate_rag.py` - Fixed refusal detection logic

## Expected Behavior After Fixes

### For Out-of-Scope Query: "How to integrate ClientSphere with Shopify?"

**Before Fix:**

- ❌ Generated answer with 3 citations
- ❌ Confidence: 0.44
- ❌ Test passed incorrectly

**After Fix:**

- ✅ Detects "integration" intent
- ✅ Checks for direct match in chunks (fails)
- ✅ Confidence check: 0.44 < 0.45 → REFUSE
- ✅ OR: Integration intent requires 0.50 → REFUSE
- ✅ Verifier would also fail if draft generated
- ✅ Returns: "I couldn't find this information in the knowledge base..."
- ✅ `refused=True` in metadata
- ✅ Test correctly FAILS if answer generated

## Gating Flow Diagram

```
Query → Intent Detection
    ↓
Retrieval (with tenant filter)
    ↓
Gate 1: Has relevant results? → NO → REFUSE
    ↓ YES
Gate 2: Confidence >= 0.45? → NO → REFUSE
    ↓ YES
Gate 3: Integration intent? → YES → Confidence >= 0.50? → NO → REFUSE
    ↓ YES
Gate 4: Direct match in chunks? → NO → REFUSE (for integration)
    ↓ YES
Generate Draft Answer
    ↓
Verifier Check
    ↓
Gate 5: Verifier PASS? → NO → REFUSE
    ↓ YES
Return Final Answer with Citations
```

## Testing

To test the fixes:

1. Start the server:

   ```bash
   cd rag-backend
   .\venv\Scripts\Activate.ps1
   uvicorn app.main:app --reload --port 8000
   ```

2. Run validation:

   ```bash
   python scripts/validate_rag.py
   ```

3. Expected result:
   - ✅ Retrieval tests: 4/4 PASS
   - ✅ Chat tests (in-scope): 4/4 PASS
   - ✅ **Hallucination Refusal: MUST FAIL if answer generated**
   - ✅ Citation Integrity: 1/1 PASS

## Critical Notes

1. **Verifier is now MANDATORY** - Cannot be disabled
2. **Strict threshold is 0.45** - Answers below this are refused
3. **Integration questions require 0.50** - Even stricter
4. **Direct match required for integration** - Prevents loosely relevant chunks
5. **All refusals include `refused=True`** - For test validation

## Next Steps

1. Run the test suite to verify all fixes work
2. Monitor logs for refusal reasons
3. Adjust thresholds if needed based on real-world performance
4. Consider adding more intent types as needed
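
## Appendix: Gating Sketch

The gating flow described in this summary can be sketched in a few lines of Python. This is a minimal illustration, not the code in `app/rag/answer.py`: the function `gate_answer`, the `GateResult` class, the refusal-reason strings, and the `draft_fn` / `verify_fn` callables are all hypothetical names introduced here. Only the two thresholds (0.45 and 0.50), the gate order, and the `refused`, `refusal_reason`, and `verifier_passed` fields come from the summary.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Thresholds taken from the summary; every other name below is an assumption.
SIMILARITY_THRESHOLD_STRICT = 0.45
INTEGRATION_THRESHOLD = 0.50
REFUSAL_TEXT = "I couldn't find this information in the knowledge base."


@dataclass
class GateResult:
    """Hypothetical response shape mirroring the flags listed in the summary."""
    refused: bool
    refusal_reason: Optional[str] = None
    verifier_passed: bool = False
    answer: str = REFUSAL_TEXT


def gate_answer(
    intent: str,
    has_relevant_results: bool,
    has_direct_match: bool,
    confidence: float,
    draft_fn: Callable[[], str],
    verify_fn: Callable[[str], bool],
) -> GateResult:
    """Apply the gates in the order shown in the flow diagram."""
    # Gate 1: nothing relevant was retrieved.
    if not has_relevant_results:
        return GateResult(refused=True, refusal_reason="no_relevant_results")

    # Gate 2: strict confidence threshold for every query.
    if confidence < SIMILARITY_THRESHOLD_STRICT:
        return GateResult(refused=True, refusal_reason="low_confidence")

    if intent == "integration":
        # Gate 3: integration questions need a higher confidence.
        if confidence < INTEGRATION_THRESHOLD:
            return GateResult(refused=True, refusal_reason="integration_low_confidence")
        # Gate 4: integration questions also need a direct keyword match.
        if not has_direct_match:
            return GateResult(refused=True, refusal_reason="no_direct_match")

    # Draft -> Verify -> Final. The verifier is always run.
    draft = draft_fn()
    if not verify_fn(draft):
        return GateResult(refused=True, refusal_reason="verifier_failed")

    return GateResult(refused=False, verifier_passed=True, answer=draft)


if __name__ == "__main__":
    # The out-of-scope example from the summary: integration intent, confidence 0.44.
    result = gate_answer(
        intent="integration",
        has_relevant_results=True,
        has_direct_match=False,
        confidence=0.44,
        draft_fn=lambda: "draft answer",
        verify_fn=lambda draft: True,
    )
    print(result.refused, result.refusal_reason)
```

Returning a structured result rather than a bare string keeps the refusal reason available for logging and lets a test script check the `refused` flag directly instead of matching on refusal keywords.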
