# scan_response
Scans LLM-generated responses for security threats, including prompt leaks, unexpected PII, toxic content, topic drift, and policy violations, to ensure safe AI interactions.
## Instructions
Scans an LLM-generated response before showing it to the user.
Detects:

- System prompt leaks (the LLM revealing its instructions)
- Unexpected PII in the output (PII not present in the original prompt)
- Toxic or hostile language in generated content
- Topic drift (the response diverges from the prompt's intent)
- Policy violations in generated content
Provide the `original_prompt` for best results; it enables PII diff analysis and topic-mismatch detection.

When `pii_tokens` is provided (from `scan_prompt` with `redact_pii=true`), the response is rehydrated after scanning: tokens like `[EMAIL_1]` are replaced with their original values, and the rehydrated text is returned as `rehydrated_response`.
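Rehydration is a plain token-for-value substitution. A minimal sketch of the idea (the `rehydrate` helper and the token-map shape are illustrative assumptions; the real tool performs this server-side after scanning):

```python
def rehydrate(text: str, pii_tokens: dict[str, str]) -> str:
    """Replace placeholder tokens like [EMAIL_1] with their original values.

    `pii_tokens` maps token -> original value, e.g.
    {"[EMAIL_1]": "ada@example.com"}. Illustrative sketch only.
    """
    for token, original in pii_tokens.items():
        text = text.replace(token, original)
    return text


tokens = {"[EMAIL_1]": "ada@example.com", "[PHONE_1]": "+1-555-0100"}
safe_response = "You can reach support at [EMAIL_1] or [PHONE_1]."
print(rehydrate(safe_response, tokens))
```

Note that rehydration only happens when the response passes the scan, so raw PII is never restored into a blocked response.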
Returns:

- `blocked`: true/false
- `threat_type`: category of threat detected
- `severity` / `confidence` / `guidance`: security assessment details
- `rehydrated_response`: (when `pii_tokens` is provided and the response is safe) text with PII restored
- `request_id`: unique identifier
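A sketch of how a caller might branch on these fields. The result is modeled as a plain dict with the field names listed above; raising on a blocked response and falling back to the redacted text are illustrative choices, not part of the tool:

```python
def handle_scan_result(result: dict, redacted_response: str) -> str:
    """Decide what to show the user from a scan_response result.

    Illustrative policy: refuse blocked responses, otherwise prefer the
    rehydrated text when the scanner restored PII tokens.
    """
    if result["blocked"]:
        # Never surface a blocked response; report the assessment instead.
        raise ValueError(
            f"Response blocked: {result['threat_type']} "
            f"(severity={result['severity']}, confidence={result['confidence']})"
        )
    # Prefer rehydrated text when present; fall back to the redacted response.
    return result.get("rehydrated_response", redacted_response)


ok = {"blocked": False, "rehydrated_response": "Hi ada@example.com", "request_id": "r-1"}
print(handle_scan_result(ok, "Hi [EMAIL_1]"))
```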
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| response | Yes | The LLM-generated response to scan for security threats | |
| original_prompt | No | The original prompt that generated this response. Enables PII diff and topic mismatch detection. | |
| pii_tokens | No | PII token map from scan_prompt(redact_pii=true). When provided, tokens in the response are rehydrated with original values after scanning. | |