add_guardrails_policy
Add a guardrails policy to enforce AI safety measures by configuring detectors for injection attacks, PII, NSFW content, toxicity, bias, and more. Define custom settings for each detector to ensure compliance with specified security and content guidelines.
Instructions
Add a new guardrails policy.
Args:
policy_name: The name of the policy to add.
policy_description: A description of the policy.
detectors: Dictionary of detector configurations. Each key is the name of a detector, and each value is a dictionary of settings for that detector. The available detectors and their configurations are as follows:
- injection_attack: Configured using InjectionAttackDetector model. Example: {"enabled": True}
- pii: Configured using PiiDetector model. Example: {"enabled": False, "entities": ["email", "phone"]}
- nsfw: Configured using NsfwDetector model. Example: {"enabled": True}
- toxicity: Configured using ToxicityDetector model. Example: {"enabled": True}
- topic: Configured using TopicDetector model. Example: {"enabled": True, "topic": ["politics", "religion"]}
- keyword: Configured using KeywordDetector model. Example: {"enabled": True, "banned_keywords": ["banned_word1", "banned_word2"]}
- policy_violation: Configured using PolicyViolationDetector model. Example: {"enabled": True, "need_explanation": True, "policy_text": "Your policy text here"}
- bias: Configured using BiasDetector model. Example: {"enabled": True}
- copyright_ip: Configured using CopyrightIpDetector model. Example: {"enabled": True}
- system_prompt: Configured using SystemPromptDetector model. Example: {"enabled": True, "index": "system_prompt_index"}
Example usage:
{
"injection_attack": {"enabled": True},
"nsfw": {"enabled": True}
}
Returns: A dictionary containing the response message and policy details.
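As a minimal sketch of how a caller might assemble the three required arguments before invoking the tool, the snippet below checks detector names against the list above and requires each settings dictionary to carry an `enabled` flag. The helper `build_policy_args`, the `KNOWN_DETECTORS` set, and the sample policy name and description are hypothetical illustrations, not part of the tool's API. The `enabled` check is an assumption based on the examples above, not a documented requirement.

```python
# Detector names taken from the list in the tool description.
KNOWN_DETECTORS = {
    "injection_attack", "pii", "nsfw", "toxicity", "topic",
    "keyword", "policy_violation", "bias", "copyright_ip", "system_prompt",
}


def build_policy_args(policy_name, policy_description, detectors):
    """Hypothetical helper: validate and package arguments for add_guardrails_policy."""
    unknown = set(detectors) - KNOWN_DETECTORS
    if unknown:
        raise ValueError(f"Unknown detectors: {sorted(unknown)}")
    for name, settings in detectors.items():
        # Assumption: every detector config carries an "enabled" flag,
        # as in all of the examples in the description.
        if not isinstance(settings, dict) or "enabled" not in settings:
            raise ValueError(f"Detector '{name}' needs a dict with an 'enabled' key")
    return {
        "policy_name": policy_name,
        "policy_description": policy_description,
        "detectors": dict(detectors),
    }


# Sample values below are illustrative only.
args = build_policy_args(
    "content-safety",
    "Blocks prompt injection and flags PII",
    {
        "injection_attack": {"enabled": True},
        "pii": {"enabled": True, "entities": ["email", "phone"]},
    },
)
```

Validating on the client side like this surfaces a misspelled detector name before the request is sent. The server may apply its own, stricter validation through the detector models named above.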
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| detectors | Yes | | |
| policy_description | Yes | | |
| policy_name | Yes | | |