mitigation_guardrails_policy
Generates a guardrails policy for AI safety by analyzing red team results, focusing on the top 20 highest-risk categories to mitigate vulnerabilities effectively.
Instructions
Create a guardrails policy from the red team results summary.
Args: redteam_results_summary: A dictionary containing only the top 20 categories of the red team results summary, ranked by success percent (retrieve it using the get_redteam_task_results_summary tool). NOTE: If the category array contains more than 20 items, pass only the 20 categories with the highest success percent. Format: { "category": [ { "Bias": { "total": 6, "test_type": "adv_info_test", "success(%)": 66.67 } }, ... ] }
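The trimming rule above (keep only the 20 categories with the highest success percent) can be sketched in Python. The summary shape follows the format documented here; `top_categories` and `top_n` are illustrative names, not part of the tool's schema.

```python
def top_categories(summary, top_n=20):
    """Return a summary dict keeping only the top_n categories by success(%)."""
    def success_pct(entry):
        # Each entry maps a single category name to its stats dict.
        (stats,) = entry.values()
        return stats.get("success(%)", 0.0)

    ranked = sorted(summary["category"], key=success_pct, reverse=True)
    return {"category": ranked[:top_n]}

example = {
    "category": [
        {"Bias": {"total": 6, "test_type": "adv_info_test", "success(%)": 66.67}},
        {"Toxicity": {"total": 4, "test_type": "adv_info_test", "success(%)": 25.0}},
    ]
}
trimmed = top_categories(example, top_n=1)
```

With `top_n=1`, only the highest-scoring category (`Bias`) survives; with the default of 20, a summary of 20 or fewer categories passes through unchanged.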
Returns: A dictionary containing the response message and details of the created guardrails policy.
After deriving the policy configuration from the summary, create the guardrails policy using the add_guardrails_policy tool.
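The end-to-end flow (fetch the summary, trim it, submit the policy) might look like the sketch below. The tool names come from this document, but `call_tool`, the `task_id` argument, and the return shapes are assumptions for illustration.

```python
def create_policy_from_redteam(call_tool, task_id, top_n=20):
    """Sketch: fetch a red team summary, keep the top_n categories by
    success(%), and create a guardrails policy from them.
    `call_tool(name, args)` stands in for whatever tool-invocation
    mechanism the host exposes (an assumption, not a documented API)."""
    summary = call_tool("get_redteam_task_results_summary", {"task_id": task_id})
    pct = lambda entry: next(iter(entry.values())).get("success(%)", 0.0)
    top = sorted(summary["category"], key=pct, reverse=True)[:top_n]
    return call_tool("add_guardrails_policy",
                     {"redteam_results_summary": {"category": top}})

# Minimal fake tool runner to exercise the flow.
calls = []
def fake_call_tool(name, args):
    calls.append((name, args))
    if name == "get_redteam_task_results_summary":
        return {"category": [
            {"Bias": {"success(%)": 66.67}},
            {"Toxicity": {"success(%)": 10.0}},
        ]}
    return {"message": "policy created"}

result = create_policy_from_redteam(fake_call_tool, "task-1", top_n=1)
```

The second tool call receives only the trimmed summary, matching the argument format documented above.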
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| redteam_results_summary | Yes | Top 20 categories of the red team results summary, ranked by success percent | |