mitigation_guardrails_policy
Automatically generates a guardrails policy from the top 20 categories of the redteam results summary. Enhances AI safety by mitigating risks identified in red-team testing.
Instructions
Create a guardrails policy by using the redteam results summary.
Args: redteam_results_summary: A dictionary containing only the top 20 categories of the redteam results summary, ranked by success percent (retrieve the summary with the get_redteam_task_results_summary tool). NOTE: If the category array has more than 20 items, pass only the 20 categories with the highest success percent. Format: { "category": [ { "Bias": { "total": 6, "test_type": "adv_info_test", "success(%)": 66.67 } }, contd. ] }
Returns: A dictionary containing the response message and details of the created guardrails policy.
After getting the configuration, create the guardrails policy using the add_guardrails_policy tool.
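The top-20 rule above can be sketched as a small helper that trims a full summary before it is passed to this tool. This is a minimal sketch, not part of the server: the name `top_categories` is hypothetical, and it assumes each entry in the `category` array is a single-key dictionary as in the documented format.

```python
from typing import Any, Dict, List

MAX_CATEGORIES = 20  # limit stated in the tool description


def top_categories(summary: Dict[str, Any], limit: int = MAX_CATEGORIES) -> Dict[str, Any]:
    """Return a copy of the summary that keeps only the `limit` categories
    with the highest "success(%)" value (hypothetical helper)."""
    entries: List[Dict[str, Dict[str, Any]]] = summary.get("category", [])

    def success_percent(entry: Dict[str, Dict[str, Any]]) -> float:
        # Assumption: each entry is a single-key dict, {category_name: {stats}}.
        (stats,) = entry.values()
        return stats.get("success(%)", 0.0)

    # Sort by success percent, highest first, then keep the first `limit` entries.
    ranked = sorted(entries, key=success_percent, reverse=True)
    return {**summary, "category": ranked[:limit]}
```

The helper leaves the input untouched and returns a new dictionary, so the full summary stays available for reporting while the trimmed copy goes to the tool.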
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
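| redteam_results_summary | Yes | Dictionary containing only the top 20 categories of the redteam results summary by success percent | |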
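Because the only input is a free-form dictionary, it can help to check its shape on the client side before calling the tool. The following is a rough sketch under assumptions: `validate_summary` is a hypothetical helper, and the rules it enforces (a `category` list, at most 20 single-key entries, each carrying `success(%)`) are inferred from the documented format rather than taken from the server's own validation, which is not shown here.

```python
from typing import Any, Dict, List


def validate_summary(summary: Dict[str, Any], max_categories: int = 20) -> List[str]:
    """Return a list of problems; an empty list means the payload matches
    the documented shape (hypothetical client-side check)."""
    categories = summary.get("category")
    if not isinstance(categories, list):
        return ['"category" must be a list']

    problems: List[str] = []
    if len(categories) > max_categories:
        problems.append(
            f"expected at most {max_categories} categories, got {len(categories)}"
        )

    for index, entry in enumerate(categories):
        # Assumption: each entry maps exactly one category name to its stats.
        if not isinstance(entry, dict) or len(entry) != 1:
            problems.append(f"entry {index} must be a dict with exactly one category name")
            continue
        name, stats = next(iter(entry.items()))
        if not isinstance(stats, dict) or "success(%)" not in stats:
            problems.append(f'entry {index} ({name}) is missing "success(%)"')
    return problems
```

Returning a list of messages instead of raising on the first issue lets a caller report every problem in one pass.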
Implementation Reference
- src/mcp_server.py:798-830 (handler) The main handler function for the MCP tool 'mitigation_guardrails_policy'. It is registered via the @mcp.tool() decorator. It takes a redteam results summary dictionary, sends it to the redteam_client to generate a guardrails policy configuration, and returns the response.

  ```python
  def mitigation_guardrails_policy(redteam_results_summary: Dict[str, Any]) -> Dict[str, Any]:
      """
      Create a guardrails policy by using the redteam results summary.

      Args:
          redteam_results_summary: A dictionary containing only the top 20 categories of the redteam results summary in terms of success percent (retrieve using get_redteam_task_results_summary tool).
              NOTE: If there are more than 20 items in category array, only pass the top 20 categories with the highest success percent.
              Format: {
                  "category": [
                      {
                          "Bias": {
                              "total": 6,
                              "test_type": "adv_info_test",
                              "success(%)": 66.67
                          }
                      },
                      contd.
                  ]
              }

      Returns:
          A dictionary containing the response message and details of the created guardrails policy.

      After getting the configuration, create the guardrails policy using the add_guardrails_policy tool.
      """
      config = {
          "redteam_summary": redteam_results_summary
      }

      # Create the guardrails policy using the provided configuration
      mitigation_guardrails_policy_response = redteam_client.risk_mitigation_guardrails_policy(config=config)

      # Return the response as a dictionary
      return mitigation_guardrails_policy_response.to_dict()
  ```
- src/mcp_server.py:798-798 (registration) The @mcp.tool() decorator registers the mitigation_guardrails_policy function as an MCP tool.

  ```python
  def mitigation_guardrails_policy(redteam_results_summary: Dict[str, Any]) -> Dict[str, Any]:
  ```
- src/mcp_server.py:803-816 (schema) Docstring description of the input schema/format for the tool.

  ```python
  redteam_results_summary: A dictionary containing only the top 20 categories of the redteam results summary in terms of success percent (retrieve using get_redteam_task_results_summary tool).
      NOTE: If there are more than 20 items in category array, only pass the top 20 categories with the highest success percent.
      Format: {
          "category": [
              {
                  "Bias": {
                      "total": 6,
                      "test_type": "adv_info_test",
                      "success(%)": 66.67
                  }
              },
              contd.
          ]
      }
  ```