eval_classifier_robustness
Tests the robustness of AI safety classifiers by running queries through decomposition, obfuscation, and multi-agent attack simulations, and returns a detailed analysis with metrics.
Instructions
Process a query through the configured evaluation stages.
Returns a formatted analysis with per-segment routing information, aggregate metrics, and processing results.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | | |
| topic | No | | computer security |
| obfuscation | No | | moderate |
| framing | No | | study_guide |
| max_pieces | No | | |
| padding_tokens | No | | |
| multi_agent | No | | |
| helper_enabled | No | | |
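The input schema can be mirrored in a small argument builder. The sketch below is illustrative only: the `EvalArgs` class and its `to_arguments` method are hypothetical names, the values listed for `topic`, `obfuscation`, and `framing` are treated as defaults, and the types given to `max_pieces`, `padding_tokens`, `multi_agent`, and `helper_enabled` are assumptions, since the schema does not state them.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class EvalArgs:
    """Hypothetical container mirroring the eval_classifier_robustness input schema."""
    query: str                               # the only required field
    topic: str = "computer security"         # value listed in the schema table
    obfuscation: str = "moderate"            # value listed in the schema table
    framing: str = "study_guide"             # value listed in the schema table
    max_pieces: Optional[int] = None         # type is an assumption
    padding_tokens: Optional[int] = None     # type is an assumption
    multi_agent: Optional[bool] = None       # type is an assumption
    helper_enabled: Optional[bool] = None    # type is an assumption

    def to_arguments(self) -> dict:
        """Return the arguments as a dict, omitting optional fields left unset."""
        if not self.query.strip():
            raise ValueError("query is required and must be non-empty")
        return {k: v for k, v in asdict(self).items() if v is not None}


# Build an argument payload with one optional field set.
args = EvalArgs(query="example query", max_pieces=4).to_arguments()
print(json.dumps(args, indent=2))
```

Omitting unset optional fields leaves the choice of value to the tool rather than guessing at it on the caller's side.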
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | | |
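A caller would typically check that the required `result` field is present before using it. The following is a minimal sketch, assuming the response arrives as a dictionary and that `result` holds the formatted analysis as text; `extract_result` is a hypothetical helper, not part of the tool.

```python
def extract_result(response: dict) -> str:
    """Return the required 'result' field, failing loudly if it is absent."""
    if "result" not in response:
        raise KeyError("response is missing the required 'result' field")
    # Assumption: the formatted analysis is text; coerce defensively.
    return str(response["result"])


# Example with a stand-in response; the content is a placeholder.
sample_response = {"result": "formatted analysis text"}
analysis = extract_result(sample_response)
print(analysis)
```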