classifier_robustness
Evaluate how robust the censorship classifier is by examining its performance on perturbed and out-of-distribution inputs. Identify its failure modes before relying on its scores.
Instructions
Returns adversarial-bench results for the live censorship classifier, that is, its performance on perturbed and out-of-distribution inputs. Use these results to understand the classifier's failure modes before relying on one of its scores.
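The result format is not specified here. To show what "performance under perturbed / out-of-distribution inputs" can mean in practice, the following is a minimal sketch of one common robustness measure: the accuracy drop from clean inputs to each harder split. The `Example` record, the split names `"clean"`, `"perturbed"`, and `"ood"`, and the label encoding are all hypothetical. They are not the bench's actual schema.

```python
from dataclasses import dataclass


@dataclass
class Example:
    label: int       # ground-truth class (hypothetical encoding: 1 = censor, 0 = allow)
    prediction: int  # class predicted by the classifier
    split: str       # hypothetical split name: "clean", "perturbed", or "ood"


def accuracy(examples: list[Example], split: str) -> float:
    """Fraction of examples in `split` where the prediction matches the label."""
    subset = [e for e in examples if e.split == split]
    if not subset:
        raise ValueError(f"no examples for split {split!r}")
    return sum(e.label == e.prediction for e in subset) / len(subset)


def robustness_gaps(examples: list[Example]) -> dict[str, float]:
    """Accuracy lost on each harder split relative to clean inputs."""
    clean = accuracy(examples, "clean")
    return {split: clean - accuracy(examples, split) for split in ("perturbed", "ood")}


bench = (
    [Example(1, 1, "clean"), Example(0, 0, "clean"), Example(1, 1, "clean"), Example(0, 0, "clean")]
    + [Example(1, 1, "perturbed"), Example(0, 0, "perturbed"), Example(1, 0, "perturbed"), Example(0, 0, "perturbed")]
    + [Example(1, 0, "ood"), Example(0, 0, "ood")]
)
print(robustness_gaps(bench))  # {'perturbed': 0.25, 'ood': 0.5}
```

Under this reading, a large gap on a split is the failure mode to watch for. A score produced on inputs that resemble that split deserves less trust than the clean accuracy suggests.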
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| (none) | | This tool takes no arguments. | |
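Because the tool takes no arguments, a caller only has to supply the tool name. The sketch below assumes that the tool is exposed through the Model Context Protocol (MCP). The source does not state this. It builds the JSON-RPC 2.0 `tools/call` request with an empty `arguments` object. The helper name `tools_call_request` is hypothetical.

```python
import json


def tools_call_request(request_id: int, tool_name: str) -> str:
    """Build a JSON-RPC 2.0 request for the MCP "tools/call" method.

    Assumes MCP transport (not confirmed by the tool listing). The tool
    declares no input parameters, so "arguments" is an empty object.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": {}},
    })


print(tools_call_request(1, "classifier_robustness"))
```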