pdfhell_run
Run the pdfhell adversarial-PDF benchmark to evaluate a vision model's robustness against malicious PDFs. Returns pass rate and per-family statistics.
Instructions
Run the pdfhell adversarial-PDF benchmark against a vision model.
Args:
model: Provider:model spec, e.g. "anthropic:claude-sonnet-4-6", "openai:gpt-4o", "google:gemini-2.5-flash".
suite: "smoke" (3 cases, ~10s) or "mini" (30 cases, ~$0.01 on Flash). Default "mini".
workers: Number of parallel API requests. Default 4.
Returns:
A dict with the overall pass_rate and its Wilson 95% CI, per-trap-family pass rates and CIs, and per-case details. The suite version and hash are included so consumers can verify that the run measured the expected cases.
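For readers unfamiliar with the Wilson interval mentioned above, here is a minimal sketch of the standard Wilson score formula for a pass rate. The function name `wilson_interval` and its parameters are illustrative assumptions; the source does not show how pdfhell computes its intervals, so treat this as the textbook formula rather than the tool's actual code.

```python
import math


def wilson_interval(passes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    Hypothetical helper, not part of pdfhell.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passes <= total:
        raise ValueError("passes must be between 0 and total")

    p = passes / total
    z2 = z * z
    denom = 1 + z2 / total
    # Centre of the interval is pulled toward 0.5 for small samples.
    centre = (p + z2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    low, high = wilson_interval(27, 30)
    print(f"27/30 passed: 95% CI = [{low:.3f}, {high:.3f}]")
```

With only 30 cases in the "mini" suite the interval is wide (27 of 30 passes gives roughly 0.74 to 0.97), which is why the per-family CIs are worth reading alongside the raw pass rates.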
Provider API keys are read from environment variables (ANTHROPIC_API_KEY, OPENAI_API_KEY, GOOGLE_API_KEY). They are not passed through this tool and are never logged.
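Because keys are read from the environment, it can help to confirm the right variable is set before starting a run. The sketch below assumes the provider prefix of the model spec maps to the variable with the matching name; that pairing is inferred from the names listed above, and `required_env_var` and `key_is_set` are hypothetical helpers, not part of pdfhell.

```python
import os
from typing import Mapping, Optional

# Assumed pairing of provider prefix to environment variable,
# inferred from the names in the tool description.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def required_env_var(model_spec: str) -> str:
    """Return the environment variable name for a 'provider:model' spec."""
    provider, sep, model_name = model_spec.partition(":")
    if not sep or not provider or not model_name:
        raise ValueError(f"expected 'provider:model', got {model_spec!r}")
    if provider not in PROVIDER_ENV_VARS:
        raise ValueError(f"unknown provider {provider!r}")
    return PROVIDER_ENV_VARS[provider]


def key_is_set(model_spec: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether the key is present and non-empty, without reading it out."""
    env = os.environ if environ is None else environ
    return bool(env.get(required_env_var(model_spec)))


if __name__ == "__main__":
    spec = "google:gemini-2.5-flash"
    # Print only the variable name and a yes/no flag, never the key itself.
    print(required_env_var(spec), "set" if key_is_set(spec) else "missing")
```

The check reports only whether the variable is present, in keeping with the tool's stated policy of never logging keys.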
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
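| model | Yes | Provider:model spec, e.g. "google:gemini-2.5-flash" | |
| suite | No | "smoke" or "mini" | mini |
| workers | No | Number of parallel API requests | 4 |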
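As an illustration of the input schema above, the following sketch builds and validates an arguments payload before it is handed to whatever client sends the tool call. `build_arguments` is a hypothetical helper; the check that `workers` is at least 1 is an assumption, since the source states only the default.

```python
import json

# Suite names taken from the tool description.
ALLOWED_SUITES = {"smoke", "mini"}


def build_arguments(model: str, suite: str = "mini", workers: int = 4) -> dict:
    """Assemble the arguments for pdfhell_run, applying the documented defaults."""
    if ":" not in model:
        raise ValueError("model must look like 'provider:model'")
    if suite not in ALLOWED_SUITES:
        raise ValueError(f"suite must be one of {sorted(ALLOWED_SUITES)}")
    if workers < 1:  # assumed lower bound, not stated in the source
        raise ValueError("workers must be at least 1")
    return {"model": model, "suite": suite, "workers": workers}


if __name__ == "__main__":
    # The quick "smoke" suite is a cheap way to confirm the setup works.
    print(json.dumps(build_arguments("google:gemini-2.5-flash", suite="smoke")))
```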
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |