content-moderation
Classify text and image URLs for harmful content like hate speech, harassment, and violence. Returns risk levels, flagged categories, and an optional safe rewrite.
Instructions
Classify text (and optional image URLs) for harmful content: hate speech, harassment, self-harm, sexual content, violence, and illicit instructions. Returns a flagged status, a risk level (NONE/LOW/MEDIUM/HIGH), the flagged categories, per-category confidence scores, and an optional AI-generated safe rewrite.
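As a rough illustration of how a caller might consume the result, the sketch below models the returned values in Python. The source lists what is returned but not the wire format, so the field names (`flagged`, `risk_level`, `categories`, `scores`, `rewrite`), the sample payload, and the helper `should_block` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional


class RiskLevel(IntEnum):
    # Ordered so levels can be compared: NONE < LOW < MEDIUM < HIGH.
    NONE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class ModerationResult:
    # Hypothetical field names; the actual response keys may differ.
    flagged: bool
    risk_level: RiskLevel
    categories: List[str] = field(default_factory=list)
    scores: Dict[str, float] = field(default_factory=dict)
    rewrite: Optional[str] = None


def parse_result(payload: dict) -> ModerationResult:
    """Convert a (hypothetical) response dictionary into a typed result."""
    return ModerationResult(
        flagged=bool(payload["flagged"]),
        risk_level=RiskLevel[payload["risk_level"]],
        categories=list(payload.get("categories", [])),
        scores=dict(payload.get("scores", {})),
        rewrite=payload.get("rewrite"),
    )


def should_block(result: ModerationResult,
                 threshold: RiskLevel = RiskLevel.MEDIUM) -> bool:
    """Block content that is flagged at or above the chosen risk level."""
    return result.flagged and result.risk_level >= threshold


# Made-up payload, used only to exercise the helpers above.
example = parse_result({
    "flagged": True,
    "risk_level": "HIGH",
    "categories": ["harassment"],
    "scores": {"harassment": 0.93},
    "rewrite": None,
})
print(should_block(example))
```

Treating the risk level as an ordered enum lets the caller pick its own blocking threshold rather than hard-coding string comparisons.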
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| text | No | Text content to moderate (required unless image_url is provided). | |
| image_url | No | Public image URL to moderate, alone or alongside text. | |
| rewrite | No | If true and content is flagged, return an AI-generated safe rewrite. Adds ~1s latency. | |
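The schema marks every parameter as optional, yet `text` is required whenever `image_url` is absent. A minimal sketch of a client-side check for that rule follows; `build_arguments` is a hypothetical helper, and how the resulting dictionary is sent to the tool is left out because the source does not specify a transport.

```python
from typing import Any, Dict, Optional


def build_arguments(text: Optional[str] = None,
                    image_url: Optional[str] = None,
                    rewrite: bool = False) -> Dict[str, Any]:
    """Assemble the input arguments, enforcing the text-or-image rule."""
    # At least one of text or image_url must be supplied.
    if not text and not image_url:
        raise ValueError("provide text, image_url, or both")

    args: Dict[str, Any] = {}
    if text:
        args["text"] = text
    if image_url:
        args["image_url"] = image_url
    # Only send rewrite when requested, since it adds latency.
    if rewrite:
        args["rewrite"] = True
    return args


print(build_arguments(text="Sample comment to check", rewrite=True))
```

Omitting `rewrite` unless it is explicitly requested avoids the extra latency noted in the table for callers that only need a classification.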