moderate_content
Check if content is safe to publish or post. Returns decision, risk level, and violation categories with reasons.
Instructions
Decide whether CONTENT is safe to publish, post, or surface.
Use it before an agent sends or publishes generated content. An optional policy
sets the standard; if none is given, a conservative default-safe baseline is applied.
Returns: decision (publish | review | block), violation_risk, categories, and reasons.
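
As a minimal sketch of how a caller might act on the result, the snippet below maps the three documented decisions to a next step. The dict shape, the `next_action` helper, and the action names are assumptions made for illustration; only the field names and the `publish | review | block` values come from the description above.

```python
from typing import Any, Dict


def next_action(result: Dict[str, Any]) -> str:
    """Map a moderation result to the agent's next step (hypothetical helper)."""
    decision = result.get("decision")
    if decision == "publish":
        return "send"
    if decision == "review":
        return "hold_for_human"
    # "block", a missing decision, or any unrecognized value fails closed.
    return "discard"


# Hypothetical result, assuming the tool returns a flat mapping of its fields.
example_result = {
    "decision": "review",
    "violation_risk": "medium",   # assumed value; the real scale is not documented here
    "categories": ["example-category"],
    "reasons": ["example reason"],
}

action = next_action(example_result)
```

Treating anything other than an explicit `publish` or `review` as a discard keeps the gate conservative, in line with the default-safe baseline the tool describes.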
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
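| policy | No | Policy that sets the moderation standard. | Conservative default-safe baseline |
| content | Yes | Content to evaluate before it is published, posted, or surfaced. | |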
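
The schema above can be sketched as a small argument builder. This is an illustration under assumptions: the schema does not state parameter types, so both values are treated as strings here, and `build_arguments` is a hypothetical helper rather than part of the tool.

```python
from typing import Dict, Optional


def build_arguments(content: str, policy: Optional[str] = None) -> Dict[str, str]:
    """Build the argument mapping for moderate_content (hypothetical helper).

    Assumes both parameters are strings; the schema does not specify types.
    """
    if not isinstance(content, str) or not content.strip():
        raise ValueError("content is required and must be a non-empty string")
    arguments = {"content": content}
    # Omit policy entirely when not supplied so the tool applies its default baseline.
    if policy is not None:
        arguments["policy"] = policy
    return arguments
```

Leaving `policy` out of the mapping, rather than sending an empty value, lets the tool fall back to its own default.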