check_message_safety
Analyzes a user message to detect self-harm or criminal intent. Returns a safety classification, risk score, trigger keywords, and an escalation recommendation to support proactive chat-platform protection.
Instructions
Classify a message for self-harm or criminal intent.
Parameters
- `message`: The user message to classify.
- `session_id`: Optional session identifier for trajectory tracking.
Returns
dict
```
{"safe": bool, "category": str, "score": float, "triggers": list,
 "stage_reached": int, "should_escalate": bool, ...}
```
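A minimal sketch of consuming the returned dict, assuming the field shape documented above. The sample values and the `handle_classification` routing helper are illustrative, not produced by or part of the real tool:

```python
# A sample result matching the documented return shape.
# Values are illustrative only, not real tool output.
sample_result = {
    "safe": False,
    "category": "self_harm",
    "score": 0.87,
    "triggers": ["hurt myself"],
    "stage_reached": 2,
    "should_escalate": True,
}

def handle_classification(result: dict) -> str:
    """Route a check_message_safety result to a platform action (hypothetical policy)."""
    if result["safe"]:
        return "deliver"    # no intervention needed
    if result["should_escalate"]:
        return "escalate"   # hand off per the tool's escalation recommendation
    return "flag"           # unsafe but below escalation threshold: log for review

print(handle_classification(sample_result))  # escalate
```

The exact actions ("deliver", "flag", "escalate") are placeholders; a real platform would map `should_escalate` and `score` onto its own moderation workflow.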
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| message | Yes | The user message to classify. | |
| session_id | No | Optional session identifier for trajectory tracking. | mcp-default |
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| *No arguments* | | | |