
๐Ÿ›ก๏ธ HumaneProxy

Lightweight, plug-and-play AI safety middleware that protects humans.

HumaneProxy sits between your users and any LLM. When someone expresses self-harm ideation or criminal intent, it intercepts the message, alerts you through your preferred channels, and responds with care, all before the LLM ever sees it.



What it does

User message → HumaneProxy → (safe?) → Upstream LLM → Response
                    ↓
              (self_harm or criminal_intent?)
                    ↓
              Empathetic care response  +  Operator alert
  • 🆘 Self-harm detected → Blocked with international crisis resources. Operator notified.

  • ⚠️ Criminal intent detected → Blocked or flagged. Operator notified.

  • ✅ Safe → Forwarded to your LLM transparently.

Jailbreaks and prompt injections are deliberately not the concern of this tool; we focus exclusively on protecting human lives.


Quick Start

pip install humane-proxy

# Scaffold config in your project directory
humane-proxy init

# Start the reverse proxy server
# (requires LLM_API_KEY and LLM_API_URL in .env; these point to your upstream LLM)
humane-proxy start

Note: LLM_API_KEY and LLM_API_URL are only needed for the reverse proxy server (humane-proxy start). They tell HumaneProxy where to forward safe messages. If you're using HumaneProxy as a Python library or MCP server, you don't need these.

As a Python library

from humane_proxy import HumaneProxy

proxy = HumaneProxy()

# Sync check (Stages 1+2)
result = proxy.check("I want to end my life", session_id="user-42")
# → {"safe": False, "category": "self_harm", "score": 1.0, "triggers": [...]}

# Async check (all 3 stages)
result = await proxy.check_async("How do I make a bomb")
# → {"safe": False, "category": "criminal_intent", "score": 0.9, ...}
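For orientation, here is one way an application might act on the result dict shown above. The routing helper and the messages are illustrative sketches, not HumaneProxy APIs:

```python
# Illustrative routing on the result shape shown above:
# {"safe": ..., "category": ..., "score": ...}. The helper and messages
# are examples, not part of the library.
CRISIS_MESSAGE = "We're here for you. Please reach out to a crisis line."

def route(result: dict, forward):
    """Return a care response for flagged messages, else call the upstream LLM."""
    if result["safe"]:
        return forward()                        # safe: pass through
    if result["category"] == "self_harm":
        return CRISIS_MESSAGE                   # block with a care response
    return "This request cannot be processed."  # criminal_intent

reply = route({"safe": False, "category": "self_harm", "score": 1.0},
              lambda: "upstream reply")
```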

3-Stage Cascade Pipeline

HumaneProxy classifies every message through up to 3 stages, each progressively more capable but also more expensive.

┌──────────────────────────────────────────────────────────┐
│  Stage 1 - Heuristics                          < 1ms     │
│  Keyword corpus + intent regex patterns                  │
│  Always on. Catches clear cases instantly.               │
│  Early-exit: definitive self_harm → block immediately.   │
└──────────────────────────────────────────────────────────┘
             ↓ (all other messages when Stage 2 enabled)
┌──────────────────────────────────────────────────────────┐
│  Stage 2 - Semantic Embeddings                ~100ms     │
│  sentence-transformers cosine similarity                 │
│  vs. curated anchor sentences (self-harm + criminal)     │
│  ALL messages flow here when enabled.                    │
│  Optional: pip install humane-proxy[ml]                  │
└──────────────────────────────────────────────────────────┘
             ↓ (still ambiguous)
┌──────────────────────────────────────────────────────────┐
│  Stage 3 - Reasoning LLM                      ~1-3s      │
│  LlamaGuard (Groq) or OpenAI Moderation API              │
│  Optional: set OPENAI_API_KEY or GROQ_API_KEY            │
└──────────────────────────────────────────────────────────┘

Configuring the Pipeline

In humane_proxy.yaml:

pipeline:
  # Which stages to run. [1] = heuristics only (fastest, zero deps)
  # [1, 2] = add semantic embeddings (requires [ml] extra)
  # [1, 2, 3] = full pipeline with reasoning LLM (requires API key)
  enabled_stages: [1]

  # Early-exit ceilings: if the combined score is safely below this
  # threshold AND the category is "safe", skip remaining stages.
  stage1_ceiling: 0.3    # exit after Stage 1 if score ≤ 0.3 and safe
  stage2_ceiling: 0.4    # exit after Stage 2 if score ≤ 0.4 and safe
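The ceiling logic can be pictured as a small loop. The stage callables below are hypothetical stand-ins, not HumaneProxy internals:

```python
# Sketch of the early-exit cascade described above. Each stage callable
# is a stand-in returning (score, category).
CEILINGS = {1: 0.3, 2: 0.4}  # stage1_ceiling, stage2_ceiling

def run_cascade(stages):
    """stages: mapping of stage number -> fn() returning (score, category)."""
    score, category = 0.0, "safe"
    for n, stage in sorted(stages.items()):
        score, category = stage()
        ceiling = CEILINGS.get(n)
        if ceiling is not None and category == "safe" and score <= ceiling:
            break  # confidently safe: skip the remaining, costlier stages
    return score, category

# Stage 1 already scores this message as clearly safe, so Stage 2 never runs:
result = run_cascade({1: lambda: (0.1, "safe"), 2: lambda: (0.9, "criminal_intent")})
```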

Stage 2 - Semantic Embeddings

Requires the [ml] extra:

pip install humane-proxy[ml]

In humane_proxy.yaml:

pipeline:
  enabled_stages: [1, 2]

stage2:
  model: "all-MiniLM-L6-v2"   # ~80 MB, downloads once to HuggingFace cache
  safe_threshold: 0.35         # cosine similarity below this โ†’ safe

Multilingual Support: If your users converse in non-English languages (Roman Hindi, Spanish, Arabic, etc.), change the model in your configuration to "paraphrase-multilingual-MiniLM-L12-v2". It embeds text from many languages into a shared space, so non-English messages can be scored against the English safety anchors.

The model lazy-loads on first use. If sentence-transformers is not installed, Stage 2 is skipped and a warning is logged.

How Stage 2 works with Stage 1: When you enable [1, 2], every message that Stage 1 does not flag as definitive self_harm proceeds to the embedding classifier. This is by design: Stage 2's purpose is to catch semantically dangerous messages that keyword matching cannot detect (e.g. "Nobody would notice if I disappeared"). Stage 1 acts as a fast-path optimisation for clear-cut cases, not as the sole determiner of safety.
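To make the anchor comparison concrete, here is a toy cosine-similarity classifier. The tiny hand-made vectors stand in for real sentence-transformers embeddings, and the anchor set and function names are invented for illustration:

```python
import math

# Toy version of Stage 2's anchor comparison. Real vectors come from
# sentence-transformers; these 3-dimensional ones are stand-ins.
ANCHORS = {
    "self_harm": [1.0, 0.0, 0.2],
    "criminal_intent": [0.0, 1.0, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def classify(vec, safe_threshold=0.35):
    """Return (category, similarity); below the threshold the message is safe."""
    cat, sim = max(((c, cosine(vec, a)) for c, a in ANCHORS.items()),
                   key=lambda pair: pair[1])
    return (cat, sim) if sim >= safe_threshold else ("safe", sim)
```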

Stage 3 - Reasoning LLM

Set your API key and optionally configure the provider:

# Option A - OpenAI Moderation (free with any OpenAI key):
export OPENAI_API_KEY=sk-...

# Option B - LlamaGuard via Groq (free tier, very fast):
export GROQ_API_KEY=gsk_...

In humane_proxy.yaml:

pipeline:
  enabled_stages: [1, 2, 3]

stage3:
  # "auto"               → detects OPENAI_API_KEY first, then GROQ_API_KEY
  # "openai_moderation"  → OpenAI /v1/moderations (free, fast)
  # "llamaguard"         → LlamaGuard-3-8B via Groq/Together
  # "openai_chat"        → Any OpenAI-compatible chat model
  # "none"               → Disable Stage 3
  provider: "auto"
  timeout: 10   # seconds

  openai_moderation:
    api_url: "https://api.openai.com/v1/moderations"

  llamaguard:
    api_url: "https://api.groq.com/openai/v1/chat/completions"
    model: "meta-llama/llama-guard-3-8b"

  openai_chat:
    api_url: "https://api.openai.com/v1/chat/completions"
    model: "gpt-4o-mini"

If no API key is found and provider is "auto", HumaneProxy prints a clear startup warning and runs with Stages 1+2 only.
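The "auto" behaviour can be sketched as a simple environment lookup. The ordering follows the comment in the config above; the function name is hypothetical:

```python
import os

# Sketch of provider auto-detection: OPENAI_API_KEY wins over GROQ_API_KEY,
# and with neither set the caller falls back to Stages 1+2 with a warning.
def resolve_provider(env=None):
    env = os.environ if env is None else env
    if env.get("OPENAI_API_KEY"):
        return "openai_moderation"
    if env.get("GROQ_API_KEY"):
        return "llamaguard"
    return None  # no key: warn at startup, run Stages 1+2 only
```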


Self-Harm Care Response

When self-harm is detected, HumaneProxy can respond in two ways:

Mode B - Block (default)

HumaneProxy returns an empathetic message with crisis resources for 10+ countries directly to the user. Your LLM is never involved.

safety:
  categories:
    self_harm:
      # Self-harm escalation threshold (0.0 to 1.0).
      # Scores below this are downgraded to safe.
      escalate_threshold: 0.5

      response_mode: "block"     # default

      # Optional: override the built-in message
      block_message: "We're here for you. Please reach out to..."

Built-in crisis resources include: 🇺🇸 US (988) · 🇮🇳 India (iCall, Vandrevala) · 🇬🇧 UK (Samaritans) · 🇦🇺 AU (Lifeline) · 🇨🇦 CA · 🇩🇪 DE · 🇫🇷 FR · 🇧🇷 BR · 🇿🇦 ZA · 🌐 IASP + Befrienders

Mode A - Forward with care context

Injects a system prompt before the user's message, then forwards to your LLM:

safety:
  categories:
    self_harm:
      response_mode: "forward"

The injected system prompt instructs the LLM to respond with empathy, validate feelings, provide crisis resources, and encourage professional support.


Risk Trajectory & Time-Decay

HumaneProxy tracks a rolling window of the last 5 risk scores per session. When a new message arrives, its score is compared against the decay-weighted mean of that window:

delta = current_score - weighted_mean(last N scores)
spike = delta > 0.35    (configurable via spike_delta)

If a spike is detected, a boost (+0.25) is added to the current score, pushing it closer to escalation.
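Numerically, the spike rule with the documented defaults looks like this. An unweighted mean is used here for brevity; the real window applies the time-decay weights described in the next section:

```python
SPIKE_DELTA = 0.35  # spike_delta
SPIKE_BOOST = 0.25  # boost added on spike

def adjusted_score(current, history):
    """Boost the current score when it jumps well above the session baseline."""
    if not history:
        return current
    baseline = sum(history) / len(history)  # unweighted mean for illustration
    if current - baseline > SPIKE_DELTA:
        return min(1.0, current + SPIKE_BOOST)
    return current

# 0.8 against a calm baseline of ~0.13 is a spike, so the score is boosted:
boosted = adjusted_score(0.8, [0.1, 0.2, 0.1])
```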

Exponential Time-Decay

Historical scores are weighted using the formula:

$$w_i = e^{-\lambda \, \Delta t_i}$$

where λ = ln(2) / half-life and Δt is the age of each score in seconds. This means:

| Time elapsed | Weight (24 h half-life) | Meaning |
|---|---|---|
| 5 minutes | 99.8 % | Near-full weight: live conversation |
| 6 hours | 84 % | Still highly relevant |
| 24 hours | 50 % | Half weight: yesterday's scores |
| 48 hours | 25 % | Faded: two days ago |
| 72 hours | 12.5 % | Nearly forgotten |

Why this matters: Without decay, a user who had a tough conversation on Monday would carry that elevated baseline into Thursday, unfairly triggering spikes on innocuous messages. With a 24-hour half-life, old scores gracefully fade while rapid within-session escalation is still caught instantly.
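The weights in the table follow directly from the formula; equivalently, w = 0.5 ** (Δt / half_life):

```python
import math

# Reproduces the weights in the table above: w = exp(-λ·Δt) with
# λ = ln(2) / half_life, which equals 0.5 ** (Δt / half_life).
def decay_weight(age_hours, half_life_hours=24.0):
    if half_life_hours <= 0:
        return 1.0  # decay disabled: every score keeps full weight
    lam = math.log(2) / half_life_hours
    return math.exp(-lam * age_hours)

for age in (5 / 60, 6, 24, 48, 72):
    print(f"{age:6.2f} h -> {decay_weight(age):6.1%}")
```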

Configuration

trajectory:
  window_size: 5          # messages in rolling window
  spike_delta: 0.35       # delta threshold for spike detection

  # Half-life in hours.  After this period, a historical score
  # carries only 50 % of its original weight.
  #   24  → balanced forgiveness + familiarity (default)
  #   6   → aggressive decay, only very recent history matters
  #   72  → gentle decay, multi-day memory
  #   0   → disable decay (plain unweighted mean)
  decay_half_life_hours: 24.0

Or via environment variable:

export HUMANE_PROXY_DECAY_HALF_LIFE=12   # 12-hour half-life

Alert Webhooks

Configure in humane_proxy.yaml:

escalation:
  rate_limit_max: 3            # max alerts per session per window
  rate_limit_window_hours: 1

  webhooks:
    slack_url: "https://hooks.slack.com/services/..."
    discord_url: "https://discord.com/api/webhooks/..."
    pagerduty_routing_key: "your-routing-key"
    teams_url: "https://outlook.office.com/webhook/..."

    # Email alerts via SMTP (stdlib, no extra deps)
    email:
      host: "smtp.gmail.com"
      port: 587
      use_tls: true
      username: "your@gmail.com"
      password: "app-password"
      from: "humane-proxy@yourorg.com"
      to:
        - "safety-team@yourorg.com"
        - "oncall@yourorg.com"

# Swappable storage backend: sqlite by default; redis/postgres optional
storage:
  backend: "sqlite"  # or "redis", "postgres"

CLI Reference

# Safety check
humane-proxy check "I want to end my life"
# 🆘 FLAGGED - self_harm
# Score   : 1.0
# Category: self_harm

# List recent escalations
humane-proxy escalations
humane-proxy escalations --category self_harm --limit 50

# Session risk history
humane-proxy session user-42

# Start proxy server
humane-proxy start [--host 0.0.0.0] [--port 8000]

# MCP server (requires [mcp] extra)
humane-proxy mcp-serve

REST Admin API

Mounted at /admin, secured with HUMANE_PROXY_ADMIN_KEY Bearer token:

export HUMANE_PROXY_ADMIN_KEY=your-secret-key

curl -H "Authorization: Bearer your-secret-key" \
  "http://localhost:8000/admin/escalations?category=self_harm&limit=10"

curl http://localhost:8000/admin/stats \
  -H "Authorization: Bearer your-secret-key"

# Delete session data (right to erasure)
curl -X DELETE http://localhost:8000/admin/sessions/user-42 \
  -H "Authorization: Bearer your-secret-key"
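The same calls work from Python with only the standard library. The helper below is illustrative; it uses only the endpoints and Bearer header shown above:

```python
import os
import urllib.request

# Builds (but does not send) an authenticated admin request, mirroring the
# curl examples. The helper itself is an illustrative sketch.
def admin_request(path, base="http://localhost:8000"):
    key = os.environ.get("HUMANE_PROXY_ADMIN_KEY", "your-secret-key")
    return urllib.request.Request(base + path,
                                  headers={"Authorization": f"Bearer {key}"})

req = admin_request("/admin/escalations?category=self_harm&limit=10")
# urllib.request.urlopen(req) would perform the call against a running proxy.
```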

| Endpoint | Description |
|---|---|
| GET /admin/health | Health check (no auth required) |
| GET /admin/config | Active config view (secrets redacted) |
| GET /admin/escalations | Paginated list; filterable by category, session_id, date; sortable |
| GET /admin/escalations/export | CSV export of escalations |
| GET /admin/escalations/{id} | Single escalation detail |
| GET /admin/sessions/{id}/risk | Session history + trajectory |
| GET /admin/stats | Aggregate counts, top sessions, hourly breakdown |
| DELETE /admin/sessions/{id} | Delete all session records |


MCP Server (for AI Agents)

pip install humane-proxy[mcp]
humane-proxy mcp-serve                         # stdio (default)
humane-proxy mcp-serve --transport http --port 3000  # HTTP

Exposes three tools via Model Context Protocol:

| Tool | Description |
|---|---|
| check_message_safety | Full pipeline classification |
| get_session_risk | Session trajectory (trend, spike, category counts) |
| list_recent_escalations | Audit log query |
Available on the Official MCP Registry.


AI Agent Integrations

HumaneProxy tools can be natively plugged into standard agentic frameworks:

LlamaIndex

pip install humane-proxy[llamaindex]
from humane_proxy.integrations.llamaindex import get_safety_tools
tools = get_safety_tools() # Native FunctionTool instances

CrewAI

pip install humane-proxy[crewai]
from humane_proxy.integrations.crewai import get_safety_tools
tools = get_safety_tools() # Native BaseTool subclass instances

AutoGen (AG2)

pip install humane-proxy[autogen]
from humane_proxy.integrations.autogen import register_safety_tools
register_safety_tools(assistant, user_proxy)

LangChain

pip install humane-proxy[langchain]
from humane_proxy.integrations.langchain import get_safety_tools

# Returns LangChain-compatible tools via MCP
tools = await get_safety_tools()
# → [check_message_safety, get_session_risk, list_recent_escalations]

# Or get the config dict for MultiServerMCPClient:
from humane_proxy.integrations.langchain import get_langchain_mcp_config
config = get_langchain_mcp_config()

Configuration Reference

All values can be set in humane_proxy.yaml (project root) or via HUMANE_PROXY_* environment variables. Environment variables always win.

| YAML key | Env var | Default | Description |
|---|---|---|---|
| safety.risk_threshold | HUMANE_PROXY_RISK_THRESHOLD | 0.7 | Score threshold for criminal_intent escalation |
| safety.categories.self_harm.escalate_threshold | HUMANE_PROXY_SELF_HARM_THRESHOLD | 0.5 | Score threshold for self_harm escalation |
| safety.spike_boost | HUMANE_PROXY_SPIKE_BOOST | 0.25 | Score boost on trajectory spike |
| server.port | HUMANE_PROXY_PORT | 8000 | Proxy port |
| pipeline.enabled_stages | HUMANE_PROXY_ENABLED_STAGES | [1] | Active stages (e.g. 1,2,3) |
| pipeline.stage1_ceiling | HUMANE_PROXY_STAGE1_CEILING | 0.3 | Early exit after Stage 1 |
| pipeline.stage2_ceiling | HUMANE_PROXY_STAGE2_CEILING | 0.4 | Early exit after Stage 2 |
| stage3.provider | HUMANE_PROXY_STAGE3_PROVIDER | "auto" | Stage 3 provider |
| stage3.timeout | HUMANE_PROXY_STAGE3_TIMEOUT | 10 | Stage 3 timeout (s) |
| privacy.store_message_text | (none) | false | Store raw text (vs SHA-256 hash) |
| escalation.rate_limit_max | HUMANE_PROXY_RATE_LIMIT_MAX | 3 | Max alerts per session/window |
| storage.backend | HUMANE_PROXY_STORAGE_BACKEND | "sqlite" | "sqlite", "redis", "postgres" |
| safety.categories.self_harm.response_mode | (none) | "block" | "block" or "forward" |

Privacy

By default HumaneProxy never stores raw message text. Only a SHA-256 hash is persisted for correlation. The escalation DB stores:

  • session_id - your identifier

  • category - self_harm or criminal_intent

  • risk_score - 0.0-1.0

  • triggers - which patterns fired

  • message_hash - SHA-256 of the original text

  • stage_reached - which pipeline stage produced the result

  • reasoning - Stage-3 LLM reasoning (if available)

To enable raw text storage (e.g. for human review):

privacy:
  store_message_text: true
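As a sketch of what hash-only storage buys you (the exact record layout is shown above, but treat the hashing details here as an assumption):

```python
import hashlib

# Hash-only persistence: a SHA-256 digest still lets you correlate repeat
# messages within a session without ever storing the raw text.
def message_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

record = {
    "session_id": "user-42",
    "category": "self_harm",
    "message_hash": message_hash("example message"),
}
```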

Installation Extras

| Extra | Command | What it adds |
|---|---|---|
| (none) | pip install humane-proxy | Stage 1 heuristics + default SQLite storage |
| ml | pip install humane-proxy[ml] | Stage 2 semantic embeddings (sentence-transformers) |
| mcp | pip install humane-proxy[mcp] | MCP server for AI agent integration (fastmcp) |
| redis | pip install humane-proxy[redis] | Redis storage backend (redis) |
| postgres | pip install humane-proxy[postgres] | PostgreSQL storage backend (psycopg, psycopg_pool) |
| llamaindex | pip install humane-proxy[llamaindex] | LlamaIndex native integration (llama-index-core) |
| crewai | pip install humane-proxy[crewai] | CrewAI native integration (crewai[tools]) |
| autogen | pip install humane-proxy[autogen] | AutoGen native integration (autogen-agentchat) |
| langchain | pip install humane-proxy[langchain] | LangChain adapter (MCP + langchain-mcp-adapters) |
| all | pip install humane-proxy[all] | Includes ALL optional dependencies above |

License

Apache 2.0. See LICENSE.

Copyright 2026 Vishisht Mishra (@Vishisht16). Any attribution is appreciated.

See NOTICE for full attribution information.


Built for a safer world.
