Run a safety litmus on a Claude Code skill
run_skill_litmus
Grade a skill's static safety against the open litmus, detecting prompt injection, data exfiltration, and dangerous commands without executing the skill.
Instructions
Grade a Claude Code / Agent Skill with a letter (A/B/D/F) against the open static safety litmus (litmus-skill-v1). A skill is a SKILL.md (instructions + frontmatter) plus an optional bundle. The litmus scans the bytes for three classes of issue: S-01 prompt-injection / context-poisoning in the body, S-03 data-exfiltration instructions, and S-04 dangerous commands in bundled executable scripts. It also content-hashes the whole directory; that hash is the anti-tamper anchor.
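To make the anti-tamper anchor concrete, here is a minimal sketch of how a directory content hash can be computed. The function name `hash_skill_dir`, the choice of SHA-256, and the length-prefixed layout are assumptions for illustration; the source does not specify how litmus-skill-v1 derives its hash.

```python
import hashlib
from pathlib import Path


def hash_skill_dir(skill_dir: str) -> str:
    """Return a SHA-256 hex digest covering every file under skill_dir.

    Hypothetical helper: the real litmus hash format is not documented here.
    """
    root = Path(skill_dir)
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so the digest does not depend on
    # the order in which the OS happens to list files.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        data = path.read_bytes()
        # Length-prefix the name and the content so that file boundaries
        # are unambiguous inside the hashed byte stream.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()
```

Under this scheme, changing, adding, renaming, or removing any file in the skill directory changes the digest, which is the property an anti-tamper anchor needs.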
The SAFETY letter is a STATIC read: it does NOT execute the skill or its scripts, so it is fast but is NOT behavioral proof. An A means the static checks found no injection, exfil instruction, or dangerous bundled command, not that the skill is safe to run unsupervised. A command that a skill constructs or fetches at runtime is not visible to static scanning (a disclosed limit).
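The disclosed limit can be illustrated with a toy scanner. This is only a sketch: the `DANGEROUS` pattern and the `static_flags` helper are hypothetical and are not the litmus-skill-v1 rule set. It shows that a literal dangerous command is caught, while the same command assembled from fragments at runtime leaves nothing in the bytes for a pattern to match.

```python
import re

# Hypothetical pattern for illustration only; not the real S-04 rules.
DANGEROUS = re.compile(r"rm\s+-rf\s+/|curl[^\n|]*\|\s*(sh|bash)")


def static_flags(script_text: str) -> bool:
    """Return True if the script text contains a dangerous-looking literal."""
    return DANGEROUS.search(script_text) is not None


# A bundled script that spells the command out literally.
literal_script = "rm -rf / --no-preserve-root\n"

# A bundled script that builds the same command from pieces at runtime.
# Nothing here is executed; it is only text handed to the scanner.
assembled_script = (
    'cmd = "r" + "m"\n'
    'args = ["-r" + "f", "/"]\n'
    "run([cmd, *args])\n"
)

literal_hit = static_flags(literal_script)      # True
assembled_hit = static_flags(assembled_script)  # False
```

The second result is why an A is not behavioral proof: the scanner only ever sees the text, never what the text does when run.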
It also returns a SEPARATE, advisory quality signal (well-formed / issues / malformed). This signal is never an A–F letter, is never minted, and never affects the safety letter. Its deterministic checks always run; its optional LLM-judged axes (honesty, coherence) run only when a judge is available, either the host agent's own model via MCP sampling (no key) or a user-provided OpenAI-compatible key, and are skipped otherwise.
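The judge-availability rule above can be sketched as a small decision function. The names `choose_judge` and `quality_axes`, the string labels, and the precedence of MCP sampling over a user key are all assumptions; the source only says that either judge enables the LLM-judged axes and that they are skipped when neither is present.

```python
from typing import List, Optional


def choose_judge(host_supports_sampling: bool, api_key: Optional[str]) -> Optional[str]:
    """Pick a judge for the optional quality axes, or None if unavailable.

    Assumption: MCP sampling is preferred when both options exist.
    """
    if host_supports_sampling:
        return "mcp_sampling"  # host agent's own model, no key needed
    if api_key:
        return "openai_compatible"  # user-provided key
    return None


def quality_axes(judge: Optional[str]) -> List[str]:
    """List which quality checks would run for the chosen judge."""
    axes = ["deterministic"]  # always runs
    if judge is not None:
        axes += ["honesty", "coherence"]  # LLM-judged, optional
    return axes
```

With no judge available, `quality_axes(choose_judge(False, None))` contains only the deterministic checks, and in every case the result feeds the advisory quality signal rather than the safety letter.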
skill_ref (v1): a LOCAL path to a skill directory containing SKILL.md, e.g. ./skills/my-skill. Remote refs (github//#path, marketplace//) are not yet supported.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| skill_ref | Yes | Local path to a skill directory (must contain SKILL.md). Remote refs are not yet supported in this version. | |
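A caller may want to check `skill_ref` before invoking the tool. The sketch below is one way to do that; `validate_skill_ref` is a hypothetical client-side helper, not part of the tool, and it simply enforces the two stated requirements: the reference is a local directory, and that directory contains SKILL.md.

```python
from pathlib import Path


def validate_skill_ref(skill_ref: str) -> Path:
    """Return the skill directory as a Path, or raise ValueError.

    Hypothetical pre-flight check; the tool's own validation may differ.
    """
    path = Path(skill_ref)
    if not path.is_dir():
        # Remote refs end up here too, since they are not local directories.
        raise ValueError(f"skill_ref is not a local directory: {skill_ref!r}")
    if not (path / "SKILL.md").is_file():
        raise ValueError(f"no SKILL.md found in: {skill_ref!r}")
    return path
```

For example, `validate_skill_ref("./skills/my-skill")` returns the path when that directory exists and holds a SKILL.md, and raises `ValueError` otherwise.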