Skip to main content
Glama

Grippy Code Review

Open-source AI code review agent. Your model, your infrastructure, your rules.

Tests codecov CodeQL OpenSSF Scorecard SLSA 3 PyPI License: MIT Python 3.12+ Ruff

Grippy reviews pull requests using any OpenAI-compatible model — GPT, Claude, or a local LLM running on your own hardware. It indexes your codebase into a vector store for context-aware analysis, then posts structured findings with scores, verdicts, and escalation paths. It also happens to be a grumpy security auditor who secretly respects good code.

Why Grippy?

  • Your model, your infrastructure. Bring your own model. No SaaS dependency, no per-seat fees. Run GPT-5 through OpenAI, Claude through a compatible proxy, or a local model via Ollama or LM Studio.

  • Codebase-aware, not diff-blind. Grippy embeds your repository into a LanceDB hybrid search index (vector + full-text) and searches it during review. It understands the code around the diff, not just the diff itself. Most OSS alternatives paywall this behind a hosted tier.

  • Cross-PR memory, not amnesia. Grippy builds a knowledge graph of your codebase — tracking files, reviews, findings, and import dependencies across every PR. It knows which modules are blast-radius risks, which files have recurring findings, and which authors have patterns worth watching. Tools like CodeRabbit, Greptile, and Qodo charge $20–38/seat/month for comparable cross-PR context. Here, it's free and open-source.

  • Structured output, not just comments. Every review produces typed findings with severity, confidence, and category. A score out of 100. A verdict (PASS / FAIL / PROVISIONAL). Escalation targets for findings that need human attention.

  • Security-first, not security-added. Grippy is a security auditor that also reviews code, not the other way around. Dedicated audit modes go deeper than a general-purpose linter.

  • Deterministic rules, not just LLM guesses. A built-in rule engine runs 10 security rules against every diff before the LLM sees it. Findings are guaranteed — not hallucinated — and the profile gate can fail CI on critical severity hits, independent of model output.

  • MCP server — use Grippy as a local diff auditor from Claude Code, Cursor, or Claude Desktop via the Model Context Protocol.

  • It has opinions. Grippy is a grumpy security auditor persona, not a faceless bot. Good code gets grudging respect. Bad code gets disappointment. The personality keeps reviews readable and honest.

What it looks like

An inline finding on a PR diff:

CRITICAL | security | confidence: 95

SQL injection via string interpolation

query = f"SELECT * FROM users WHERE id = {user_id}" constructs a SQL query from unsanitized input. Use parameterized queries.

grippy_note: I've seen production databases get wiped by less. Parameterize it or I'm telling the security team.

A review summary posted as a PR comment:

Score: 45/100 | Verdict: FAIL | Complexity: STANDARD

3 findings (1 critical, 1 high, 1 medium) | 1 escalation to security-team

"I've reviewed thousands of PRs. This one made me mass in-progress a packet of antacids."

Quick start

GitHub Actions (OpenAI)

Add .github/workflows/grippy-review.yml to your repo:

name: Grippy Review

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  review:
    name: Grippy Code Review
    runs-on: ubuntu-latest
    steps:
      - uses: step-security/harden-runner@a90bcbc6539c36a85cdfeb73f7e2f433735f215b  # v2.15.0
        with:
          egress-policy: audit

      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6

      - uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405  # v6
        with:
          python-version: '3.12'

      - name: Install Grippy
        run: pip install "grippy-mcp"

      # Cache the vector index to avoid re-indexing on every push
      - name: Cache Grippy data
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306  # v5
        with:
          path: ./grippy-data
          key: grippy-${{ github.event.pull_request.number }}-${{ github.sha }}
          restore-keys: grippy-${{ github.event.pull_request.number }}-

      - name: Run review
        id: review
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          GITHUB_EVENT_PATH: ${{ github.event_path }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          GRIPPY_MODEL_ID: gpt-4.1
        run: grippy

Want LLM-only review without the rule engine? Set GRIPPY_PROFILE: general. For stricter gating (fail on WARN+), use strict-security. See examples/ for more workflow variants.

GitHub Actions (self-hosted LLM)

Grippy works with any OpenAI-compatible API endpoint, including Ollama, LM Studio, and vLLM. We recommend Devstral-Small 24B at Q4 quantization or higher — tested during development for structured output compliance and review quality. See the Self-Hosted LLM Guide for full setup instructions.

Local development

# OpenAI (default, included in base install)
pip install "grippy-mcp"

# Anthropic
pip install "grippy-mcp[anthropic]"

# Google (Gemini)
pip install "grippy-mcp[google]"

# Groq
pip install "grippy-mcp[groq]"

# Mistral
pip install "grippy-mcp[mistral]"

# Or with uv
uv add "grippy-mcp[anthropic]"

MCP Server

Quick start (zero install)

uvx grippy-mcp serve

Or install globally:

pip install grippy-mcp
grippy serve

Grippy runs as an MCP server for local git diff auditing — no GitHub Actions required.

Two tools:

Tool

What it does

LLM required?

scan_diff

Deterministic security rules

No

audit_diff

Full AI-powered code review

Yes

Scope options (both tools):

  • "staged" — staged changes (git diff --cached)

  • "commit:<ref>" — a specific commit (e.g. "commit:HEAD")

  • "range:<base>..<head>" — commit range (e.g. "range:main..HEAD")

Install into your MCP client:

python -m grippy install-mcp          # registers uvx grippy-mcp in client configs
python -m grippy install-mcp --dev    # dev mode: uses uv run --directory

The installer detects Claude Code, Claude Desktop, and Cursor, then writes the server config with your chosen LLM transport and API keys.

Run the server directly:

python -m grippy serve

MCP tools return dense, structured JSON designed for AI agent consumption — no personality or ASCII art.

Configuration

Grippy is configured entirely through environment variables.

Variable

Purpose

Default

GRIPPY_TRANSPORT

API transport: openai, anthropic, google, groq, mistral, or local

local

GRIPPY_MODEL_ID

Model identifier

devstral-small-2-24b-instruct-2512

GRIPPY_BASE_URL

API endpoint for local transport

http://localhost:1234/v1

GRIPPY_EMBEDDING_MODEL

Embedding model name

text-embedding-qwen3-embedding-4b

GRIPPY_API_KEY

API key for non-OpenAI endpoints

lm-studio

GRIPPY_DATA_DIR

Persistence directory

./grippy-data

GRIPPY_TIMEOUT

Review timeout in seconds (0 = none)

300

GRIPPY_PROFILE

Security profile: security, strict-security, general

security

GRIPPY_MODE

Review mode override

pr_review

OPENAI_API_KEY

OpenAI API key (sets transport to openai)

GITHUB_TOKEN

GitHub API token (set automatically by Actions)

Cross-vendor model selection

If your codebase is co-developed with an AI coding assistant, we strongly recommend running Grippy on a model from a different vendor than the one that wrote the code. Different model families have different training data, different biases, and different blind spots. A reviewer that shares the same priors as the author is more likely to miss the same classes of bugs. Using a cross-vendor model — for example, reviewing GPT-authored code with Claude, or Claude-authored code with GPT — gives you a genuinely independent audit rather than an echo chamber.

Security profiles

Grippy ships with the deterministic rule engine on by default (security profile). Ten rules scan every diff for secrets, dangerous sinks, workflow permission escalation, path traversal, unsanitized LLM output, risky CI scripts, SQL injection, weak cryptography, hardcoded credentials, and insecure deserialization — before the LLM sees anything. These findings are guaranteed, not hallucinated.

Switch profiles via GRIPPY_PROFILE env var or --profile CLI flag (CLI takes priority).

Profile

What happens

Gate behavior

When to use

security (default)

Rules + LLM review

CI fails on ERROR or CRITICAL rule findings

Most teams — catches real issues without noise

strict-security

Rules + LLM review

CI fails on WARN or higher

High-assurance, compliance, external contributors

general

LLM review only

No rule gate

When you only want AI-powered review, no deterministic scanning

# Use the default (security)
grippy

# Explicit profile
grippy --profile strict-security

# Via environment variable
GRIPPY_PROFILE=general grippy

The 10 deterministic rules:

Rule ID

Detects

Severity

workflow-permissions-expanded

write/admin permissions, unpinned actions

ERROR / WARN

secrets-in-diff

API keys, private keys, .env additions

CRITICAL / WARN

dangerous-execution-sinks

unsafe code execution patterns

ERROR

path-traversal-risk

tainted path variables, ../ patterns

WARN

llm-output-unsanitized

model output piped to sinks without sanitizer

ERROR

ci-script-execution-risk

risky CI script patterns, sudo in CI

CRITICAL / WARN

sql-injection-risk

SQL queries built from interpolated input

ERROR

weak-crypto

MD5, SHA1, DES, ECB mode, insecure RNG

WARN

hardcoded-credentials

passwords, connection strings, auth headers

ERROR

insecure-deserialization

unsafe deserialization sinks (shelve, dill, etc.)

ERROR

Rule findings are injected into the LLM context as confirmed facts for explanation.

When the knowledge graph is available (CI with caching, or MCP with persistent GRIPPY_DATA_DIR), rule findings are enriched with:

  • Blast radius — how many modules depend on the flagged file

  • Recurrence — whether this rule has fired on this file in prior reviews

  • False positive suppression — import-aware suppression (e.g., SQL injection suppressed when file imports SQLAlchemy)

  • Finding velocity — how often this rule fires across recent reviews

Suppression

.grippyignore — file-level suppression

Create a .grippyignore file in your repo root to exclude files from review. Uses gitignore syntax (comments, negation, wildcards):

# Exclude generated code
vendor/
*.generated.py

# Exclude test fixtures that contain intentional anti-patterns
tests/test_rule_*.py

# But keep the hostile environment tests
!tests/test_hostile_environment.py

Excluded files are stripped from the diff before either the rule engine or the LLM sees them.

# nogrip — line-level pragma

Suppress deterministic rule findings on specific lines:

password = os.environ["DB_PASS"]  # nogrip
conn = f"postgres://{user}:{password}@host/db"  # nogrip: hardcoded-credentials
h = hashlib.md5(data)  # nogrip: weak-crypto, hardcoded-credentials
  • Bare # nogrip suppresses all rules on that line

  • # nogrip: rule-id suppresses only the named rule

  • # nogrip: id1, id2 suppresses multiple rules

  • Rules only — the LLM reviewer still sees the line and may comment on it

Review modes

Mode

Trigger

Focus

pr_review

Default on PR events

Full code review: correctness, security, style, maintainability

security_audit

Manual, scheduled, or auto when profile != general

Deep security analysis: injection, auth, cryptography, data exposure

governance_check

Manual or scheduled

Compliance and policy: licensing, access control, audit trails

surprise_audit

PR title/body contains "production ready"

Full-scope audit with expanded governance checks

cli

Local invocation

Interactive review for local development and testing

github_app

GitHub App webhook

Event-driven review via installed GitHub App

GitHub Actions outputs

When running as a GitHub Action, Grippy sets these step outputs for downstream workflow logic:

Output

Type

Description

score

int

Review score 0–100

verdict

string

PASS / FAIL / PROVISIONAL

findings-count

int

Total LLM finding count

merge-blocking

bool

Whether verdict blocks merge

rule-findings-count

int

Deterministic rule hit count

rule-gate-failed

bool

Whether rule gate caused CI failure

profile

string

Active security profile name

Security

Grippy operates in an adversarial environment — PR diffs are untrusted input controlled by any contributor. Defense-in-depth sanitization is applied at every stage of the pipeline, validated by a 44-test adversarial test suite covering 9 attack domains.

Input sanitization. All untrusted text (PR metadata, diffs, tool outputs) passes through navi-sanitize for Unicode normalization — stripping invisible characters (ZWSP, bidi overrides, variation selectors), normalizing homoglyphs (Cyrillic/Greek → ASCII), and removing null bytes. This runs before any other processing.

Prompt injection defense. Three layers protect the LLM context:

  1. XML escaping — All context sections (<diff>, <pr_metadata>, <rule_findings>, etc.) are XML-escaped, preventing </diff><system>... breakout attacks.

  2. NL injection pattern neutralization — Seven compiled regex patterns detect and replace natural-language injection attempts (scoring directives, confidence manipulation, system override phrases) with [BLOCKED] markers.

  3. Data-fence boundary — A preamble in the LLM prompt explicitly marks all subsequent content as "USER-PROVIDED DATA only" with instructions to ignore embedded directives.

Output sanitization. LLM-generated text passes through a five-stage pipeline before posting to GitHub:

  1. navi-sanitize — Unicode normalization (same as input stage).

  2. nh3 — Rust-based HTML sanitizer strips all HTML tags from free-text fields.

  3. Markdown image stripping — Removes ![](url) syntax to prevent tracking pixels in review comments.

  4. Markdown link rewriting — Converts [text](https://url) to plain text to prevent phishing links.

  5. URL scheme filter — Removes javascript:, data:, and vbscript: schemes from remaining link syntax.

Tool output sanitization. Codebase tool responses (read_file, grep_code, list_files) are sanitized with navi-sanitize and XML-escaped before reaching the LLM, preventing indirect prompt injection through crafted file contents.

Adversarial test suite. tests/test_hostile_environment.py exercises 44 attack scenarios across Unicode attacks, prompt injection, tool exploitation, output sanitization gaps, information leakage, schema validation attacks, session history poisoning, and more. All 44 pass.

See the Security Model for codebase tool protections, CI hardening, and the full threat model.

Retrieval Quality Benchmarks

Grippy includes a benchmark suite for validating search and graph retrieval quality.

# Run search benchmarks (requires embedding model)
python -m benchmarks search --k 5

# Run graph retrieval benchmarks (requires populated graph DB)
python -m benchmarks graph

# Run all benchmarks
python -m benchmarks all

Results are written as JSON to benchmarks/output/.

Documentation

License

MIT

A
license - permissive license
-
quality - not tested
C
maintenance

Maintenance

Maintainers
Response time
Release cycle
1Releases (12mo)

Resources

Unclaimed servers have limited discoverability.

Looking for Admin?

If you are the server author, to access and configure the admin panel.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Project-Navi/grippy-code-review'

If you have feedback or need assistance with the MCP directory API, please join our Discord server