Nightman

The Nightman comes for your untested code.

Point it at a Python function. It throws adversarial inputs until something breaks, shrinks the failure to its smallest form, and hands you the pytest regression test that proves it. A bug-finder, not a test-writer: it only writes a test once it already has a crash in hand.

Ships as a CLI and an MCP server. Second in a gang of It's Always Sunny-flavored dev tools, after Charlie Work.


🌙 THE NIGHTMAN COMETH. Sneaky and mean. A master of karate, and of the empty list you forgot to handle.

The jokes are a toggle, not a tax: pass --plain (or set NIGHTMAN_VOICE=off) and every line comes back flavor-free, paste-into-a-ticket clean. CI and machine output are always plain.
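One way to picture the precedence of that toggle, as a minimal sketch rather than Nightman's actual code. Only --plain and NIGHTMAN_VOICE=off come from the text above; the helper name `voice_enabled`, the `machine_output` parameter, and detecting CI through a `CI` environment variable are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def voice_enabled(
    plain_flag: bool,
    machine_output: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Return True only when flavored output is allowed (hypothetical helper)."""
    env = os.environ if env is None else env
    # An explicit --plain flag or machine-readable output always wins.
    if plain_flag or machine_output:
        return False
    # NIGHTMAN_VOICE=off turns the jokes off without touching the command line.
    if env.get("NIGHTMAN_VOICE", "").strip().lower() == "off":
        return False
    # Assumption: CI is recognized by the conventional CI environment variable.
    if env.get("CI"):
        return False
    return True
```

Passing the environment in as a mapping keeps the decision easy to test without mutating os.environ.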

Why not just "AI writes my tests"?

Because auto-generated tests are distrusted for good reasons: they pin whatever the code already does, they flake, they assert nothing that matters. Nightman is the opposite by construction:

  • It leads with a real failure. Nothing is written until a generated input actually crashes the function or violates a stated property. No crash, no test.

  • What it commits can't flake. The deliverable is a frozen, minimized pytest.param(...) case: one pinned input, deterministic, no live fuzzer in your CI.

  • You write zero properties. Nightman infers them from type hints, signatures, and docstrings, killing the #1 reason people bounce off property-based testing.

Quickstart

Run it straight from the repo with uv (no clone, no build):

uvx --from git+https://github.com/Falcon305/nightman nightman hunt yourmodule:your_function

Or point it at a file directly:

uvx --from git+https://github.com/Falcon305/nightman nightman hunt path/to/parsing.py:parse

What it looks like

$ nightman hunt binsearch.py:search

🌙 THE NIGHTMAN COMETH.
   He came for search() with:  search(arr=[], target=0)
   → IndexError: list index out of range at binsearch.py:5
   found on try #1.
   shrunk to its smallest form in 310 more.

nightman harden binsearch.py:search --write does the same, then drops a committable regression test:

# Regression test written by Nightman (github.com/Falcon305/nightman).
# The Nightman came for search() and it broke:
#   search(arr=[], target=0)
#   -> IndexError: list index out of range
# The minimized failing input is pinned below. Delete this test once the bug is dead.
import pytest

from binsearch import search


@pytest.mark.parametrize("kwargs", [pytest.param({'arr': [], 'target': 0}, id='nightman-3821a3')])
def test_search_nightman(kwargs):
    search(**kwargs)

That test fails on the buggy code and passes once you fix it. It is a real regression net, not a snapshot of today's behavior.
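To see why the pinned case behaves that way, here is a minimal sketch. The contents of binsearch.py are not shown in this README, so both functions below are hypothetical stand-ins: `search_buggy` is one plausible way an empty list could raise the reported IndexError, and `search_fixed` is one possible fix.

```python
from typing import Any, Callable

# The minimized input from the generated regression test above.
PINNED = {"arr": [], "target": 0}


def search_buggy(arr: list[int], target: int) -> int:
    """Hypothetical buggy binary search: probes arr[mid] before checking the range."""
    lo, hi = 0, len(arr) - 1
    while True:
        mid = (lo + hi) // 2
        if arr[mid] == target:  # arr == [] gives mid == -1, so this raises IndexError
            return mid
        if lo >= hi:
            return -1
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1


def search_fixed(arr: list[int], target: int) -> int:
    """Hypothetical fix: never index into an empty range."""
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def crashes(fn: Callable[..., Any], kwargs: dict[str, Any]) -> bool:
    """Mirror the generated test body: it fails only if the call raises."""
    try:
        fn(**kwargs)
    except Exception:
        return True
    return False


print(crashes(search_buggy, PINNED))  # the pinned case fails before the fix
print(crashes(search_fixed, PINNED))  # and passes after it
```

Because the generated test only calls the function with the pinned input, it asserts "no crash" and nothing more; checking the return value is left to tests you write yourself.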

How it works (delegate, don't reinvent)

Nightman's value is the orchestration, the sandbox, and the committable artifact, not a new fuzzer. Under the hood it stands on the best open engines:

  • Generation + shrinking → Hypothesis. Its shrinking engine is world-class; Nightman drives it and captures the minimized counterexample.

  • Strategy inference from type hints (from_type/builds), with a fallback ladder for untyped code: docstring types → default values → parameter-name heuristics → a hand-built chaos corpus (empty collections, NaN/±inf, boundary ints, surrogate/\x00/huge strings).

  • Properties, strongest-first: a never-crashes floor, plus roundtrip (decode(encode(x)) == x, detected by name pairs), idempotence, and a differential oracle for comparing a suspect against a reference.

  • A sandboxed executor: each hunt runs in a spawned subprocess with CPU/memory limits, so a memory bomb or an infinite loop is capped, and a native segfault is reported as a result instead of taking down the run.
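The four property kinds listed above can be written as plain checks over a fixed list of inputs. This is an illustrative sketch only: it uses a tiny hand-written corpus and a simple loop in place of Hypothesis, does no generation or shrinking, and every name in it (`CHAOS_LISTS`, `never_crashes`, `roundtrip`, `idempotent`, `differential`, `dedup_sort`) is hypothetical rather than part of Nightman's API.

```python
import json
from typing import Any, Callable, Iterable, Optional

# A few edge-case inputs in the spirit of the chaos corpus (illustrative values).
CHAOS_LISTS: list[list[int]] = [[], [0], [-1, 1], [2**63, -(2**63)], [3, 1, 2], [1, 1, 1]]

Failure = tuple[Any, str]  # (offending input, short description)


def never_crashes(fn: Callable[[Any], Any], inputs: Iterable[Any]) -> Optional[Failure]:
    """Floor property: the call must not raise for any input."""
    for x in inputs:
        try:
            fn(x)
        except Exception as exc:
            return x, f"{type(exc).__name__}: {exc}"
    return None


def roundtrip(encode: Callable, decode: Callable, inputs: Iterable[Any]) -> Optional[Failure]:
    """decode(encode(x)) must give back x."""
    for x in inputs:
        if decode(encode(x)) != x:
            return x, "decode(encode(x)) != x"
    return None


def idempotent(fn: Callable[[Any], Any], inputs: Iterable[Any]) -> Optional[Failure]:
    """Applying fn twice must equal applying it once."""
    for x in inputs:
        once = fn(x)
        if fn(once) != once:
            return x, "f(f(x)) != f(x)"
    return None


def differential(suspect: Callable, reference: Callable, inputs: Iterable[Any]) -> Optional[Failure]:
    """The suspect must agree with a trusted reference on every input."""
    for x in inputs:
        if suspect(x) != reference(x):
            return x, "suspect disagrees with reference"
    return None


def dedup_sort(xs: list[int]) -> list[int]:
    """Planted bug for the demo: sorting through a set silently drops duplicates."""
    return sorted(set(xs))


print(never_crashes(max, CHAOS_LISTS))              # max() raises on the empty list
print(roundtrip(json.dumps, json.loads, CHAOS_LISTS))
print(idempotent(sorted, CHAOS_LISTS))
print(differential(dedup_sort, sorted, CHAOS_LISTS))
```

Each check returns the first offending input or None, which is the shape a "no failure, no test" workflow needs: a regression case exists only when one of these returns something.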

Proving it works: the eval

Novelty isn't the moat (Anthropic and AWS have both shown agents can do this); a trustworthy, packaged tool is. So Nightman ships a reproducible, seeded eval over a corpus of planted bugs: off-by-one, empty-input, unicode, unsafe eval, integer truncation, runaway recursion, and more. For each it measures detection, repro minimality, and time-to-first-failure (TTFF). Critically, it also runs the same hunt against the fixed code and requires zero false positives (the first thing a skeptic checks).

$ python evals/run.py
Nightman eval - 10 planted bugs, 4 seeds each

category             oracle        detect  trials  min|canon  ttff   fp
-----------------------------------------------------------------------
boundary_index       crash         yes     4/4     0|0        1      -
comparison_flip      differential  yes     4/4     3|2        7      -
empty_input          crash         yes     4/4     0|0        1      -
integer_truncation   differential  yes     4/4     4|2        3      -
off_by_one           crash         yes     4/4     1|1        1      -
off_by_one           differential  yes     4/4     2|2        1      -
recursion            crash         yes     4/4     6|1        2      -
type_coercion        crash         yes     4/4     1|1        1      -
unicode_edge         crash         yes     4/4     1|1        2      -
unsafe_eval          crash         yes     4/4     0|0        1      -

detection_rate      : 100%  (10/10)
false_positive_rate : 0%   (must be 0%)
median minimal input: 1.0
median TTFF (execs) : 1.0
RESULT: PASS

Every planted bug found, every one shrunk to a near-minimal input, and not one false alarm on the fixed code.
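The summary lines can be re-derived from the table rows. The sketch below is not evals/run.py; it simply copies the ten rows printed above and recomputes the four summary figures. It assumes that "median minimal input" is taken over the min half of the min|canon column, that "-" in the fp column means no false positive, and that the pass condition is only the stated zero-false-positive requirement.

```python
from statistics import median

# (category, oracle, detected, min_size, ttff, false_positive), copied from the table.
ROWS = [
    ("boundary_index", "crash", True, 0, 1, False),
    ("comparison_flip", "differential", True, 3, 7, False),
    ("empty_input", "crash", True, 0, 1, False),
    ("integer_truncation", "differential", True, 4, 3, False),
    ("off_by_one", "crash", True, 1, 1, False),
    ("off_by_one", "differential", True, 2, 1, False),
    ("recursion", "crash", True, 6, 2, False),
    ("type_coercion", "crash", True, 1, 1, False),
    ("unicode_edge", "crash", True, 1, 2, False),
    ("unsafe_eval", "crash", True, 0, 1, False),
]


def summarize(rows):
    """Recompute the summary block from per-bug rows."""
    total = len(rows)
    fp_rate = sum(1 for r in rows if r[5]) / total
    return {
        "detection_rate": sum(1 for r in rows if r[2]) / total,
        "false_positive_rate": fp_rate,
        "median_minimal_input": median(r[3] for r in rows),
        "median_ttff": median(r[4] for r in rows),
        # Assumption: only the zero-false-positive rule is enforced here.
        "passed": fp_rate == 0.0,
    }


summary = summarize(ROWS)
print(summary)
```

With ten rows the medians are averages of the two middle values, which is why both come out as 1.0 rather than an integer.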

For your coding agent (MCP)

Nightman is also an MCP server, so an agent can hunt bugs and write regression tests itself. Point Claude Desktop / Cursor / Claude Code at it:

{
  "mcpServers": {
    "nightman": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/Falcon305/nightman", "nightman", "serve"]
    }
  }
}

It exposes nightman_infer_inputs, nightman_hunt, nightman_harden, and nightman_write_regression_test (structured output), plus a harden_function prompt that scripts the whole loop. Then ask the agent: "Harden parse in parser.py: find how it breaks, fix it, and leave a regression test."
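On the wire, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for nightman_hunt. The envelope follows the MCP specification, but the argument key `target` is a guess made for illustration; this README does not document the tools' input schemas, so check the schema the server advertises through `tools/list` before relying on it.

```python
import json
from typing import Any


def tools_call_request(request_id: int, tool: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Hypothetical argument name: "target" is an assumption, not a documented parameter.
request = tools_call_request(1, "nightman_hunt", {"target": "parser.py:parse"})
print(json.dumps(request, indent=2))
```

In practice the agent's MCP client builds this message for you; the sketch only shows what "calling nightman_hunt" amounts to underneath.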

Development

uv sync --extra dev
uv run ruff check . && uv run mypy && uv run pytest -q
uv run python evals/run.py

CI runs ruff + mypy (typed) + pytest + the eval across Python 3.11–3.13. Releases publish to PyPI via OIDC Trusted Publishing.

The gang

Each ships as its own standalone tool: Charlie Work (the toil nobody wants to do) · Nightman (the input your code wasn't ready for) · more coming.

License

Code: MIT.

The hero image is a still from It's Always Sunny in Philadelphia's "The Nightman Cometh" (S4E13, © FX Networks), used here for identification and commentary. It is not covered by the MIT license and remains the property of its rights holder. An original vector rendition ships at assets/hero.svg.
