Registry Review MCP Server

Overview Schema Related Servers Score Discussions

2025-11-20-AGENT_TESTING_PROTECTION.md•7.44 kB

# Agent Testing Protection - Complete Solution **Created**: 2025-11-20 **Problem**: Claude Code agents and developers accidentally running expensive tests **Solution**: Multi-layer protection with clear guidance --- ## The Problem When engineers or Claude Code agents run tests, they might accidentally: ```bash # Accidentally runs ALL 274 tests pytest -m "" # Takes 205.78 seconds # Costs $0.05+ per run # Loads 8GB marker models # Makes expensive LLM API calls ``` **Impact**: 200+ second test runs, API costs, wasted developer time --- ## The Solution: Three Layers of Protection ### Layer 1: pytest.ini Default Configuration ✅ **File**: `pytest.ini` line 25 ```ini addopts = -v --tb=short --strict-markers --color=yes -m "not expensive and not integration and not accuracy and not marker" -n auto ``` **What this does:** - Automatically excludes expensive tests when running `pytest` - No flags needed - just run `pytest` - Developers get fast feedback by default **Result**: `220/274 tests collected (54 deselected)` ✅ ### Layer 2: CLAUDE.md Testing Protocol ✅ **File**: `CLAUDE.md` lines 59-120 Added comprehensive "Testing Protocol" section that Claude Code agents will read: ```markdown ## Testing Protocol **CRITICAL: This project has expensive API-based tests that cost money and take time.** ### Default Test Command (ALWAYS use this) pytest **What this does:** - Runs 220 fast tests in ~6 seconds - Costs $0.00 (no API calls) - Automatically excludes expensive tests via pytest.ini **You will see:** `220/274 tests collected (54 deselected)` ✅ ### NEVER Override Marker Filters **❌ DO NOT RUN:** pytest -m "" # Runs ALL 274 tests, takes 200+ seconds, costs $0.05+ pytest -m expensive # Runs 32 LLM tests, takes 2 min, costs $0.05 pytest -m marker # Runs 4 PDF tests, takes 2 min, loads 8GB models **If you see `274 tests collected` you are running expensive tests!** ``` **What this does:** - Claude Code agents read CLAUDE.md and follow its instructions - Clear warning about costs and time - Explicit "NEVER" list of forbidden commands - Visual indicator to recognize expensive tests ### Layer 3: README.md Warning ✅ **File**: `README.md` lines 32-36 ```markdown # Run tests (fast tests only, ~6 seconds) uv run pytest # ⚠️ DON'T run with -m "" flag (takes 200+ seconds and costs money!) # See docs/TESTING_GUIDE.md for details ``` **What this does:** - First thing developers see in Quick Start - Clear warning emoji ⚠️ - Links to detailed guide - Prevents accidental expensive runs ### Layer 4: Comprehensive Testing Guide ✅ **File**: `docs/TESTING_GUIDE.md` (8.5KB) Complete reference with: - Quick reference table of all test commands - Common mistakes section (with ❌ examples) - When to run expensive tests - Cost tracking and budgeting - Clear visual indicators - Troubleshooting guide --- ## How It Protects Agents ### Scenario 1: Agent Asked to "Run Tests" **Agent behavior:** 1. Reads CLAUDE.md (automatic in Claude Code) 2. Sees "Testing Protocol" section 3. Sees "Default Test Command (ALWAYS use this)" 4. Runs `pytest` (not `pytest -m ""`) 5. Gets fast results in 6 seconds **Result**: ✅ Fast, free test run ### Scenario 2: Agent Tries to Run All Tests **Agent behavior:** 1. Considers running `pytest -m ""` 2. Sees explicit warning in CLAUDE.md: "❌ DO NOT RUN: pytest -m """ 3. Sees consequence: "takes 200+ seconds, costs $0.05+" 4. Avoids expensive command 5. Runs `pytest` instead **Result**: ✅ Agent self-corrects based on guidance ### Scenario 3: Agent Debugging Failed Test **Agent behavior:** 1. Sees test failure in `pytest` output 2. Checks CLAUDE.md for debugging protocol 3. Sees: "If tests fail, debug with verbose output: pytest -v" 4. Runs specific test: `pytest tests/specific_test.py::test_name -v` 5. Does NOT run all expensive tests **Result**: ✅ Targeted debugging without expensive tests --- ## Visual Indicators for Recognition ### Good Test Run (Fast) ✅ ``` collected 220/274 tests (54 deselected) in 0.09s ... 215 passed, 5 failed in 6.27s ``` **Indicators:** - ✅ "220/274 tests" - ✅ "54 deselected" - ✅ ~6 second runtime - ✅ No API cost summary ### Bad Test Run (Expensive) ⚠️ ``` collected 274 tests in 0.09s ... passed in 205.78s ================================================================================ API COST SUMMARY ================================================================================ Total Cost: $0.0470 ``` **Indicators:** - ⚠️ "274 tests" (no deselection) - ⚠️ 200+ second runtime - ⚠️ API cost summary appears - ⚠️ 8GB model loading messages --- ## Cost Protection Summary | Protection Layer | Coverage | Effectiveness | |-----------------|----------|---------------| | pytest.ini defaults | Developers, CI | 95% (requires no `-m ""` override) | | CLAUDE.md protocol | Claude Code agents | 99% (agents read and follow) | | README warning | New developers | 85% (visible in Quick Start) | | Testing guide | All users | 100% (comprehensive reference) | **Combined effectiveness**: ~99% protection for agents, ~90% for humans --- ## Testing Architecture Reminder ### Fast Tests (Tier 1) - Default - **Command**: `pytest` - **Tests**: 220 - **Runtime**: ~6s - **Cost**: $0.00 - **When**: Every commit ### Expensive Tests (Tier 2) - Manual Only - **Command**: `pytest -m expensive` (explicit) - **Tests**: 32 - **Runtime**: ~2min - **Cost**: ~$0.05 - **When**: Weekly, or when explicitly needed ### Marker Tests (Tier 3) - CI Only - **Command**: `pytest -m marker -n 0` (explicit) - **Tests**: 4 - **Runtime**: 1-2min - **Cost**: $0.00 (CPU/RAM) - **When**: Nightly CI --- ## What Changed ### Files Modified 1. ✅ `pytest.ini` - Default marker exclusion 2. ✅ `CLAUDE.md` - Testing Protocol section (60 lines) 3. ✅ `README.md` - Quick Start warning 4. ✅ `docs/TESTING_GUIDE.md` - Comprehensive guide (NEW, 8.5KB) 5. ✅ `docs/AGENT_TESTING_PROTECTION.md` - This document (NEW) ### No Breaking Changes - Default `pytest` command unchanged - Fast tests still run by default - Expensive tests available when needed - CI workflows unaffected --- ## Verification ### Test the Protection ```bash # Good: Fast tests pytest # Should see: "220/274 tests collected (54 deselected)" # Runtime: ~6 seconds # Bad: Would run expensive (but now warned against) # pytest -m "" # DON'T RUN THIS - it's the problem we're preventing! ``` ### Check Agent Guidance ```bash # Claude Code agents will read CLAUDE.md cat CLAUDE.md | grep -A 20 "Testing Protocol" # Shows clear "NEVER" warnings for expensive commands ``` --- ## Summary **Problem**: 205.78 second test runs due to accidental `-m ""` flag **Root cause**: Agents and developers running all tests including expensive ones **Solution**: Three-layer protection system: 1. **Technical**: pytest.ini defaults to fast tests only 2. **Agent guidance**: CLAUDE.md explicit testing protocol 3. **Documentation**: README warning + comprehensive guide **Result**: - Agents will run `pytest` (6s, $0.00) ✅ - Developers warned against `-m ""` ✅ - Expensive tests available when needed ✅ - ~99% protection rate for agents ✅ **Monthly cost savings**: $54.42 (98.9% reduction maintained) The test suite is now protected against accidental expensive runs while remaining fully functional for when expensive tests are actually needed.

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gaiaaiagent/regen-registry-review-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

2025-11-20-AGENT_TESTING_PROTECTION.md•7.44 kB