# Research: Python Debugging MCP Tool
Date: 2025-10-23
Branch: 001-python-debug-tool
## Unknowns Addressed
1) Entrypoint strategy (script path vs module vs test target) and args/env
- Decision: Support script path with project-relative path in v1; optional `args` array and `env` map in request.
- Rationale: Simple and covers most debugging needs; avoids test-runner coupling.
- Alternatives considered:
  - Module path (`python -m`): flexible but adds ambiguity in package discovery; can be added in v2.
  - Test target (pytest node ID): useful but couples to pytest and adds complexity; defer to v2.
2) Virtualenv/conda detection and interpreter selection
- Decision: Use the Python executable running the MCP server by default; allow override via request `pythonPath` or config file.
- Rationale: Predictable and avoids cross-env issues; explicit override unblocks non-default envs.
- Alternatives considered:
  - Auto-detect `.venv`/conda: convenient but error-prone across setups; may be added with explicit precedence rules later.
3) Conditional breakpoints and expression evaluation scope
- Decision: v1 supports line breakpoints only; no condition or expression evaluation. v2 may add simple conditions evaluated safely.
- Rationale: Reduces risk; expression evaluation increases both security exposure and complexity.
- Alternatives considered:
  - Full conditional breakpoints via `eval`: powerful but high risk and harder to sandbox.
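To make the "line breakpoints only" decision concrete, here is a minimal sketch of a line breakpoint built on the stdlib `bdb` module. The names `LineBreakRecorder` and `run_with_breakpoint` are hypothetical and not taken from the project; the sketch also runs the target in-process for brevity, whereas the design above runs the debuggee in a subprocess.

```python
import bdb
import os
import tempfile


class LineBreakRecorder(bdb.Bdb):
    """Hypothetical helper: records locals whenever a line breakpoint is reached."""

    def __init__(self):
        super().__init__()
        self.hits = []  # list of (lineno, locals snapshot)

    def user_line(self, frame):
        # bdb starts in step mode, so this hook also fires on the first line;
        # only record stops that match a registered breakpoint.
        if self.get_break(frame.f_code.co_filename, frame.f_lineno):
            snapshot = {
                name: value
                for name, value in frame.f_locals.items()
                if not name.startswith("__")
            }
            self.hits.append((frame.f_lineno, snapshot))
        self.set_continue()  # resume until the next breakpoint or exit


def run_with_breakpoint(source, line):
    """Write `source` to a temp file, break at `line`, and return recorded hits."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as fh:
        fh.write(source)
        path = fh.name
    debugger = LineBreakRecorder()
    try:
        error = debugger.set_break(path, line)  # returns an error string on failure
        if error:
            raise ValueError(error)
        code = compile(source, path, "exec")
        debugger.run(code, {"__name__": "__main__"})
        return debugger.hits
    finally:
        debugger.clear_all_breaks()  # breakpoints live in class-level state
        os.unlink(path)
```

Because no condition or user expression is ever evaluated, the only code that runs is the debuggee itself, which is the risk reduction the rationale refers to.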
## Best Practices
- Isolation: run debuggee in a subprocess; enforce timeouts; terminate on overrun.
- Safety: restrict the working directory to the workspace root; disallow network access and filesystem writes by default. Documented caveat: application code may still write; a `--allow-fs` opt-in may be provided later.
- Observability: structured logs for session lifecycle; include timing and reasons for termination.
- Data limits: truncate variable representations (e.g., depth=2, max items=50, max string len=256).
- Determinism: ensure tests seed randomness and avoid flakiness.
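The isolation and observability points can be sketched as follows: run the debuggee in a subprocess, kill it on overrun, and report timing plus a termination reason. The function name `run_with_timeout` and the result keys are assumptions made for illustration; the 20-second default mirrors the guard value stated elsewhere in this document.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s=20.0, cwd=None):
    """Run `cmd` in a child process; kill it if it exceeds `timeout_s`."""
    started = time.monotonic()
    proc = subprocess.Popen(
        cmd, cwd=cwd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
    )
    try:
        stdout, stderr = proc.communicate(timeout=timeout_s)
        reason = "exited"
    except subprocess.TimeoutExpired:
        proc.kill()  # terminate on overrun
        stdout, stderr = proc.communicate()  # reap the child and drain pipes
        reason = "timeout"
    return {
        "reason": reason,  # why the run ended, for structured logs
        "returncode": proc.returncode,
        "elapsed_s": round(time.monotonic() - started, 3),
        "stdout": stdout,
        "stderr": stderr,
    }


# Example: a short-lived child that finishes well within the limit.
result = run_with_timeout([sys.executable, "-c", "print('hi')"])
```

Returning the reason and elapsed time alongside the output keeps the session lifecycle log entries uniform whether the child exited on its own or was killed.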
## Interfaces and Protocol
- MCP tool methods (conceptual):
  - `start_session(entry, args?, env?) -> {sessionId}`
  - `run_to_breakpoint(sessionId, file, line) -> {hit: bool, locals: {...}, frameInfo}`
  - `continue(sessionId) -> {hit: bool, locals?, frameInfo?, completed: bool}`
  - `state(sessionId) -> {status, lastBreakpoint?, timings}`
  - `end_session(sessionId) -> {ended: true}`
- Internal debug transport: parent/child IPC via multiprocessing Pipe; messages as JSON.
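As a rough illustration of the internal transport, the sketch below frames JSON messages over a `multiprocessing` `Pipe`. The helper names `send_message` and `recv_message`, and the message fields shown in the usage note, are hypothetical; for brevity both ends of the pipe are used in one process here, whereas in the design above one end would belong to the debuggee child.

```python
import json
from multiprocessing import Pipe


def send_message(conn, message):
    """Serialize a dict as UTF-8 JSON and send it as one pipe message."""
    conn.send_bytes(json.dumps(message).encode("utf-8"))


def recv_message(conn, timeout_s):
    """Return the next decoded message, or None if nothing arrives in time."""
    if not conn.poll(timeout_s):
        return None
    return json.loads(conn.recv_bytes().decode("utf-8"))


# Both ends are created together; the child end would be handed to the debuggee.
parent_conn, child_conn = Pipe()
```

A request such as `send_message(parent_conn, {"op": "continue", "sessionId": "s1"})` can then be read on the other end with `recv_message(child_conn, 1.0)`. Sending JSON bytes instead of pickled objects keeps the wire format inspectable and avoids unpickling data produced by the debuggee.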
## Tooling Choices
- Language: Python 3.11
- Libraries:
  - bdb/pdb (stdlib): core debugging functionality
  - pydantic v2 (schemas): request/response validation with detailed error messages
  - pytest 8.4.2 (tests): test framework; the suite currently contains 134 tests
  - ruff (lint/format): fast Python linter and formatter
  - typer (CLI): command-line interface with rich output
  - rich (CLI): terminal formatting for tables and JSON
- Not chosen: debugpy/DAP for v1 (adds complexity); may adopt in v2 for richer stepping.
## Implementation Details (as of v1)
### Entry Strategy
- **Script path**: Project-relative paths (e.g., `src/main.py`)
- **Working directory**: Always set to workspace root for consistency
- **Arguments**: Passed via `args` array in StartSessionRequest
- **Environment**: Merged with parent env via `env` dict in StartSessionRequest
- **Python interpreter**:
  - Default: Server's Python executable
  - Override: `pythonPath` field in StartSessionRequest
  - Detection: No automatic venv/conda detection in v1 (explicit only)
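The entry strategy above can be sketched as a small function that turns request fields into a launch command. The function name `build_launch` and its return shape are assumptions for illustration, not the project's API; the path-containment check is likewise an assumed way of enforcing "project-relative", and the sketch uses plain arguments where the project uses a pydantic `StartSessionRequest`.

```python
import os
import sys
from pathlib import Path


def build_launch(workspace_root, entry, args=None, env=None, python_path=None):
    """Return (cmd, cwd, env) for launching a project-relative script."""
    root = Path(workspace_root).resolve()
    if Path(entry).is_absolute():
        raise ValueError("entry must be a project-relative path")
    script = (root / entry).resolve()
    if not script.is_relative_to(root):  # rejects paths such as ../outside.py
        raise ValueError("entry escapes the workspace root")
    # Default to the interpreter running the server; allow an explicit override.
    cmd = [python_path or sys.executable, str(script), *(args or [])]
    # Request values are layered over the parent environment.
    merged_env = {**os.environ, **(env or {})}
    return cmd, str(root), merged_env
```

The returned working directory is always the resolved workspace root, so relative paths inside the debuggee behave the same regardless of where the server was started.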
### Conditional Breakpoints (v2 Scope)
- **v1**: Line breakpoints only (file + line number)
- **v2 planned features**:
  - Simple conditions evaluated in debuggee context (e.g., `x > 10`)
  - Hit count conditions (e.g., "pause on 3rd hit")
  - Log points (print without pausing)
  - Safety: Conditions executed with timeout and expression size limits
### Execution Guards (Implemented in v1)
- **Timeout**: 20s default per breakpoint operation
- **Output capture**: 10MB max per session
- **Memory**: No explicit limit; relies on OS/container limits
- **Recursion**: Variable repr limited to depth=2
- **Collection size**: Max 50 items shown in lists/dicts
- **String length**: Max 256 chars in repr before truncation
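One way to apply the representation limits listed above is the stdlib `reprlib` module. This is a sketch under that assumption; the document does not say how the project implements truncation, and the names `make_safe_repr` and `snapshot_locals` are hypothetical.

```python
import reprlib


def make_safe_repr(max_depth=2, max_items=50, max_string=256):
    """Build a repr function with depth, collection-size, and length limits."""
    limiter = reprlib.Repr()
    limiter.maxlevel = max_depth  # nesting depth before eliding
    for attr in ("maxlist", "maxtuple", "maxdict", "maxset",
                 "maxfrozenset", "maxdeque", "maxarray"):
        setattr(limiter, attr, max_items)  # items shown per collection
    limiter.maxstring = max_string  # characters kept for str values
    limiter.maxother = max_string   # characters kept for other objects
    return limiter.repr


safe_repr = make_safe_repr()


def snapshot_locals(frame_locals):
    """Map each local name to a bounded string representation."""
    return {name: safe_repr(value) for name, value in frame_locals.items()}
```

Note that `reprlib` elides the middle of long strings rather than the tail, so both the start and the end of a value remain visible in the truncated output.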
## Risks and Mitigations
- Risk: Long-running or side-effect-heavy code.
  - Mitigation: Strict timeouts; working directory locked to the workspace root by default; documented risks; optional allowlist.
- Risk: Large object graphs in locals.
  - Mitigation: Safe repr with size/recursion limits.
- Risk: Environment mismatch.
  - Mitigation: Explicit `pythonPath` override; quickstart guides for venv.
## Decision Summary
- v1 focuses on reliability and simplicity using stdlib debugging primitives with clear limits.
- v1 provides session lifecycle management and two core operations: run-to-breakpoint and continue.