# MCP Debugger Testing Report
## Thorough validation of Python and JavaScript tools
Test Date: 2025-10-09
Tester: Cline AI
MCP Server: mcp-debugger
---
## Executive summary
Core debugging flows work well for both Python and JavaScript:
- Sessions: create, launch with stopOnEntry, and close are reliable
- Breakpoints, stepping, stack/scopes/variables, expression evaluation, and source context are all functional
- Python experience is polished end-to-end
- JavaScript experience is functional but has quirks that make it a bit noisier and more stateful for clients
No broken features were found in tested flows. The biggest frictions are:
- JavaScript stack traces include many Node internal frames
- JavaScript variableReferences and frameIds churn frequently, requiring a fetch-refresh pattern
- JavaScript breakpoints sometimes report Unbound at set time
- start_debugging scriptPath resolution required absolute paths in our environment (Python)
The sections below detail what works, what’s confusing (but works), and concrete suggestions to improve the UX.
---
## What was tested (tools exercised)
- list_supported_languages
- create_debug_session
- start_debugging (with stopOnEntry, justMyCode)
- set_breakpoint
- continue_execution
- get_stack_trace
- get_scopes
- get_variables
- evaluate_expression
- step_into / step_over / step_out
- get_source_context
- close_debug_session
Note: pause_execution is documented as Not Implemented and was not expected to work.
---
## Python debugging results
Works really well
- Session lifecycle
- create_debug_session returned sessionId and clear message
- close_debug_session cleaned up reliably
- Launch
- start_debugging with stopOnEntry successfully paused on entry
- stopOnEntrySuccessful: true reported in data
- Breakpoints
- Verified breakpoints with helpful surrounding context
- Example: set breakpoint at line 32 “fact_result = factorial(5)”, verified: true
- Stepping
- step_into entered factorial as expected
- step_over advanced execution and populated locals
- Stack, scopes, variables
- get_stack_trace showed frames for main and <module>, then factorial when stepping into it
- get_scopes returned Locals and Globals scopes
- get_variables for Locals showed expected values (x, y, z, etc.) after executing lines
- Expression evaluation
- evaluate_expression with pure expressions (e.g., n + 1) worked and returned correct types/values
- Attempting assignment (x = 99) resulted in SyntaxError; evaluate appears expression-only in this context
- Source context
- get_source_context provided precise line and surrounding code, extremely helpful for orientation
Confusing but works
- Absolute path requirement for scriptPath (environment-specific)
- start_debugging with relative path “test-scripts/test_python_debug.py” returned “Script file not found”
- Using absolute path succeeded
- Recommendation: Allow paths relative to server CWD, or document path expectations prominently
- Variables appear after execution of defining line
- Standard debugger behavior (stop before execute) but can surprise users
- Suggest documenting that you may need to step once for new locals to appear
Observed behavior (not bugs)
- Assignment in evaluate_expression raised SyntaxError
- Likely by design: evaluation accepts expressions, not statements
- Recommendation: Document this; optionally provide a mode or flag to allow statement execution (if adapter/runtime permits)
Outcome
- Full flow: created, launched, paused, set breakpoints, stepped into and over, inspected scopes/variables, evaluated expressions, fetched source context, continued, and closed
- Rating: 9/10
---
## JavaScript debugging results
Works really well
- Session lifecycle
- create_debug_session returned sessionId and closed cleanly at end
- Launch
- start_debugging with stopOnEntry paused on entry
- Scopes and variables
- get_scopes returned granular scopes: “Local: main”, “Module”, “Global”
- get_variables returned expected values in main (x, y, z) and factorial (n)
- Expression evaluation
- evaluate_expression in factorial frame (n + 1) returned correct result (6)
- Stepping
- step_into, step_out, and step_over worked reliably
- Recursion depth in factorial required multiple step_out calls to return to main (expected)
Confusing but works
- Unbound breakpoint warnings on set
- set_breakpoint at lines 40 and 54 reported verified: false, Unbound breakpoint, even though the file was the running script
- We primarily proceeded via stepping and did not verify whether they would bind later during run; could be a timing/mapping nuance
- Recommendation: Improve binding feedback or re-check binding after script load advances; document lifecycle (breakpoints may bind later)
- Verbose stack traces (Node internals)
- get_stack_trace includes many Node internal frames; the first few user frames are useful, the rest are noise
- Recommendation: Add an option to filter internal frames or reduce default depth to user code
- Variable/Frame identity churn
- After stepping, frameIds and variablesReference change; clients must refresh stack/scopes/variables
- Recommendation: Document the fetch-refresh pattern clearly (step -> stack -> scopes -> variables)
- Variables appear after execution of defining line
- Same as Python; recommend documenting expectations
Outcome
- Full flow: created, launched, paused, set breakpoints (unbound at set time), inspected stack/scopes/variables, evaluated expressions, stepped (including out of recursion multiple times), and closed
- Rating: 7/10 (functional, but noisier/more stateful than Python)
---
## Side-by-side summary
- Sessions: Both excellent
- Launch + stopOnEntry: Both excellent
- Breakpoints:
- Python: Verified at set time and worked
- JavaScript: Reported Unbound at set time; likely timing/sourcemap/loading nuance (still functional via stepping)
- Stepping: Both reliable
- Stack traces:
- Python: concise and user-focused
- JavaScript: verbose with many Node-internal frames
- Scopes/variables:
- Python: simple and stable
- JavaScript: granular scopes but requires frequent refresh after steps; variableReferences churn
- Expression evaluation:
- Python: expressions only; assignment raised SyntaxError (likely by design)
- JavaScript: expressions worked as expected
- Source context: Excellent for both
---
## What’s broken
- No hard-broken behaviors were found in exercised flows
- pause_execution is explicitly Not Implemented (as documented)
---
## Confusing but works (top items)
1) JavaScript stack traces are noisy
- Too many Node internal frames for typical workflows
- Suggest providing filter options or default user-only frames
2) JavaScript variableReferences/frameIds change after each step
- Requires re-fetching stack, scopes, and variables after every step
- Document the step -> stack -> scopes -> variables pattern
3) Breakpoint lifecycle and binding in JavaScript
- Breakpoints reported Unbound at set time
- Likely bind later or are subject to timing/sourcemap state
- Provide clearer messaging and/or re-check binding automatically
4) Python evaluate_expression is expression-only
- Assignment throws SyntaxError
- Document limitations; optionally support statement evaluation (if feasible/secure)
5) Absolute vs relative scriptPath resolution
- In our environment, Python start_debugging required absolute path
- Improve path resolution or document clearly
---
## Works really well
- Source context: Outstanding and highly useful
- Session lifecycle: Clean, reliable, explicit messages
- Python end-to-end polish: Breakpoint verification, stack, scopes, stepping, evaluation, and source context all felt cohesive
- Stepping controls: step_into/over/out behaved predictably in both languages
---
## Recommendations (what could work better and how)
High priority
- JS stack trace filtering
- Add a parameter to get_stack_trace like includeInternalFrames (default false)
- Or a maxDepth focused on user frames by default with a way to include internals
- Document JS variable reference lifecycle
- Provide a “best-practice loop” in docs for stepping:
1) Step
2) get_stack_trace (current frameId)
3) get_scopes (new variablesReference)
4) get_variables (with the new reference)
Medium priority
- Breakpoint binding clarity (JS)
- Improve messages around Unbound -> Bound transitions
- Optionally expose a tool to re-query breakpoint statuses
- Path resolution (start_debugging)
- Support relative paths relative to server CWD
- Or explicitly document required path form in error message
- Python evaluate_expression documentation
- Call out expressions-only vs statements
- Consider a mode to enable statement execution (e.g., REPL context), if adapter/runtime permits
Low priority
- Provide a “filter” hint for get_stack_trace to drop known noisy Node internals by default
- Add examples for complex/nested data structures (expandable variables) in docs
- Add tests for conditional and multiple breakpoints, error cases, and large files
---
## Repro highlights (from this run)
Python
- start_debugging: absolute scriptPath required; paused on entry with reason=entry
- set_breakpoint: 32 (factorial call), 46 (final computation) verified true
- continue_execution to 32; step_into factorial; get_variables shows n=5; evaluate_expression “n + 1” returns 6
- get_source_context for current lines accurate
- step_over at 46; get_variables shows final=3600
- continue_execution then close_debug_session
JavaScript
- start_debugging paused on entry; stack shows factorial at line 8, main at 40
- set_breakpoint at 40 and 54 returned Unbound (verified false)
- get_scopes main shows x/y/z; factorial shows n=5
- evaluate_expression in factorial: “n + 1” returns 6
- step_out repeatedly (recursive stack); stepping works
- close_debug_session
---
## Conclusion
mcp-debugger delivers a capable, cross-language debugging experience with excellent fundamentals and a strong Python path. JavaScript is fully usable but noisier, with stack verbosity and stateful variable/frame references that make client code more chatty. Addressing stack filtering, documenting the refresh pattern, clarifying breakpoint binding, and improving path handling would materially improve developer ergonomics.
Overall ratings (subjective):
- Python: 9/10
- JavaScript: 7.5/10
- API design and stability: 8/10
Would recommend adoption. With the above refinements, this would be an excellent debugging MCP server.