wpa-mcp
A C# MCP server that exposes Windows ETW (.etl) trace analyzers — CPU, wait, image-load, file / disk / mmap I/O — over any MCP-compatible client (Claude Code, Claude Desktop, Codex, Cursor). Domain-neutral: works on any Windows trace; commonly used to debug app startup, slow forks, AV-induced stalls, and disk-bound regressions.
Status — PoC. 54 tools live; internal use only until validated. Windows-only (TraceEvent kernel parsers are not portable). Apache-2.0.
See it in action: a real investigation — process creation 50× slower than baseline, root-caused via wpa-mcp's tools to multiple EDR stacks colliding on
PsSetCreateProcessNotifyRoutineEx. Reproduced independently by two different LLM agents on the same trace.
Quickstart
Once installed (one-liner below), ask the agent in plain language and it picks the matching tools:
> Load this trace: C:\path\to\trace.etl
(load_trace; the first call takes 30 s – 3 min while the .etlx index is built; subsequent calls are instant. The response includes a Capabilities map so you know upfront which keywords are present in the trace.)
> Which processes have the highest wait ratio?
(list_processes orderBy=wait_ratio — trace-resident processes auto-filtered out)
> For parent PID <X>, what was each fork's kernel-side gap?
(process_create_timing — one call gives kernel-window distribution across all
children of one parent)
> Top wait stacks for PID <X> between <t0> and <t1>, with 20-bucket histogram
(wait_top_stacks — shows the Filter Manager / driver chain blocking the thread)
> Drill into "<frame!?>": who calls it?
(wait_caller_callee — caller / callee neighbours of the focus frame)
The same pattern works for CPU (cpu_top_functions → cpu_caller_callee), file / disk / mmap I/O, image loads, etc. Each "top" view has a matching "caller-callee" drill-down that takes a focus frame.
For an end-to-end walkthrough — symptoms, tool chain, evidence, root cause, recommendations — see docs/CASE_STUDIES.md.
Install
One-liner (no clone, no build)
PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.ps1) }"
Git Bash on Windows:
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.sh | bash
Both routes do the same thing: download the latest GitHub Release zip (pre-built DLL), cache it under %LOCALAPPDATA%\wpa-mcp\releases\<tag>\, and run the bundled setup.ps1. The installer auto-detects every MCP client on the machine (Claude Code / Codex / Claude Desktop) and registers wpa-mcp against each. The .NET 8 runtime is auto-installed user-scope if missing. Subsequent runs are instant (cache hit).
Forward extra flags through the one-liner:
# PowerShell — pin tag, force a single client, set custom symbol path
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.ps1) } -Tag v0.2.0 -InstallArgs @('-Client','claude-desktop','-SymbolPath','SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols')"
# Bash — flags after `bash -s --` go to install.ps1
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.sh | bash -s -- -Tag v0.2.0
Uninstall (one-liner, symmetric)
Web-invokable, edits the same client configs in reverse. No download / cache touched.
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/uninstall.ps1) }"
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/uninstall.sh | bash
This removes the wpa-mcp entry from every detected MCP client. The cached release zip and symbol cache stay (delete %LOCALAPPDATA%\wpa-mcp\ and %LocalAppData%\WprMcp\Symbols\ to remove those).
Requirements
Windows 10 / 11 (TraceEvent kernel APIs are Windows-only)
.NET 8 — auto-installed user-scope by the installer if missing (uses Microsoft's official dotnet-install.ps1; no admin needed). Pass -SkipDotNetInstall to opt out.
For symbol resolution: _NT_SYMBOL_PATH set, or use the symbol tools at runtime (see Configuration → Symbols).
PowerShell:
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
.\scripts\setup.ps1
Bash:
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
./scripts/setup.sh
Builds (Release) and registers wpa-mcp with every detected MCP client. Idempotent — re-run to update.
Common flags:
.\scripts\setup.ps1 -Client claude-desktop # force a specific client
.\scripts\setup.ps1 -SymbolPath "SRV*C:\Symbols*https://..." # custom _NT_SYMBOL_PATH
.\scripts\setup.ps1 -SkipBuild # use existing DLL
Uninstall from clone (also -CleanBuild to wipe bin/ obj/):
.\scripts\uninstall.ps1
.\scripts\uninstall.ps1 -CleanBuild
./scripts/uninstall.sh
./scripts/uninstall.sh -CleanBuild
Build:
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
dotnet build -c Release
# DLL: src\WprMcp\bin\Release\net8.0\WprMcp.dll
Smoke-check:
dotnet src\WprMcp\bin\Release\net8.0\WprMcp.dll --version # prints "WprMcp 0.1.0-poc"
dotnet test # runs the xUnit suite (needs fixtures, see CONTRIBUTING.md)
Then register with your MCP client. The DLL path must be absolute.
Claude Code — per-project (<project>/.mcp.json) or global (~/.claude.json):
{
"mcpServers": {
"wpa-mcp": {
"command": "dotnet",
"args": ["C:/Users/me/Dev/wpa-mcp/src/WprMcp/bin/Release/net8.0/WprMcp.dll"],
"env": {
"_NT_SYMBOL_PATH": "SRV*C:\\Symbols*https://msdl.microsoft.com/download/symbols",
"WPRMCP_CACHE_SIZE": "2"
}
}
}
}
Or via the CLI helper:
claude mcp add wpa-mcp --scope user -- dotnet C:/Users/me/Dev/wpa-mcp/src/WprMcp/bin/Release/net8.0/WprMcp.dll
(Add -e _NT_SYMBOL_PATH=... for env vars.)
Claude Desktop — %APPDATA%\Claude\claude_desktop_config.json, same shape as above.
Codex / Cursor / other MCP-compatible clients — the server speaks stdio MCP; any client that accepts a command + args config works. Use the same JSON snippet.
Verify — after restart, the client exposes the tools as mcp__wpa-mcp__load_trace, etc. First call to load_trace on a fresh .etl takes 30 s – 3 min while the .etlx index is built (logged to stderr).
Tools
54 tools across 15 groups. All built on the same Microsoft.Diagnostics.Tracing.TraceEvent library PerfView uses, so the underlying analysis quality is identical — what changes is the surface (stdio MCP + JSON instead of a Windows GUI) and the addition of composite tools that package multi-step PerfView workflows into one call.
What wpa-mcp adds vs PerfView
Agent-driven, not UI-driven. PerfView is a Windows GUI you click through; wpa-mcp is a stdio MCP server you talk to in plain language. Same data, no UI fatigue, easy to compose into a CI / regression script.
Composite tools. diagnose_slow_startup, process_create_timing, image_load_top_gaps package multi-step PerfView workflows into one call.
Capabilities-aware. Every tool's "won't return data" state maps to a single keyword bit in load_trace's Capabilities map — no more "why is this view empty" detective work in PerfView.
Per-trace symbol recommendations. load_trace inspects modules in the trace and recommends which symbol servers to add. PerfView leaves symbol setup to the user.
Pattern
Always call load_trace first. It opens the .etl, builds (or reuses) the .etlx index, and returns a Capabilities map — a per-keyword presence check (HasCpuSamples, HasCSwitch, HasFileIo, HasDiskIo, HasImageLoad, HasHardFaults, HasStackWalks, HasVirtualAlloc, HasNetIo, HasRegistry, HasReadyThread, HasInterrupt, HasAlpc, HasThreadEvents, HasClrGc, HasClrJit, HasClrAlloc, HasClrException, HasClrContention, HasNtHeap). Every other tool's behaviour depends on those keywords.
Most groups follow the same three-tool shape: a summary (top-N flat rows), a stacks view (top-N call stacks weighted by the metric), and a caller-callee drill-down (given a focus frame, returns its caller / callee neighbours weighted by the same metric — same shape as PerfView's "Callers" / "Callees" tabs).
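As a minimal sketch of how an agent or script might act on that Capabilities map: skip any tool group whose required keyword is absent. The keyword names below come from the documented map; the group-to-keyword mapping is an illustrative assumption, not the server's actual routing.

```python
# Sketch: gate tool choice on the Capabilities map returned by load_trace.
# Keyword names match the documented map; the group -> keyword mapping
# here is an illustrative assumption, not the server's actual routing.
REQUIRED_KEYWORD = {
    "cpu_stacks": "HasCpuSamples",
    "wait_analysis": "HasCSwitch",
    "disk_io": "HasDiskIo",
    "hard_faults": "HasHardFaults",
    "registry": "HasRegistry",
}

def usable_groups(capabilities: dict) -> list:
    """Tool groups whose required ETW keyword is present in the trace."""
    return [g for g, kw in REQUIRED_KEYWORD.items() if capabilities.get(kw)]

caps = {"HasCpuSamples": True, "HasCSwitch": True, "HasDiskIo": False,
        "HasHardFaults": True, "HasRegistry": False}
print(usable_groups(caps))  # ['cpu_stacks', 'wait_analysis', 'hard_faults']
```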
Meta
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| load_trace | Opens / caches a | Open a trace file (no |
| list_processes | Lists processes (sortable by | Processes view |
| process_create_timing | Per-fork timing for a parent PID. | (no equivalent — composite, see |
| | Per-PID chronological thread lifecycle: every | (no equivalent — events filter on |
CPU stacks
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| cpu_top_functions | Top-N hot functions by exclusive CPU samples in a window / for a PID. Optional | CPU Stacks → ByName |
| | Same as above for multiple PIDs in a single trace load. Each PID gets an independent CallTree (its inclusive-% column normalises to that PID's samples). | (no equivalent — saves N round-trips) |
| cpu_caller_callee | Drill into a focus frame: callers (frames calling INTO it) and callees (frames it calls OUT to), each ranked by inclusive CPU samples. Recursion-safe. | CPU Stacks → Callers / Callees tabs |
Wait / blocked time (CSwitch-derived)
Requires the CSwitch kernel keyword (default WPR CPU profiles include it).
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Per-thread blocked time + dominant wait reasons. The canonical answer to "why was this slow?" when CPU is low. Reasons like | Thread Time → blocked-time per thread |
| wait_top_stacks | Top-N call stacks ranked by blocked μs, built from the resume-point stack walk on each | Thread Time / Wait Time → BlockedTime metric ( |
| wait_caller_callee | Drill into a focus frame; metric is blocked μs. | Thread Time → Callers / Callees tabs |
Image / DLL load
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Per-process chronological list of every | (no direct equivalent — events list filtered manually) |
| image_load_top_gaps | Top-N largest gaps between consecutive image loads. Pairs with the chronological view; same data, ranked by gap. Response also carries | (no equivalent — custom view) |
| | Top-N call stacks ranked by | Image Load Stacks |
| | Drill into a focus frame; metric is image-load count. | Image Load Stacks → Callers / Callees tabs |
File / disk / mmap I/O
The three layers cover different parts of the I/O stack — diff them to localise where time actually goes.
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N files by total | File I/O view → ByFile |
| | Top-N stacks by file-IO bytes. Captures all syscalls including cache-served reads — diff with | File I/O Stacks |
| | Drill on a focus frame; metric is file-IO bytes. | File I/O Stacks → Callers / Callees tabs |
| | Top-N stacks by physical disk-IO bytes — only events that hit physical media (no cache). Requires the | Disk I/O Stacks |
| | Drill on a focus frame; metric is physical disk bytes. | Disk I/O Stacks → Callers / Callees tabs |
| | Top-N files by hard page-in bytes. Most hard faults are mmap'd files being touched for the first time (DLLs, data files, network-share content); some also come from paged-out heap/stack pages and the page file. Identifies which file caused the page-in load. Requires the | Memory Hard Fault → ByFile |
| | Top-N stacks by hard-fault page-in bytes. Distinguishes eager loader-driven page-in from lazy / scanner-induced page-in. | Memory Hard Fault Stacks |
| | Drill on a focus frame; metric is page-in bytes. | Memory Hard Fault Stacks → Callers / Callees tabs |
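The layer-diffing idea above can be sketched numerically. Assuming per-file byte totals pulled from the file-IO (all syscalls) and disk-IO (physical media only) summaries, the difference approximates bytes served from the OS file cache; the file names and numbers below are hypothetical.

```python
# Hypothetical per-file byte totals from the file-IO (all syscalls) and
# disk-IO (physical media only) summaries; the gap approximates bytes
# served from the OS file cache. File names and numbers are made up.
file_io_bytes = {"app.dll": 8_000_000, "data.bin": 2_000_000}
disk_io_bytes = {"app.dll": 500_000, "data.bin": 2_000_000}

def cache_served_bytes(file_io: dict, disk_io: dict) -> dict:
    """Bytes that went through the file-IO layer but never hit the disk."""
    return {f: total - disk_io.get(f, 0) for f, total in file_io.items()}

print(cache_served_bytes(file_io_bytes, disk_io_bytes))
# {'app.dll': 7500000, 'data.bin': 0}: app.dll was almost all cache-served
```

A file with a large file-IO total but near-zero disk-IO total is cache-served; a file where the two match is disk-bound.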
Virtual memory
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N stacks by | VirtualAlloc Stacks |
| | Drill on a focus frame; metric is virtual-memory bytes. | VirtualAlloc Stacks → Callers / Callees tabs |
| | Top-N stacks by NT-heap allocation bytes ( | HeapAllocStacks |
| | Drill on a focus frame; metric is NT-heap bytes. | HeapAllocStacks → Callers / Callees tabs |
Network I/O
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N stacks by network bytes — TCP + UDP, IPv4 + IPv6 send/recv merged. Splits | TCP/IP Stacks + UDP/IP Stacks (merged) |
| | Drill on a focus frame; metric is network bytes. | TCP/IP Stacks → Callers / Callees tabs |
| | Per-connection lifecycle list — Connect/Accept paired with Disconnect/Reconnect by | (no direct equivalent — Events view manual pairing) |
Registry
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N stacks by registry-operation count (Query / Open / Create / SetValue / EnumerateKey / etc.). Useful for "who's pounding the registry on every hot-path call". Metric is op count (no natural byte cost for registry). Requires the | Registry Stacks |
| | Drill on a focus frame; metric is registry op count. | Registry Stacks → Callers / Callees tabs |
ReadyThread (causality)
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N readier stacks (the code that did the | ReadyThread Stacks |
| | Drill on a focus frame; metric is ready-event count. | ReadyThread Stacks → Callers / Callees tabs |
Interrupts (DPC / ISR)
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N stacks by kernel interrupt time (DPC + ISR microseconds). Surfaces hot driver routines burning CPU at high IRQL — frequent offenders are consumer-grade GPU drivers, network drivers under load, AV mini-filter callbacks. On a healthy system this should show <5% of trace CPU. Splits | DPC/ISR Stacks |
| | Drill on a focus frame; metric is interrupt μs. | DPC/ISR Stacks → Callers / Callees tabs |
ALPC (cross-process IPC)
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Top-N stacks by ALPC message count (Send + Receive). ALPC is the kernel IPC primitive used by RPC, COM, AppContainer broker calls, lsass, the SCM, and most of the Windows service surface — useful for "is this slow because of an LPC round-trip" / "which call chain is doing all the cross-process IPC". Requires the | ALPC Stacks |
| | Drill on a focus frame; metric is ALPC message count. | ALPC Stacks → Callers / Callees tabs |
CLR (.NET runtime)
Requires the Microsoft-Windows-DotNETRuntime ETW provider in the capture profile (WPR .wprp files need an explicit <EventCollectorId> for it).
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Per-GC list with wall duration AND stop-the-world pause time. | GCStats |
| | Top-N methods by JIT compilation duration. Matches | JIT Stats |
| | Top-N stacks by managed-heap allocation bytes, driven by | GC Heap Alloc Stacks |
| | Drill on a focus frame; metric is allocation bytes. | GC Heap Alloc Stacks → Callers / Callees tabs |
| | Top-N stacks by .NET exception throw count ( | Exceptions Stacks |
| | Drill on a focus frame; metric is exception count. | Exceptions Stacks → Callers / Callees tabs |
| | Top-N stacks by managed-monitor blocked μs — | Monitor Contention Stacks |
| | Drill on a focus frame; metric is blocked μs. | Monitor Contention Stacks → Callers / Callees tabs |
| | Managed-heap snapshot timeline — one row per | GCStats per-GC snapshot table |
| | Top types finalized + finalizer-thread pause batches. Aggregates | (no equivalent — composite of GCStats fields + Events view filtering) |
Markers / generic ETW events
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Search all ETW events whose name or task contains a substring. Default mode | Events view |
| | Top-N stacks by event count for any user-mode ETW provider — AspNetCore, Kestrel, EFCore, Antimalware-AMFilter, Sense (Defender for Endpoint), | Any Stacks (single-provider) |
| | Drill on a focus frame; metric is event count. | Any Stacks → Callers / Callees tabs |
Composite diagnostics
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| diagnose_slow_startup | Picks slowest-by-wait-ratio processes (or matches | (no equivalent — composite) |
Symbols
| Tool | What it does | PerfView equivalent |
| --- | --- | --- |
| | Sets | File → Set Symbol Path… |
| | Appends a symbol server URL with optional local cache (defaults to | File → Set Symbol Path… (single entry) |
| diagnose_symbols | Reports per-module symbol status for a loaded trace and suggests fixes (which servers to add) for unresolved modules. | (no equivalent — programmatic) |
Configuration
Trace cache
LRU, default capacity 2 traces. Override with WPRMCP_CACHE_SIZE=N. First load builds .etlx (slow); cached calls are instant. Capabilities and TraceLog are both cached per (path, mtime) — re-loading the same .etl is free.
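A minimal sketch of the eviction policy described above: an LRU keyed by (path, mtime) with a small fixed capacity. This mirrors the documented behaviour, not the server's actual implementation.

```python
from collections import OrderedDict

class TraceCache:
    """Sketch of an LRU keyed by (path, mtime): re-loading an unchanged
    .etl is a cache hit; a changed mtime triggers a fresh (slow) build."""

    def __init__(self, capacity=2):
        self.capacity = capacity
        self._entries = OrderedDict()

    def get_or_build(self, path, mtime, build):
        key = (path, mtime)
        if key in self._entries:
            self._entries.move_to_end(key)      # refresh LRU position
            return self._entries[key]
        value = build(path)                     # slow path: build .etlx index
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict least-recently-used
        return value

cache = TraceCache(capacity=2)
builds = []
cache.get_or_build("a.etl", 1.0, builds.append)  # miss: builds the index
cache.get_or_build("a.etl", 1.0, builds.append)  # hit: free
cache.get_or_build("a.etl", 2.0, builds.append)  # file changed: builds again
print(len(builds))  # 2
```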
Capturing your own traces
See docs/WPR_PROFILE.md for a recommended .wprp that captures CPU + CSwitch + FileIO + DiskIO + HardFaults + Loader stacks. Quick canonical capture:
wpr.exe -start tests\WprMcp.Tests\fixtures\MmapCapture.wprp -filemode
# … reproduce the slow case …
wpr.exe -stop C:\path\to\my_capture.etl
Symbols
If cpu_top_functions shows module!? everywhere and Stats.ResolutionRate < 0.8, your symbols are not working. This is the single biggest source of "garbage output".
Three setup paths (any one suffices — they all set the same _NT_SYMBOL_PATH):
Pre-launch env var (cleanest, survives restarts):
[Environment]::SetEnvironmentVariable("_NT_SYMBOL_PATH", "SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols", "User")
Per-MCP-server env block in the config JSON (see manual install above). Easiest to share between teammates.
Runtime via tool calls — ask the agent: "set the symbol path to SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols, then run diagnose_symbols on this trace."
Symbol cache defaults to %LocalAppData%\WprMcp\Symbols (separate from PerfView's C:\Symbols to avoid PDB-lock contention). Per-trace recommendations come back inside load_trace's SymbolStatus.Recommendations field, telling you which servers to add for the modules actually present in this trace.
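For reference, each _NT_SYMBOL_PATH entry follows the SRV*<local cache>*<server URL> shape used throughout this README, with entries joined by semicolons. The helper below is a hypothetical sketch for composing one (build_symbol_path and the example vendor URL are not a wpa-mcp API).

```python
# Hypothetical helper: compose an _NT_SYMBOL_PATH from one or more symbol
# servers, each with a local download cache (SRV*<cache>*<url> syntax).
# Not a wpa-mcp API; the default cache mirrors the one documented above.
def build_symbol_path(servers, cache=r"%LocalAppData%\WprMcp\Symbols"):
    return ";".join("SRV*{}*{}".format(cache, url) for url in servers)

print(build_symbol_path([
    "https://msdl.microsoft.com/download/symbols",
    "https://example.com/private-symbols",   # placeholder vendor server
]))
```

Because entries are semicolon-separated, plain local PDB folders can be mixed in alongside SRV* entries.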
For private vendor symbol servers, Chromium-family browsers, and local-build PDB folders, see docs/SYMBOL_RECIPES.md. Architecture overview and contribution invariants live in docs/ARCHITECTURE.md and CONTRIBUTING.md.