wpa-mcp
A C# MCP server that exposes Windows ETW (.etl) trace analyzers — CPU, scheduler waits, image loads, file / disk / mmap / network I/O, registry, memory resources, and CLR runtime events — over any MCP-compatible client (Claude Code, Claude Desktop, Codex, Cursor). Domain-neutral: works on any Windows trace; common uses include diagnosing app startup, slow process creation, AV / EDR-induced stalls, and disk-bound regressions.
Status — PoC. Broad MCP tool surface available. Windows-only (TraceEvent kernel parsers are not portable). Apache-2.0.
See it in action: a real investigation — process creation 50× slower than baseline, traced to multiple EDR stacks colliding on
PsSetCreateProcessNotifyRoutineEx. Reproduced independently by two LLM agents on the same trace.
Quickstart
Once installed (one-liner below), ask the agent in plain language and it picks the matching tools:
> Load this trace: C:\path\to\trace.etl
(load_trace — first call takes 30 s – 3 min while the .etlx index is built;
subsequent calls are instant. Returns trace metadata plus a Capabilities map
listing which ETW keywords are present.)
> Inspect the trace and tell me what it can answer.
(inspect_trace — capability flags, quality warnings, symbol health, and
applicable next tools)
> Diagnose high wait in PID <X> between <t0> and <t1>.
(diagnose_high_wait — one window-consistent call returning candidates,
evidence, not-concluded reasons, executed-call provenance, and next tools)
> For parent PID <X>, what was each child's kernel-side gap?
(process_create_timing — one call gives the kernel-window distribution across
every child of one parent)
> Drill into one of the top wait frames from the evidence: who calls it?
(wait_caller_callee — caller / callee neighbors of the focus frame)
The same summary → stacks → caller/callee pattern works across stack-oriented domains: CPU (cpu_top_functions → cpu_caller_callee), file / disk / mmap I/O, image loads, CLR allocation / exception / contention, network, registry. Lifecycle and resource tools that don't fit a stack shape (memory resource snapshots, thread lifetime, process creation) have their own rows in the tables below.
For an end-to-end walkthrough — symptoms, tool chain, evidence, root cause, recommendations — see docs/CASE_STUDIES.md.
Install
One-liner (no clone, no build)
PowerShell:
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.ps1) }"
Git Bash on Windows:
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.sh | bash
Both routes do the same thing: download the latest self-contained wpa-mcp-win-x64.exe from GitHub Releases into %USERPROFILE%\.local\bin\wpa-mcp.exe, then register that executable directly with every detected MCP client (Claude Code / Codex / Claude Desktop). No local .NET runtime or SDK is required.
Forward extra flags through the one-liner:
# PowerShell — pin tag, force a single client, set custom symbol path
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.ps1) } -Tag v0.2.16 -Client claude-desktop -SymbolPath 'SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols'"
# Bash — flags after `bash -s --` go to install.ps1
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/install.sh | bash -s -- -Tag v0.2.16
Uninstall (one-liner, symmetric)
Web-invokable, edits the same client configs in reverse. No download / cache touched.
iex "& { $(irm https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/uninstall.ps1) }"
curl -fsSL https://raw.githubusercontent.com/tooluse-labs/wpa-mcp/main/scripts/uninstall.sh | bash
This removes the wpa-mcp entry from every detected MCP client and deletes %USERPROFILE%\.local\bin\wpa-mcp.exe. The symbol cache stays (delete %LocalAppData%\WprMcp\Symbols\ to remove it).
Requirements
Windows 10 / 11 (TraceEvent kernel APIs are Windows-only)
No .NET runtime is required for the one-line installer; releases ship a self-contained Windows executable.
For symbol resolution: pass -SymbolPath at install time, set _NT_SYMBOL_PATH, or use the symbol tools at runtime (see Configuration → Symbols).
PowerShell:
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
.\scripts\setup.ps1
Git Bash on Windows:
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
./scripts/setup.sh
Builds (Release) and registers wpa-mcp with every detected MCP client. Idempotent: re-run to update.
Common flags:
.\scripts\setup.ps1 -Client claude-desktop # force a specific client
.\scripts\setup.ps1 -SymbolPath "SRV*C:\Symbols*https://..." # custom _NT_SYMBOL_PATH
.\scripts\setup.ps1 -SkipBuild # use existing DLL
Uninstall from clone (also -CleanBuild to wipe bin/ obj/):
.\scripts\uninstall.ps1
.\scripts\uninstall.ps1 -CleanBuild
./scripts/uninstall.sh
./scripts/uninstall.sh -CleanBuild
Build:
git clone https://github.com/tooluse-labs/wpa-mcp
cd wpa-mcp
dotnet build -c Release
# DLL: src\WprMcp\bin\Release\net8.0\WprMcp.dll
Smoke-check:
dotnet src\WprMcp\bin\Release\net8.0\WprMcp.dll --version # prints "WprMcp 0.2.16"
dotnet test # runs the xUnit suite (needs fixtures, see CONTRIBUTING.md)
Then register with your MCP client. The command path must be absolute. For release installs, use %USERPROFILE%\.local\bin\wpa-mcp.exe; for clone builds, use dotnet plus the absolute DLL path.
Claude Code — per-project (<project>/.mcp.json) or global (~/.claude.json):
{
  "mcpServers": {
    "wpa-mcp": {
      "command": "C:/Users/me/.local/bin/wpa-mcp.exe",
      "args": [
        "--symbol-path",
        "SRV*C:\\Symbols*https://msdl.microsoft.com/download/symbols",
        "--cache-size",
        "2"
      ]
    }
  }
}
Or via the CLI helper:
claude mcp add wpa-mcp --scope user -- C:/Users/me/.local/bin/wpa-mcp.exe --symbol-path "SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols" --cache-size 2
Claude Desktop — %APPDATA%\Claude\claude_desktop_config.json, same shape as above.
Codex / Cursor / other MCP-compatible clients — the server speaks stdio MCP; any client that accepts a command + args config works. Use the same JSON snippet.
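Because the registration is just a command plus an args list, the entry can also be generated by a script, which avoids hand-escaping the backslashes in the symbol path. The following is a minimal sketch, not part of wpa-mcp: the helper name `mcp_server_entry` is hypothetical, and only the `--symbol-path` and `--cache-size` flags shown in the snippet above are used.

```python
import json


def mcp_server_entry(command, symbol_path=None, cache_size=None):
    """Build the {"mcpServers": {...}} fragment for one wpa-mcp entry.

    Hypothetical helper for illustration; the JSON shape mirrors the
    Claude Code snippet above.
    """
    args = []
    if symbol_path is not None:
        args += ["--symbol-path", symbol_path]
    if cache_size is not None:
        args += ["--cache-size", str(cache_size)]
    return {"mcpServers": {"wpa-mcp": {"command": command, "args": args}}}


config = mcp_server_entry(
    "C:/Users/me/.local/bin/wpa-mcp.exe",
    # Raw string: single backslashes here, json.dumps doubles them on output.
    symbol_path=r"SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols",
    cache_size=2,
)
text = json.dumps(config, indent=2)
print(text)
```

The printed text can be pasted into the client config file; `json.dumps` takes care of the `\\` escaping that the hand-written JSON needs.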
Verify — after restart, the client exposes the tools as mcp__wpa-mcp__load_trace, etc. First call to load_trace on a fresh .etl takes 30 s – 3 min while the .etlx index is built (logged to stderr).
Tools
The MCP surface covers multiple ETW analysis domains, all built on the same Microsoft.Diagnostics.Tracing.TraceEvent library PerfView uses — analysis quality matches PerfView. What changes is the surface (stdio MCP + JSON instead of a Windows GUI) plus a small set of composite tools that fold multi-step PerfView workflows into a single call.
What wpa-mcp adds vs PerfView
Agent-driven, not UI-driven. PerfView is a Windows GUI you click through; wpa-mcp is a stdio MCP server you talk to in plain language. Same data, no UI fatigue, easy to compose into CI / regression scripts.
Composite tools. diagnose_window, diagnose_high_wait, diagnose_slow_startup, process_create_timing, image_load_top_gaps fold multi-step PerfView workflows into one call.
Capabilities-aware. Every tool's "won't return data" state maps to a single keyword bit in load_trace's Capabilities map — no more "why is this view empty" detective work.
Per-trace symbol recommendations. load_trace inspects modules in the trace and recommends which symbol servers to add. PerfView leaves symbol setup to the user.
Design philosophy
wpa-mcp is built to avoid misleading the model without constraining what the model can infer.
Orientation tools (load_trace, inspect_trace) expose capabilities, enabled-signal lists, quality gaps, recommended diagnostic flows, and symbol health up front, so the model picks the next call from real signals instead of inferring from empty results.
Diagnostic composites (diagnose_window, diagnose_high_wait, diagnose_slow_startup) shorten the call path but preserve the evidence chain through Evidence, NotConcluded, ExecutedToolCalls, and NextTools. They deliberately do not return a synthesized "root cause" field.
Per-domain row and stack tools stay close to the PerfView shape. When they return empty, the capability signals from load_trace / inspect_trace distinguish "the data isn't in this trace" from "no work matched the query".
Usage pattern
Always call load_trace first. It opens the .etl, builds (or reuses) the .etlx index, and returns a Capabilities map showing which ETW keywords are present. Every other tool's behavior depends on those keywords. The map covers:
CPU sampling and scheduling — HasCpuSamples, HasCSwitch, HasReadyThread, HasStackWalks
File / disk / mmap I/O and loader — HasFileIo, HasDiskIo, HasHardFaults, HasImageLoad
Memory — HasVirtualAlloc, HasNtHeap, HasMemoryProcessInfo, HasHandleEvents, HasPoolEvents
Network — HasNetIo, HasNetConnections
Kernel infrastructure — HasRegistry, HasInterrupt, HasAlpc, HasThreadEvents
CLR runtime — HasClrGc, HasClrJit, HasClrAlloc, HasClrException, HasClrContention
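A client-side script can use the flags listed above to decide which tools are worth calling before issuing any request. The sketch below is illustrative only. It assumes the Capabilities map arrives as a flat name-to-boolean dictionary, and the tool-to-flag pairing in `REQUIRED_FLAG` is a guess based on the tool names, not the server's actual table.

```python
# Hypothetical pairing of tools to the capability flag each one needs.
# Flag and tool names come from this README; which flag gates which tool
# is an assumption made for this sketch.
REQUIRED_FLAG = {
    "cpu_top_functions": "HasCpuSamples",
    "wait_caller_callee": "HasCSwitch",
    "file_io_top_files": "HasFileIo",
    "hard_fault_by_file": "HasHardFaults",
}


def runnable_tools(capabilities, required=REQUIRED_FLAG):
    """Split tools into (usable, blocked) for a given Capabilities map.

    `blocked` maps each unusable tool to the flag that is missing or false.
    """
    usable, blocked = [], {}
    for tool, flag in required.items():
        if capabilities.get(flag, False):
            usable.append(tool)
        else:
            blocked[tool] = flag
    return usable, blocked


# Example map for a trace captured without CSwitch or hard-fault events.
caps = {"HasCpuSamples": True, "HasCSwitch": False, "HasFileIo": True}
usable, blocked = runnable_tools(caps)
print(usable)
print(blocked)
```

A missing flag is treated the same as a false one, so a tool is only attempted when the trace positively reports the data it depends on.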
The full call flow:
.etl trace
│
▼
load_trace ──► returns Capabilities map
│
│ (optional: inspect_trace if capture profile / path unclear)
▼
Composite (recommended for known workflows)
─────────────────────────────────────────────
diagnose_window, diagnose_slow_startup, diagnose_high_wait
returns Evidence + NotConcluded + ExecutedToolCalls + NextTools
│
│ via NextTools
▼
Domain drill (custom investigation or composite follow-up)
────────────────────────────────────────────────────────────
summary ──► stacks ──► caller_callee
top-N top-N focus-frame
rows call chains drill
Example: cpu_top_functions ──► cpu_top_stacks ──► cpu_caller_callee
If the capture profile or investigation path is unclear, call inspect_trace next. For common workflows, prefer composites such as diagnose_window, diagnose_high_wait, and diagnose_slow_startup before manually stitching individual calls together — their Evidence, NotConcluded, ExecutedToolCalls, and NextTools fields show what was run, what could not be concluded, and where to drill down.
Most stack-oriented groups follow the same three-tool shape: a summary (top-N flat rows), a stacks view (top-N call stacks weighted by the metric), and a caller-callee drill-down (given a focus frame, returns its caller / callee neighbors weighted by the same metric — same shape as PerfView's "Callers" / "Callees" tabs).
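To make the caller-callee step concrete, here is a minimal sketch of how caller and callee neighbors of a focus frame can be aggregated from weighted stacks. It is an illustration under simple assumptions (stacks given root-to-leaf, one weight per stack), not the TraceEvent or wpa-mcp implementation; the function name and sample frames are hypothetical.

```python
from collections import defaultdict


def caller_callee(stacks, focus):
    """Aggregate caller / callee neighbors of `focus`.

    stacks: list of (frames, weight) with frames ordered root -> leaf.
    Returns (callers, callees): dicts of frame name -> inclusive weight.
    """
    callers, callees = defaultdict(int), defaultdict(int)
    for frames, weight in stacks:
        stack_callers, stack_callees = set(), set()
        for i, frame in enumerate(frames):
            if frame != focus:
                continue
            if i > 0:
                stack_callers.add(frames[i - 1])
            if i + 1 < len(frames):
                stack_callees.add(frames[i + 1])
        # Credit each neighbor once per stack, so a recursive stack that
        # contains the focus frame several times is not double-counted.
        for name in stack_callers:
            callers[name] += weight
        for name in stack_callees:
            callees[name] += weight
    return dict(callers), dict(callees)


stacks = [
    (["main", "Load", "ReadFile"], 30),
    (["main", "Load", "Parse", "Load", "ReadFile"], 10),  # recursive
    (["main", "Idle"], 60),
]
callers, callees = caller_callee(stacks, "Load")
print(callers)
print(callees)
```

In the recursive stack, `Load` appears twice, yet `main` and `ReadFile` are still credited with that stack's weight only once.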
In the tables below, "PerfView equivalent" is the matching view in PerfView's GUI. Entries tagged [Composite] combine multiple PerfView views into one call, [Manual filter] expose raw events that PerfView's Events view shows but doesn't pre-aggregate, and [Programmatic] replace a GUI dialog with structured JSON. Most other tools are 1:1 mappings of PerfView views.
Time-window semantics
Tools that accept startUs and endUs use a half-open interval: an event is included only when startUs <= timestamp < endUs. A null boundary means the trace start or trace end respectively.
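The half-open rule can be written as a small predicate. This is a sketch of the stated semantics only; the function name is hypothetical and `None` stands in for a null boundary.

```python
def in_window(timestamp_us, start_us=None, end_us=None):
    """True when start_us <= timestamp_us < end_us.

    A None boundary means "trace start" or "trace end" respectively,
    so that side of the interval is unbounded.
    """
    if start_us is not None and timestamp_us < start_us:
        return False
    if end_us is not None and timestamp_us >= end_us:
        return False
    return True


# An event exactly on a shared boundary belongs to the later window only,
# so adjacent windows never count the same event twice.
event = 200
first = in_window(event, 100, 200)
second = in_window(event, 200, 300)
print(first, second)
```

This is why back-to-back windows such as [100, 200) and [200, 300) can be summed without overlap.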
Tools without startUs / endUs operate on intentionally different scopes; each tool's MCP description states which:
Whole-trace orientation / configuration — load_trace, inspect_trace, list_processes, find_marker, diagnose_symbols, set_symbol_path, add_symbol_server.
Lifecycle views — process_create_timing, thread_lifetime, image_load_timing, image_load_top_gaps, and diagnose_slow_startup use process-start or lifecycle-relative windows instead of an arbitrary trace window.
Whole-trace or windowed by-file summaries — file_io_top_files and hard_fault_by_file aggregate over file names and support explicit startUs / endUs windows. Use the corresponding stack tools when you need call-chain attribution.
Meta
Tool | What it does | PerfView equivalent |
| Opens / caches a | Open a trace file (no |
| One-shot orientation: capture capabilities, enabled-signal names, system metadata, provider counts, stackwalk completeness, symbol quality, quality warnings, capability-supported next-tool hints, and recommended diagnostic flows. Use when the capture profile or investigation path is unclear. | [Programmatic] — replaces manual trace-quality inspection across Events, Modules, and capture metadata |
| Lists processes (sortable by | Processes view |
| Per-child timing for a parent PID. | [Composite] — Processes + Events + Excel; see |
| Per-PID chronological thread lifecycle: every | [Manual filter] — Events view, filter on |
CPU stacks
Tool | What it does | PerfView equivalent |
| Top-N hot functions by exclusive CPU samples in a window / for a PID. Optional | CPU Stacks → ByName |
| CSwitch + ReadyThread scheduler summary: exact on-CPU microseconds, ready-to-run latency, per-core runtime attribution, and quantum/preemption counters by thread. Use when sampled CPU cannot answer "how long did it actually run?" or "how long was it ready before dispatch?" | CPU Usage (Precise) |
| Same as above for multiple PIDs in a single trace load. Each PID gets an independent CallTree (its inclusive-% column normalizes to that PID's samples). | [Composite] — batch variant, saves N round-trips through CPU Stacks → ByName |
| Drill into a focus frame: callers (frames calling INTO it) and callees (frames it calls OUT to), each ranked by inclusive CPU samples. Recursion-safe. | CPU Stacks → Callers / Callees tabs |
Wait / blocked time (CSwitch-derived)
Requires the CSwitch kernel keyword (default WPR CPU profiles include it).
Tool | What it does | PerfView equivalent |
| Per-thread blocked time + dominant wait reasons. The canonical answer to "why was this slow?" when CPU is low. Reasons like | Thread Time → blocked-time per thread |
| Top-N call stacks ranked by blocked μs, built from the resume-point stack walk on each | Thread Time / Wait Time → BlockedTime metric ( |
| Drill into a focus frame; metric is blocked μs. | Thread Time → Callers / Callees tabs |
Image / DLL load
Tool | What it does | PerfView equivalent |
| Per-process chronological list of every | [Manual filter] — Events view, filter on |
| Top-N largest gaps between consecutive image loads. Pairs with the chronological view; same data, ranked by gap. Response also carries | [Manual filter] — same |
| Top-N call stacks ranked by | Image Load Stacks |
| Drill into a focus frame; metric is image-load count. | Image Load Stacks → Callers / Callees tabs |
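The "largest gaps between consecutive image loads" ranking in the table above comes down to sorting by time, differencing neighbors, and ranking the differences. A minimal sketch, assuming each load is a (timestamp in μs, image name) pair; the function name and sample data are made up for illustration.

```python
def top_gaps(loads, n=3):
    """Rank gaps between consecutive image loads, largest first.

    loads: list of (timestamp_us, image_name).
    Returns up to n tuples of (gap_us, previous_image, next_image).
    """
    ordered = sorted(loads)
    gaps = [
        (nxt[0] - prev[0], prev[1], nxt[1])
        for prev, nxt in zip(ordered, ordered[1:])
    ]
    return sorted(gaps, key=lambda g: g[0], reverse=True)[:n]


# Made-up timestamps; the 5 ms pause before user32.dll is the outlier.
loads = [
    (0, "ntdll.dll"),
    (120, "kernel32.dll"),
    (5120, "user32.dll"),
    (5200, "gdi32.dll"),
]
result = top_gaps(loads, n=2)
print(result)
```

The ranked view and the chronological view carry the same data; the ranking just surfaces the stall first.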
File / disk / mmap I/O
The three layers cover different parts of the I/O stack — diff them to localise where time actually goes.
Tool | What it does | PerfView equivalent |
| Top-N files by total | File I/O view → ByFile |
| Top-N stacks by file-IO bytes. Captures all syscalls including cache-served reads — diff with | File I/O Stacks |
| Drill on a focus frame; metric is file-IO bytes. | File I/O Stacks → Callers / Callees tabs |
| Top-N stacks by physical disk-IO bytes — only events that hit physical media (no cache). Requires the | Disk I/O Stacks |
| Drill on a focus frame; metric is physical disk bytes. | Disk I/O Stacks → Callers / Callees tabs |
| Top-N files by hard page-in bytes, optionally scoped by | Memory Hard Fault → ByFile |
| Top-N stacks by hard-fault page-in bytes. Distinguishes eager loader-driven page-in from lazy / scanner-induced page-in. | Memory Hard Fault Stacks |
| Drill on a focus frame; metric is page-in bytes. | Memory Hard Fault Stacks → Callers / Callees tabs |
Virtual memory
Tool | What it does | PerfView equivalent |
| Process memory resource snapshots from | Memory / Handles views |
| Top-N stacks by | VirtualAlloc Stacks |
| Drill on a focus frame; metric is virtual-memory bytes. | VirtualAlloc Stacks → Callers / Callees tabs |
| Top-N stacks by NT-heap allocation bytes ( | HeapAllocStacks |
| Drill on a focus frame; metric is NT-heap bytes. | HeapAllocStacks → Callers / Callees tabs |
Network I/O
Tool | What it does | PerfView equivalent |
| Top-N stacks by network bytes — TCP + UDP, IPv4 + IPv6 send/recv merged. Splits | TCP/IP Stacks + UDP/IP Stacks (merged) |
| Drill on a focus frame; metric is network bytes. | TCP/IP Stacks → Callers / Callees tabs |
| Per-connection lifecycle list — Connect/Accept paired with Disconnect/Reconnect by | [Manual filter] — Events view, pair |
Registry
Tool | What it does | PerfView equivalent |
| Top-N stacks by registry-operation count (Query / Open / Create / SetValue / EnumerateKey / etc.). Useful for "who's pounding the registry on every hot-path call". Metric is op count (no natural byte cost for registry). Requires the | Registry Stacks |
| Drill on a focus frame; metric is registry op count. | Registry Stacks → Callers / Callees tabs |
ReadyThread (causality)
Tool | What it does | PerfView equivalent |
| Top-N readier stacks (the code that did the | ReadyThread Stacks |
| Drill on a focus frame; metric is ready-event count. | ReadyThread Stacks → Callers / Callees tabs |
Interrupts (DPC / ISR)
Tool | What it does | PerfView equivalent |
| Top-N stacks by kernel interrupt time (DPC + ISR microseconds). Surfaces hot driver routines burning CPU at high IRQL — frequent offenders are consumer-grade GPU drivers, network drivers under load, AV mini-filter callbacks. On a healthy system this should show <5% of trace CPU. Splits | DPC/ISR Stacks |
| Drill on a focus frame; metric is interrupt μs. | DPC/ISR Stacks → Callers / Callees tabs |
ALPC (cross-process IPC)
Tool | What it does | PerfView equivalent |
| Top-N stacks by ALPC message count (Send + Receive). ALPC is the kernel IPC primitive used by RPC, COM, AppContainer broker calls, lsass, the SCM, and most of the Windows service surface — useful for "is this slow because of an LPC round-trip" / "which call chain is doing all the cross-process IPC". Requires the | ALPC Stacks |
| Drill on a focus frame; metric is ALPC message count. | ALPC Stacks → Callers / Callees tabs |
CLR (.NET runtime)
Requires the Microsoft-Windows-DotNETRuntime ETW provider in the capture profile (WPR .wprp files need an explicit <EventCollectorId> for it).
For minimal JIT-only traces, run tests/WprMcp.Tests/fixtures/Capture-JitOnly.ps1 or use JitOnlyCapture.wprp!ClrJitOnly; it enables the CLR JIT + Loader bits needed by clr_jit_analysis without GC/allocation/exception/contention runtime keywords.
Tool | What it does | PerfView equivalent |
| Per-GC list with wall duration AND stop-the-world pause time. | GCStats |
| Top-N methods by JIT compilation duration. Matches | JIT Stats |
| Top-N stacks by managed-heap allocation bytes, driven by | GC Heap Alloc Stacks |
| Drill on a focus frame; metric is allocation bytes. | GC Heap Alloc Stacks → Callers / Callees tabs |
| Top-N stacks by .NET exception throw count ( | Exceptions Stacks |
| Drill on a focus frame; metric is exception count. | Exceptions Stacks → Callers / Callees tabs |
| Top-N stacks by managed-monitor blocked μs — | Monitor Contention Stacks |
| Drill on a focus frame; metric is blocked μs. | Monitor Contention Stacks → Callers / Callees tabs |
| Managed-heap snapshot timeline — one row per | GCStats per-GC snapshot table |
| Top types finalized + finalizer-thread pause batches. Aggregates | [Composite] — GCStats fields + Events view filtering combined into one call |
Markers / generic ETW events
Tool | What it does | PerfView equivalent |
| Search all ETW events whose name or task contains a substring. Default mode | Events view |
| Top-N stacks by event count for any user-mode ETW provider — AspNetCore, Kestrel, EFCore, Antimalware-AMFilter, Sense (Defender for Endpoint), | Any Stacks (single-provider) |
| Drill on a focus frame; metric is event count. | Any Stacks → Callers / Callees tabs |
Composite diagnostics
Tool | What it does | PerfView equivalent |
| Windowed evidence composite for one | [Composite] — wraps hard faults, file IO, memory, security scan, and wait views |
| Preview composite for high blocked-time investigations. It runs one window-consistent | [Composite] — wraps wait, stack, and ReadyThread views with evidence provenance |
| Picks slowest-by-wait-ratio processes (or matches | [Composite] — wraps startup wait, loader, CPU, and window evidence |
Symbols
Tool | What it does | PerfView equivalent |
| Sets | File → Set Symbol Path… |
| Appends a symbol server URL with optional local cache (defaults to | File → Set Symbol Path… (single entry) |
| Reports per-module symbol status for a loaded trace and suggests fixes (which servers to add) for unresolved modules. | [Programmatic] — replaces Modules tab + Set Symbol Path dialog with structured JSON + auto-recommendations |
Configuration
Trace cache
LRU, default capacity 2 traces. Override with WPRMCP_CACHE_SIZE=N. First load builds .etlx (slow); cached calls are instant. Capabilities and TraceLog are both cached per (path, mtime) — re-loading the same .etl is free.
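The caching behaviour described here, LRU eviction with entries keyed on (path, mtime), can be sketched in a few lines. This is an illustration of the idea, not the server's C# code: the class name and the `loader` callback are hypothetical, and only the `WPRMCP_CACHE_SIZE` variable and the default of 2 come from the text above.

```python
import os
from collections import OrderedDict


class TraceCache:
    """LRU cache keyed on (path, mtime); hypothetical sketch."""

    def __init__(self, loader, capacity=None):
        if capacity is None:
            # Default of 2 and the env var name are taken from the README.
            capacity = int(os.environ.get("WPRMCP_CACHE_SIZE", "2"))
        self.loader = loader
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, path, mtime):
        key = (path, mtime)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        value = self.loader(path)  # slow path: stands in for the .etlx build
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return value


calls = []
cache = TraceCache(loader=lambda p: calls.append(p) or f"log:{p}", capacity=2)
cache.get("a.etl", 1)   # miss, loads
cache.get("a.etl", 1)   # hit
cache.get("b.etl", 1)   # miss, loads
cache.get("c.etl", 1)   # miss, loads and evicts a.etl
cache.get("a.etl", 1)   # miss again after eviction
cache.get("a.etl", 2)   # new mtime counts as a different trace
print(calls)
```

Including the modification time in the key means a re-captured file at the same path is never served from a stale entry.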
Capturing your own traces
See docs/WPR_PROFILE.md for a recommended .wprp that captures CPU + CSwitch + FileIO + DiskIO + HardFaults + Loader stacks. Quick canonical capture:
wpr.exe -start tests\WprMcp.Tests\fixtures\MmapCapture.wprp -filemode
# … reproduce the slow case …
wpr.exe -stop C:\path\to\my_capture.etl
Symbols
If cpu_top_functions shows module!? everywhere and Stats.ResolutionRate < 0.8, your symbols are not working. This is the single biggest source of "garbage output".
Where to set the path
_NT_SYMBOL_PATH accepts semicolon-separated entries: SRV*<cache>*<url> for symbol servers, bare folder paths for local PDBs, mix and match. Three setup paths:
Pre-launch env var (cleanest, survives restarts):
[Environment]::SetEnvironmentVariable("_NT_SYMBOL_PATH", "SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols", "User")
Per-MCP-server --symbol-path arg in the config JSON/TOML (see manual install above). Easiest to share between teammates.
Runtime via tool calls — ask the agent: "set the symbol path to SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols, then run diagnose_symbols on this trace."
Symbol cache defaults to %LocalAppData%\WprMcp\Symbols (separate from PerfView's C:\Symbols to avoid PDB-lock contention). Per-trace recommendations come back inside load_trace's SymbolStatus.Recommendations field, telling you which servers to add for the modules actually present in this trace.
Beyond Microsoft modules
The auto-recommendation in load_trace only knows the public servers it has patterns for (Microsoft, Chromium). For your own DLLs, third-party SDKs, or internal builds, append entries explicitly — common shapes:
What you have | Entry to append |
Internal team symbol server |
|
Team shared drop on a UNC share |
|
Local dev build output (your own PDBs) |
|
Order matters — entries are tried left-to-right, first signature match wins. Put the local dev folder first when iterating on a build so your fresh PDB beats the public one.
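The entry format and the left-to-right ordering can be checked with a short parser before the path is handed to the server. This is a minimal sketch covering only the two shapes named above (SRV*cache*url and bare folders); any other starred form is passed through untouched. The function name and the local folder in the example are hypothetical.

```python
def parse_symbol_path(value):
    """Split a _NT_SYMBOL_PATH string into entries, preserving order."""
    entries = []
    for raw in value.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        parts = raw.split("*")
        if parts[0].upper() == "SRV" and len(parts) == 3:
            entries.append({"kind": "server", "cache": parts[1], "url": parts[2]})
        elif "*" in raw:
            # Other starred forms exist; this sketch does not interpret them.
            entries.append({"kind": "other", "raw": raw})
        else:
            entries.append({"kind": "folder", "path": raw})
    return entries


# Hypothetical local build folder placed first so a fresh PDB is tried
# before the public server.
path = (
    r"C:\dev\myapp\bin\Release;"
    r"SRV*C:\Symbols*https://msdl.microsoft.com/download/symbols"
)
entries = parse_symbol_path(path)
for position, entry in enumerate(entries, start=1):
    print(position, entry)
```

Printing the entries with their position makes the lookup order visible at a glance.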
Build prerequisites for your own DLLs
A symbol server doesn't help if the build never produced a PDB, or if PDB and deployed DLL are from different builds.
.NET / C#: <DebugType>portable</DebugType> + <DebugSymbols>true</DebugSymbols>. Check that Release configurations don't disable PDB output.
C++ (MSVC): /Zi + /DEBUG:FULL, even in Release. Keep PDB next to DLL.
PDB and DLL must share the same signature (GUID + age) — re-link → new signature → old PDB no longer resolves.
Verifying it worked
> load_trace C:\my\trace.etl
> diagnose_symbols C:\my\trace.etl
> cpu_top_functions C:\my\trace.etl
diagnose_symbols lists per-module status with hints for unresolved ones; cpu_top_functions's Stats.ResolutionRate should be ≥ 0.8 for actionable output. After changing the symbol path mid-session, already loaded traces do not re-resolve symbols; restart the MCP server for now, or use unload_trace + load_trace once the cache-unload tool is exposed.
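For a rough local check of the 0.8 threshold, the share of resolved frames can be estimated from frame names. How the server computes Stats.ResolutionRate is not stated here, so the definition below (a frame counts as unresolved when it ends in `!?`) is an assumption, and the function name and sample frames are hypothetical.

```python
def resolution_rate(frames):
    """Fraction of frames with a resolved function name.

    Assumed convention: an unresolved frame is rendered as 'module!?'.
    """
    if not frames:
        return 0.0
    unresolved = sum(1 for frame in frames if frame.endswith("!?"))
    return (len(frames) - unresolved) / len(frames)


frames = [
    "ntdll!NtWaitForSingleObject",
    "kernelbase!?",
    "myapp!?",
    "myapp!Main",
    "ntdll!LdrLoadDll",
]
rate = resolution_rate(frames)
actionable = rate >= 0.8  # threshold taken from the text above
print(rate, actionable)
```

With two of five frames unresolved, the sample falls below the threshold, which is the situation where fixing the symbol path comes before reading any stacks.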
For full recipes (UNC paths, private vendors, Chromium-family browsers, cache management, troubleshooting), see docs/SYMBOL_RECIPES.md (Chinese). Architecture overview and contribution invariants live in docs/ARCHITECTURE.md and CONTRIBUTING.md.