nsys-mcp is an MCP (Model Context Protocol) server that provides GPU profiling capabilities through NVIDIA Nsight Systems (nsys). It lets an LLM agent profile binaries, parse reports, compute statistics, and analyze interval trees — all via standard MCP tool calls.

Prerequisites

  • NVIDIA Nsight Systems installed, with the nsys CLI available on PATH (the check_nsys tool verifies this).
  • Python with pip.

Installation

pip install -e .

For development (tests):

pip install -e ".[dev]"

Running the Server

The server communicates over stdio (the default MCP transport):

python -m nsys_mcp.server

Cursor / VS Code MCP configuration

Add to your MCP settings (e.g. .cursor/mcp.json):

{
  "mcpServers": {
    "nsys-profiler": {
      "command": "python",
      "args": ["-m", "nsys_mcp.server"]
    }
  }
}

Available Tools

The server exposes 10 tools:

  #   Tool                  Description
  1   check_nsys            Verify that nsys is installed and return its version
  2   profile_binary        Profile a binary with full CUDA, NVTX, and GPU metrics collection
  3   load_report           Load a pre-existing .nsys-rep or NDJSON .json file
  4   list_reports          List all cached profiling reports with metadata
  5   get_event_summary     Breakdown of event types and counts for a report
  6   get_kernel_stats      Aggregate GPU kernel statistics grouped by kernel name
  7   get_nvtx_stats        Aggregate NVTX range durations grouped by annotation text
  8   get_memcpy_stats      Aggregate memory copy statistics grouped by direction
  9   build_interval_tree   Construct an interval tree from profiling events
  10  query_interval_tree   Run structural queries against an interval tree

profile_binary

Profile a binary with full CUDA, NVTX, and GPU metrics collection. Results are cached so repeated calls with the same arguments skip re-profiling.

  Parameter          Type             Description
  binary             str              Path to the executable
  args               list[str]        Command-line arguments (optional)
  env                dict[str, str]   Extra environment variables (optional)
  cwd                str              Working directory (optional)
  duration           int              Max profiling duration in seconds (optional)
  extra_nsys_flags   list[str]        Additional nsys flags (optional)

Returns report_id, event_counts, and time_span_ns.
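Under the hood, profiling amounts to wrapping the target program in an nsys profile invocation. A minimal sketch of how such a command line might be assembled; the server's actual flag set may differ, and build_nsys_cmd is an illustrative helper, not part of the server's API:

```python
def build_nsys_cmd(binary, args=None, duration=None, extra_nsys_flags=None,
                   output="report"):
    """Assemble an `nsys profile` command line from profile_binary-style
    parameters (sketch; the real server may pass additional flags)."""
    cmd = ["nsys", "profile", "-o", output, "--trace=cuda,nvtx"]
    if duration is not None:
        cmd.append(f"--duration={duration}")
    cmd.extend(extra_nsys_flags or [])  # user-supplied extra nsys flags
    cmd.append(binary)                  # the target binary comes last,
    cmd.extend(args or [])              # followed by its own arguments
    return cmd
```

For example, build_nsys_cmd("/app/solver", args=["--n", "1024"], duration=30) yields an argument list suitable for subprocess.run.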

load_report

Load a pre-existing .nsys-rep or NDJSON .json file without re-profiling.

  Parameter   Type   Description
  path        str    Path to a .nsys-rep or .json file
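An NDJSON file holds one JSON event object per line, which makes streaming parsing straightforward. A minimal sketch, with illustrative field names (the server's actual event schema may differ):

```python
import json

def iter_ndjson(lines):
    """Yield one event dict per non-empty NDJSON line, skipping blanks."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Two events, one per line; a blank line in between is ignored.
sample = '{"type": "kernel", "dur_ns": 1200}\n\n{"type": "nvtx", "dur_ns": 900}\n'
events = list(iter_ndjson(sample.splitlines()))
```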

get_event_summary

Get a breakdown of event types and counts for a report.

  Parameter   Type   Description
  report_id   str    ID from profile_binary or load_report

get_kernel_stats

Aggregate GPU kernel statistics grouped by kernel name. Includes duration statistics (mean, std, min, max, median, count, total) and GPU metrics (grid/block size, shared memory, registers).

  Parameter   Type   Description
  report_id   str    Report identifier
  top_n       int    Limit to top N kernels (optional)
  sort_by     str    total_ns, count, mean_ns, or max_ns (default: total_ns)
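The aggregation is a group-by over kernel names with per-group duration statistics. A toy sketch of the computation (event field names are assumptions, not the server's exact schema):

```python
from statistics import mean, median, pstdev

def kernel_stats(events, top_n=None, sort_by="total_ns"):
    """Group kernel events by name, compute duration stats, and return the
    top N groups ordered by sort_by (sketch of what get_kernel_stats does)."""
    groups = {}
    for ev in events:
        groups.setdefault(ev["name"], []).append(ev["dur_ns"])
    stats = {
        name: {
            "count": len(d),
            "total_ns": sum(d),
            "mean_ns": mean(d),
            "median_ns": median(d),
            "std_ns": pstdev(d),
            "min_ns": min(d),
            "max_ns": max(d),
        }
        for name, d in groups.items()
    }
    ordered = sorted(stats.items(), key=lambda kv: kv[1][sort_by], reverse=True)
    return dict(ordered[:top_n])

example = kernel_stats(
    [{"name": "gemm", "dur_ns": 100},
     {"name": "gemm", "dur_ns": 300},
     {"name": "reduce", "dur_ns": 150}],
    top_n=1)
```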

get_nvtx_stats

Aggregate NVTX range durations grouped by annotation text.

  Parameter   Type   Description
  report_id   str    Report identifier
  domain_id   int    Filter by NVTX domain (optional)

get_memcpy_stats

Aggregate memory copy statistics grouped by copy direction (HtoD, DtoH, DtoD, etc.). Includes duration stats, total bytes, and bandwidth estimates.

  Parameter   Type   Description
  report_id   str    Report identifier
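A bandwidth estimate follows directly from the per-direction totals: bytes divided by duration. A sketch of the grouping, with illustrative field names:

```python
def memcpy_stats(events):
    """Group memcpy events by copy direction and estimate sustained
    bandwidth in GiB/s per direction (sketch, not the server's schema)."""
    out = {}
    for ev in events:
        d = out.setdefault(ev["direction"], {"count": 0, "bytes": 0, "ns": 0})
        d["count"] += 1
        d["bytes"] += ev["bytes"]
        d["ns"] += ev["dur_ns"]
    for d in out.values():
        # bytes / seconds, converted to GiB/s
        d["gib_per_s"] = d["bytes"] / (d["ns"] / 1e9) / 2**30
    return out

stats = memcpy_stats([
    {"direction": "HtoD", "bytes": 2**30, "dur_ns": 1_000_000_000},
    {"direction": "DtoH", "bytes": 2**29, "dur_ns": 1_000_000_000},
])
```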

build_interval_tree

Construct an interval tree from profiling events. If multiple disjoint trees exist (a forest), they can be merged under a synthetic root.

  Parameter       Type        Description
  report_id       str         Report identifier
  event_types     list[str]   Subset of ["kernel", "nvtx", "trace", "memcpy", "sync"] (default: all)
  reduce_forest   bool        Merge forest into a single tree (default: true)
  thread_id       int         Filter by thread/stream ID (optional)
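One way to realize this is containment nesting plus a synthetic root for forests. A sketch, assuming properly nested intervals (as NVTX ranges are) and illustrative field names; the server's actual construction may differ:

```python
class Node:
    def __init__(self, name, start, end):
        self.name, self.start, self.end = name, start, end
        self.children = []

def build_tree(events):
    """Nest intervals by containment: sort by (start, -length) so that
    containers precede their contents, then keep a stack of open ancestors.
    Disjoint roots (a forest) are merged under a synthetic root."""
    nodes = [Node(e["name"], e["start_ns"], e["end_ns"]) for e in events]
    nodes.sort(key=lambda n: (n.start, -(n.end - n.start)))
    roots, stack = [], []
    for n in nodes:
        while stack and stack[-1].end <= n.start:
            stack.pop()                 # close ancestors that ended before n
        (stack[-1].children if stack else roots).append(n)
        stack.append(n)
    if len(roots) == 1:
        return roots[0]
    root = Node("<root>", min((r.start for r in roots), default=0),
                max((r.end for r in roots), default=0))
    root.children = roots
    return root

forest = build_tree([
    {"name": "A", "start_ns": 0, "end_ns": 100},
    {"name": "B", "start_ns": 10, "end_ns": 50},
    {"name": "C", "start_ns": 60, "end_ns": 90},
    {"name": "D", "start_ns": 200, "end_ns": 300},
])
```

Here A and D are disjoint, so both become children of a synthetic root, while B and C nest inside A.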

query_interval_tree

Run structural queries against a previously built interval tree.

  Parameter           Type   Description
  report_id           str    Report identifier
  query_type          str    One of the query types below
  event_name          str    Event name for count_calls
  subtree_root_name   str    Scope the query to a named subtree (optional)
  max_depth           int    Limit traversal depth (optional)

Query types:

  Type                  Description
  most_time_consuming   Find the longest-duration event in a subtree
  top_level             List top-level interval names
  count_calls           Count occurrences of a named event in a subtree
  subtree_summary       Aggregated stats for a named subtree
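These queries are simple recursive traversals over the tree. A sketch of two of them on a toy tree (the Node shape here is an assumption for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    dur_ns: int
    children: list = field(default_factory=list)

def most_time_consuming(node):
    """Return (name, dur_ns) of the longest-duration event in the subtree."""
    best = (node.name, node.dur_ns)
    for c in node.children:
        best = max(best, most_time_consuming(c), key=lambda t: t[1])
    return best

def count_calls(node, event_name):
    """Count occurrences of a named event in the subtree."""
    return (node.name == event_name) + sum(
        count_calls(c, event_name) for c in node.children)

tree = Node("<root>", 0, [
    Node("step", 500, [Node("gemm", 300), Node("gemm", 150)]),
])
```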

Typical Workflow

1. check_nsys()                              — verify nsys is available
2. profile_binary(binary="/app/solver", ...) — profile and get report_id
3. get_kernel_stats(report_id, top_n=10)     — see top 10 kernels
4. get_nvtx_stats(report_id)                 — see NVTX annotation timings
5. get_memcpy_stats(report_id)               — see memory transfer stats
6. build_interval_tree(report_id)            — build the tree
7. query_interval_tree(report_id,            — find bottleneck
       query_type="most_time_consuming")
8. query_interval_tree(report_id,            — count specific kernel calls
       query_type="count_calls",
       event_name="cub::DeviceReduce")

Caching

Profiling results are cached in two tiers:

  • In-memory LRU — fast access for the current session (up to 8 reports).

  • Disk — persists across server restarts at ~/.nsys_mcp/cache/.

Cache keys are derived from the binary path and arguments, so identical profiling runs reuse cached results automatically.
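The in-memory tier can be sketched as an OrderedDict-based LRU, with keys derived by hashing the request; both pieces below are illustrative, not the server's exact implementation:

```python
import hashlib
from collections import OrderedDict

def cache_key(binary, args=(), env=None):
    """Stable key from the binary path, arguments, and environment (sketch)."""
    parts = [binary, *args, *sorted(f"{k}={v}" for k, v in (env or {}).items())]
    return hashlib.sha256("\x00".join(parts).encode()).hexdigest()[:16]

class LRUCache:
    """In-memory tier: keeps the N most recently used reports (the server uses 8)."""
    def __init__(self, capacity=8):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict least recently used
```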

Testing

pip install -e ".[dev]"
pytest

Project Structure

src/nsys_mcp/
├── server.py           # FastMCP server, tool definitions, lifespan
├── nsys_runner.py      # nsys CLI wrapper (profile, export, version)
├── report_parser.py    # NDJSON streaming parser, string-table resolution
├── models.py           # Pydantic models for events, stats, configs
├── aggregator.py       # Group-by aggregation (mean, std, min, max, count)
├── interval_tree.py    # Interval tree/forest construction + queries
└── cache.py            # Two-tier cache (memory LRU + disk pickle)

License

nsys-mcp is licensed under the MIT License.
