Ingero
Server Configuration
Environment variables read by the server. Both are optional.
| Name | Required | Description | Default |
|---|---|---|---|
| INGERO_DB | No | Path to the Ingero SQLite database file. | ~/.ingero/ingero.db |
| INGERO_NO_UPDATE_NOTIFIER | No | Set to '1' to disable the background update check for new versions. | |
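As a rough illustration of how a wrapper script might interpret these two variables, here is a minimal Python sketch. The helper names `resolve_db_path` and `update_check_disabled` are hypothetical and not part of Ingero. Treating an empty `INGERO_DB` as unset is an assumption, not documented behavior.

```python
import os
from pathlib import Path


def resolve_db_path(env=None):
    """Return INGERO_DB if set and non-empty, else the documented default path.

    Hypothetical helper: treating an empty value as "unset" is an assumption.
    """
    env = os.environ if env is None else env
    override = env.get("INGERO_DB")
    if override:
        return Path(override).expanduser()
    # Documented default: ~/.ingero/ingero.db
    return Path.home() / ".ingero" / "ingero.db"


def update_check_disabled(env=None):
    """Return True only when INGERO_NO_UPDATE_NOTIFIER is exactly '1'."""
    env = os.environ if env is None else env
    return env.get("INGERO_NO_UPDATE_NOTIFIER") == "1"


if __name__ == "__main__":
    print(resolve_db_path())
    print(update_check_disabled())
```

Passing a plain dictionary as `env` keeps the helpers easy to exercise without touching the real process environment.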
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | { "listChanged": true } |
| logging | {} |
| prompts | { "listChanged": true } |
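In the Model Context Protocol, `listChanged: true` means the server may notify the client when its list of tools or prompts changes. The sketch below shows how a client could derive the notification methods to watch for from the capabilities above. The function name `list_changed_notifications` is hypothetical, and the capabilities JSON is retyped from the table, not captured from a live server.

```python
import json

# Capabilities as listed in the table above (retyped, not captured from a live server).
CAPABILITIES_JSON = (
    '{"tools": {"listChanged": true}, "logging": {}, "prompts": {"listChanged": true}}'
)


def list_changed_notifications(capabilities):
    """Return the MCP list-changed notification methods this server may emit."""
    methods = []
    for feature in ("tools", "prompts", "resources"):
        # A feature absent from the capabilities object is simply not supported.
        if capabilities.get(feature, {}).get("listChanged"):
            methods.append(f"notifications/{feature}/list_changed")
    return methods


capabilities = json.loads(CAPABILITIES_JSON)
print(list_changed_notifications(capabilities))
```

A client that receives one of these notifications would typically re-issue the matching `tools/list` or `prompts/list` request to refresh its cached view.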
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| get_causal_chains | Analyze CUDA + host events and return causal chains with severity, root cause, and recommendations. Deduplicates by operation, returns top 10 by default (use top_n to adjust). AI-first: TSC-compressed by default. Works with both live and saved/offline databases. Omit 'since' for saved DBs. |
| get_check | Run system diagnostics: kernel version, BTF support, NVIDIA driver, CUDA libraries, running GPU processes. |
| get_stacks | Get resolved call stacks for CUDA/driver operations. Returns top stacks by frequency with symbol names, source files, and timing stats. One call answers 'what code path caused this operation?' For older DBs without resolved symbols, falls back to raw IPs (hex addresses). |
| get_test_report | Get the GPU integration test report (JSON). Generated by gpu-test.sh after a full test run. Includes per-test status, timing, and system info. |
| get_trace_stats | Get CUDA and host operation statistics. Returns p50/p95/p99 for small DBs (≤500K events), count/avg/min/max from aggregates for large DBs. Works with both live and saved/offline databases. Omit 'since' for saved DBs. |
| graph_frequency | Analyze CUDA Graph launch frequency per executable. Identifies hot graphs (high replay rate), cold graphs (captured but rarely launched), and graph pool saturation. Essential for vLLM batch size tuning. |
| graph_lifecycle | Show CUDA Graph lifecycle timeline for a PID: capture → instantiate → launch sequences with timestamps and durations. Identifies graph activity patterns in torch.compile and vLLM workloads. |
| pagerduty_trigger | Create a PagerDuty incident with rich context. For AI-driven escalation during investigations. |
| query_fleet | Query multiple Ingero nodes and return merged results. Requires fleet.nodes configured in ingero.yaml. Actions: chains (causal chains sorted by severity), ops (per-op stats), overview (summary per node), sql (raw SQL fan-out with node column). |
| run_demo | Run a synthetic demo scenario and return the stats snapshot. No GPU or root needed. |
| run_sql | Execute read-only SQL on the Ingero database. For ad-hoc analysis the fixed tools can't do: temporal bucketing, threshold queries, per-PID breakdowns, throughput calculations. Timeout: 30s. Schema: events(id, timestamp INT nanos, pid, tid, source, op, duration INT nanos, gpu_id, arg0, arg1, ret_code, stack_hash, cgroup_id INT default 0, comm TEXT default '': process name from bpf_get_current_comm(), v0.10+, empty for pre-v0.10 rows), system_snapshots(id, timestamp, cpu_pct, mem_pct, mem_avail, swap_mb, load_avg), causal_chains(id TEXT, detected_at, severity, summary, root_cause, explanation, recommendations JSON, cuda_op, cuda_p99_us, cuda_p50_us, tail_ratio, timeline JSON), sessions(id, started_at, stopped_at, gpu_model, gpu_driver, cpu_model, cpu_cores, mem_total, kernel, os_release, cuda_ver, python_ver, ingero_ver, pid_filter, flags), sources(id, name, description), ops(source_id, op_id, name, description), process_names(pid, name, seen_at; LEGACY: lazy /proc-based PID→name table, used as read-side fallback when events.comm is empty), event_aggregates(bucket, source, op, pid, count, stored, sum_dur, min_dur, max_dur, sum_arg0), stack_traces(hash, ips TEXT JSON, frames TEXT JSON resolved symbols), cgroup_metadata(cgroup_id PK, container_id TEXT, cgroup_path TEXT), cgroup_schedstat(cgroup_id PK, p99_off_cpu_ns, total_off_cpu_ns, event_count, window_start, window_end), schema_info(key, value). JOINs: events.source=sources.id, events.(source,op)=ops.(source_id,op_id), events.stack_hash=stack_traces.hash, events.cgroup_id=cgroup_metadata.cgroup_id (K8s container context), events.pid=process_names.pid (ALWAYS qualify pid as e.pid when joining, since pid exists in both tables). For process names prefer events.comm directly (faster, no JOIN); use COALESCE(NULLIF(e.comm,''), NULLIF(pn.name,''), '') only when also reading legacy pre-v0.10 rows. Sources: 1=CUDA, 3=HOST, 4=DRIVER, 5=IO, 6=TCP, 7=NET. CUDA ops: 1=cudaMalloc, 2=cudaFree, 3=cudaLaunchKernel, 4=cudaMemcpy, 5=cudaStreamSync, 6=cudaDeviceSync, 7=cudaMemcpyAsync, 8=cudaMallocManaged. HOST ops: 1=sched_switch, 2=sched_wakeup, 3=mm_page_alloc, 4=oom_kill, 5=process_exec, 6=process_exit, 7=process_fork, 10=pod_restart, 11=pod_eviction, 12=pod_oom_kill. DRIVER ops: 1=cuLaunchKernel, 2=cuMemcpy, 3=cuMemcpyAsync, 4=cuCtxSynchronize, 5=cuMemAlloc, 6=cuMemAllocManaged. IO ops: 1=block_read, 2=block_write, 3=block_discard. TCP ops: 1=tcp_retransmit. NET ops: 1=net_send, 2=net_recv. arg0/arg1 per op: cudaMalloc/cudaMallocManaged arg0=size_bytes, cudaFree arg0=devPtr, cudaLaunchKernel arg0=kernel_func_ptr, cudaMemcpy/cudaMemcpyAsync arg0=bytes arg1=direction(0=H2H,1=H2D,2=D2H,3=D2D,4=default), cudaStreamSync arg0=stream_handle, mm_page_alloc arg0=page_order(size=4KB<<order), cuMemAlloc/cuMemAllocManaged arg0=size_bytes, block_read/block_write arg0=nr_sectors, net_send/net_recv arg0=bytes. sum_arg0 in event_aggregates = sum of arg0 across bucket (skipped for pointer-valued ops: cudaFree, cudaLaunchKernel, cuLaunchKernel). Timestamps: unix nanos. Duration: nanos (÷1e3=µs, ÷1e6=ms). Performance: events can have millions of rows. For large DBs, query event_aggregates (per-minute stats, always small) or stack_traces (deduplicated, always small) instead of scanning events. Use get_stacks tool for call stack analysis instead of manual SQL JOINs. |
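To make the `run_sql` schema notes more concrete, the following Python sketch builds a tiny in-memory SQLite table that mimics a subset of the `events` columns and runs the kind of read-only query the tool is meant for: bytes copied and average duration per `cudaMemcpy` direction. The column types, the sample rows, and the query itself are illustrative assumptions. Only the column names, the source/op codes, the `arg1` direction codes, and the nanosecond units come from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed subset of the events table; the real column types may differ.
conn.execute(
    """
    CREATE TABLE events (
        id INTEGER PRIMARY KEY,
        timestamp INTEGER,   -- unix nanos
        pid INTEGER,
        source INTEGER,      -- 1=CUDA, 3=HOST
        op INTEGER,          -- CUDA: 3=cudaLaunchKernel, 4=cudaMemcpy
        duration INTEGER,    -- nanos
        arg0 INTEGER,        -- cudaMemcpy: bytes
        arg1 INTEGER,        -- cudaMemcpy: direction code
        comm TEXT DEFAULT ''
    )
    """
)

# Made-up sample rows: (timestamp, pid, source, op, duration, arg0, arg1, comm)
rows = [
    (1_000, 42, 1, 4, 2_000_000, 1_048_576, 1, "python"),  # cudaMemcpy H2D, 2 ms
    (2_000, 42, 1, 4, 4_000_000, 1_048_576, 1, "python"),  # cudaMemcpy H2D, 4 ms
    (3_000, 42, 1, 4, 1_000_000, 524_288, 2, "python"),    # cudaMemcpy D2H, 1 ms
    (4_000, 42, 1, 3, 50_000, 0, 0, "python"),             # cudaLaunchKernel
    (5_000, 7, 3, 1, 10_000, 0, 0, "kworker"),             # HOST sched_switch
]
conn.executemany(
    "INSERT INTO events (timestamp, pid, source, op, duration, arg0, arg1, comm) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Read-only query: memcpy traffic per process and direction, duration in ms.
QUERY = """
SELECT e.comm,
       CASE e.arg1 WHEN 0 THEN 'H2H' WHEN 1 THEN 'H2D' WHEN 2 THEN 'D2H'
                   WHEN 3 THEN 'D2D' ELSE 'default' END AS direction,
       COUNT(*)              AS calls,
       SUM(e.arg0)           AS total_bytes,
       AVG(e.duration) / 1e6 AS avg_ms
FROM events AS e
WHERE e.source = 1 AND e.op IN (4, 7)   -- cudaMemcpy, cudaMemcpyAsync
GROUP BY e.comm, direction
ORDER BY total_bytes DESC
"""
result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

On a real database with millions of rows, the description recommends running this kind of rollup against `event_aggregates` instead of scanning `events`.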
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
| investigate | Analyze a GPU trace database for performance problems. Works with saved/offline databases. |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
| No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/ingero-io/ingero'
If you have feedback or need assistance with the MCP directory API, please join our Discord server.