Schema | Ingero

Ingero


Server Configuration

Describes the environment variables the server reads. Both are optional.

Name	Required	Description	Default
`INGERO_DB`	No	Path to the Ingero SQLite database file.	~/.ingero/ingero.db
`INGERO_NO_UPDATE_NOTIFIER`	No	Set to '1' to disable the background update check for new versions.	(unset)
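
As an illustration of how a client script might locate the same database the server uses, here is a minimal sketch. The helper names `resolve_db_path` and `update_check_disabled` are hypothetical, and treating an empty `INGERO_DB` as unset is an assumption; the server's own resolution logic is not shown on this page.

```python
import os
from pathlib import Path


def resolve_db_path(env=None):
    """Return the database path: INGERO_DB if set, else ~/.ingero/ingero.db.

    Assumption: an empty INGERO_DB is treated the same as an unset one.
    """
    env = os.environ if env is None else env
    override = env.get("INGERO_DB")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".ingero" / "ingero.db"


def update_check_disabled(env=None):
    """Return True only when INGERO_NO_UPDATE_NOTIFIER is exactly '1'."""
    env = os.environ if env is None else env
    return env.get("INGERO_NO_UPDATE_NOTIFIER") == "1"
```

Passing a plain dict as `env` makes the helpers easy to exercise without touching the real environment.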

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`logging`	{}
`prompts`	{ "listChanged": true }

Tools

Functions exposed to the LLM to take actions

Name	Description
get_causal_chains	Analyze CUDA + host events and return causal chains with severity, root cause, and recommendations. Deduplicates by operation, returns top 10 by default (use top_n to adjust). AI-first: TSC-compressed by default. Works with both live and saved/offline databases. Omit 'since' for saved DBs.
get_check	Run system diagnostics: kernel version, BTF support, NVIDIA driver, CUDA libraries, running GPU processes
get_stacks	Get resolved call stacks for CUDA/driver operations. Returns top stacks by frequency with symbol names, source files, and timing stats. One call answers 'what code path caused this operation?' For older DBs without resolved symbols, falls back to raw IPs (hex addresses).
get_test_report	Get the GPU integration test report (JSON). Generated by gpu-test.sh after a full test run. Includes per-test status, timing, and system info.
get_trace_stats	Get CUDA and host operation statistics. Returns p50/p95/p99 for small DBs (≤500K events), count/avg/min/max from aggregates for large DBs. Works with both live and saved/offline databases. Omit 'since' for saved DBs.
graph_frequency	Analyze CUDA Graph launch frequency per executable. Identifies hot graphs (high replay rate), cold graphs (captured but rarely launched), and graph pool saturation. Essential for vLLM batch size tuning.
graph_lifecycle	Show CUDA Graph lifecycle timeline for a PID: capture → instantiate → launch sequences with timestamps and durations. Identifies graph activity patterns in torch.compile and vLLM workloads.
pagerduty_trigger	Create a PagerDuty incident with rich context. For AI-driven escalation during investigations.
query_fleet	Query multiple Ingero nodes and return merged results. Requires fleet.nodes configured in ingero.yaml. Actions: chains (causal chains sorted by severity), ops (per-op stats), overview (summary per node), sql (raw SQL fan-out with node column).
run_demo	Run a synthetic demo scenario and return the stats snapshot. No GPU or root needed.
run_sql	Execute read-only SQL on the Ingero database. For ad-hoc analysis the fixed tools can't do: temporal bucketing, threshold queries, per-PID breakdowns, throughput calculations. Timeout: 30s.
Schema:
events(id, timestamp INT nanos, pid, tid, source, op, duration INT nanos, gpu_id, arg0, arg1, ret_code, stack_hash, cgroup_id INT default 0, comm TEXT default ''). comm is the process name from bpf_get_current_comm(), available in v0.10+ and empty for pre-v0.10 rows.
system_snapshots(id, timestamp, cpu_pct, mem_pct, mem_avail, swap_mb, load_avg)
causal_chains(id TEXT, detected_at, severity, summary, root_cause, explanation, recommendations JSON, cuda_op, cuda_p99_us, cuda_p50_us, tail_ratio, timeline JSON)
sessions(id, started_at, stopped_at, gpu_model, gpu_driver, cpu_model, cpu_cores, mem_total, kernel, os_release, cuda_ver, python_ver, ingero_ver, pid_filter, flags)
sources(id, name, description)
ops(source_id, op_id, name, description)
process_names(pid, name, seen_at). LEGACY: lazy /proc-based PID→name table, used as read-side fallback when events.comm is empty.
event_aggregates(bucket, source, op, pid, count, stored, sum_dur, min_dur, max_dur, sum_arg0)
stack_traces(hash, ips TEXT JSON, frames TEXT JSON resolved symbols)
cgroup_metadata(cgroup_id PK, container_id TEXT, cgroup_path TEXT)
cgroup_schedstat(cgroup_id PK, p99_off_cpu_ns, total_off_cpu_ns, event_count, window_start, window_end)
schema_info(key, value)
JOINs: events.source=sources.id, events.(source,op)=ops.(source_id,op_id), events.stack_hash=stack_traces.hash, events.cgroup_id=cgroup_metadata.cgroup_id (K8s container context), events.pid=process_names.pid (ALWAYS qualify pid as e.pid when joining, because pid exists in both tables).
Process names: prefer events.comm directly (faster, no JOIN); use COALESCE(NULLIF(e.comm,''), NULLIF(pn.name,''), '') only when also reading legacy pre-v0.10 rows.
Sources: 1=CUDA, 3=HOST, 4=DRIVER, 5=IO, 6=TCP, 7=NET.
CUDA ops: 1=cudaMalloc, 2=cudaFree, 3=cudaLaunchKernel, 4=cudaMemcpy, 5=cudaStreamSync, 6=cudaDeviceSync, 7=cudaMemcpyAsync, 8=cudaMallocManaged.
HOST ops: 1=sched_switch, 2=sched_wakeup, 3=mm_page_alloc, 4=oom_kill, 5=process_exec, 6=process_exit, 7=process_fork, 10=pod_restart, 11=pod_eviction, 12=pod_oom_kill.
DRIVER ops: 1=cuLaunchKernel, 2=cuMemcpy, 3=cuMemcpyAsync, 4=cuCtxSynchronize, 5=cuMemAlloc, 6=cuMemAllocManaged.
IO ops: 1=block_read, 2=block_write, 3=block_discard.
TCP ops: 1=tcp_retransmit.
NET ops: 1=net_send, 2=net_recv.
arg0/arg1 per op: cudaMalloc/cudaMallocManaged arg0=size_bytes; cudaFree arg0=devPtr; cudaLaunchKernel arg0=kernel_func_ptr; cudaMemcpy/cudaMemcpyAsync arg0=bytes, arg1=direction (0=H2H, 1=H2D, 2=D2H, 3=D2D, 4=default); cudaStreamSync arg0=stream_handle; mm_page_alloc arg0=page_order (size=4KB<<order); cuMemAlloc/cuMemAllocManaged arg0=size_bytes; block_read/block_write arg0=nr_sectors; net_send/net_recv arg0=bytes.
sum_arg0 in event_aggregates: sum of arg0 across the bucket (skipped for pointer-valued ops: cudaFree, cudaLaunchKernel, cuLaunchKernel).
Timestamps: unix nanos.
Duration: nanos (÷1e3=µs, ÷1e6=ms).
Performance: events can have millions of rows. For large DBs, query event_aggregates (per-minute stats, always small) or stack_traces (deduplicated, always small) instead of scanning events. Use the get_stacks tool for call stack analysis instead of manual SQL JOINs.
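
To make the run_sql schema notes concrete, the sketch below builds a tiny in-memory SQLite database that mimics a subset of the described tables and runs one read-only query of the kind run_sql accepts. The column types, the sample rows, and the variable names are assumptions for illustration; only the table names, column names, op IDs, and join keys come from the description above.

```python
import sqlite3

# Hypothetical miniature of the described schema; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources (id INTEGER PRIMARY KEY, name TEXT, description TEXT);
CREATE TABLE ops (source_id INTEGER, op_id INTEGER, name TEXT, description TEXT);
CREATE TABLE events (
    id INTEGER PRIMARY KEY, timestamp INTEGER, pid INTEGER, tid INTEGER,
    source INTEGER, op INTEGER, duration INTEGER, gpu_id INTEGER,
    arg0 INTEGER, arg1 INTEGER, ret_code INTEGER, stack_hash INTEGER,
    cgroup_id INTEGER DEFAULT 0, comm TEXT DEFAULT ''
);
INSERT INTO sources (id, name) VALUES (1, 'CUDA'), (3, 'HOST');
INSERT INTO ops (source_id, op_id, name) VALUES
    (1, 3, 'cudaLaunchKernel'), (1, 4, 'cudaMemcpy'), (3, 1, 'sched_switch');
-- Made-up sample rows; durations are in nanoseconds.
INSERT INTO events (timestamp, pid, tid, source, op, duration, arg0, arg1, comm) VALUES
    (1000, 42, 42, 1, 3, 5000, 0, 0, 'python'),
    (2000, 42, 42, 1, 3, 7000, 0, 0, 'python'),
    (3000, 42, 42, 1, 4, 2000000, 4096, 1, 'python'),
    (4000, 42, 43, 3, 1, 9000, 0, 0, 'python');
""")

# Per-op call count and mean latency for CUDA events (source = 1).
# ops is joined on the (source, op) pair, and duration is converted
# from nanoseconds to microseconds by dividing by 1e3.
PER_OP_SQL = """
SELECT e.comm, o.name AS op_name, COUNT(*) AS n, AVG(e.duration) / 1e3 AS avg_us
FROM events AS e
JOIN ops AS o ON o.source_id = e.source AND o.op_id = e.op
WHERE e.source = 1
GROUP BY e.comm, o.name
ORDER BY avg_us DESC
"""
rows = conn.execute(PER_OP_SQL).fetchall()
for comm, op_name, n, avg_us in rows:
    print(f"{comm:10s} {op_name:20s} n={n} avg={avg_us:.1f} µs")
```

The query reads the process name from events.comm rather than joining process_names, as the description recommends for v0.10+ rows. On a large database the same question is better asked of event_aggregates, where the mean can be derived as sum_dur / count per bucket.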

Prompts

Interactive templates invoked by user choice

Name	Description
`investigate`	Analyze a GPU trace database for performance problems. Works with saved/offline databases.

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ingero-io/ingero'
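
The same request can be issued from Python's standard library. This is a minimal sketch: the helper names `build_request` and `fetch_server_info` are hypothetical, and decoding the response as JSON is an assumption, since the response format is not documented on this page.

```python
import json
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/ingero-io/ingero"


def build_request(url=API_URL):
    """Build (but do not send) the GET request that the curl command issues."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url=API_URL, timeout=10.0):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `fetch_server_info()` performs the network request; `build_request()` alone is useful for inspecting the URL and method first.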
