# Ingero

## Server Configuration

Describes the environment variables used to configure the server.
| Name | Required | Description | Default |
|---|---|---|---|
| INGERO_DB | No | Path to the Ingero SQLite database file. | `~/.ingero/ingero.db` |
| INGERO_NO_UPDATE_NOTIFIER | No | Set to `1` to disable the background update check for new versions. | |
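Both variables are optional. A minimal launch sketch, assuming the server is started by a command named `ingero-mcp` (the actual binary name may differ in your installation):

```shell
# Point Ingero at a custom database path and disable the update check.
# "ingero-mcp" is an assumed command name for the server binary.
export INGERO_DB="$HOME/data/ingero.db"
export INGERO_NO_UPDATE_NOTIFIER=1
ingero-mcp
```

If `INGERO_DB` is unset, the server falls back to the default path under `~/.ingero/`.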
## Capabilities

Features and capabilities supported by this server.
| Capability | Details |
|---|---|
| tools | `{"listChanged": true}` |
| logging | `{}` |
## Tools

Functions exposed to the LLM to take actions.
| Name | Description |
|---|---|
| get_causal_chains | Analyze CUDA + host events and return causal chains with severity, root cause, and recommendations. AI-first: TSC-compressed by default. |
| get_check | Run system diagnostics: kernel version, BTF support, NVIDIA driver, CUDA libraries, running GPU processes. |
| get_stacks | Get resolved call stacks for CUDA/driver operations. Returns top stacks by frequency with symbol names, source files, and timing stats. One call answers "what code path caused this operation?" For older DBs without resolved symbols, falls back to raw IPs (hex addresses). |
| get_test_report | Get the GPU integration test report (JSON). Generated by gpu-test.sh after a full test run. Includes per-test status, timing, and system info. |
| get_trace_stats | Get CUDA and host operation statistics. Returns p50/p95/p99 for small DBs (≤500K events), count/avg/min/max from aggregates for large DBs. |
| run_demo | Run a synthetic demo scenario and return the stats snapshot. No GPU or root needed. |
| run_sql | Execute read-only SQL on the Ingero database. For ad-hoc analysis the fixed tools can't do: temporal bucketing, threshold queries, per-PID breakdowns, throughput calculations. Timeout: 30s. Schema: events(id, timestamp INT nanos, pid, tid, source, op, duration INT nanos, gpu_id, arg0, arg1, ret_code, stack_hash, cgroup_id INT default 0), system_snapshots(id, timestamp, cpu_pct, mem_pct, mem_avail, swap_mb, load_avg), causal_chains(id TEXT, detected_at, severity, summary, root_cause, explanation, recommendations JSON, cuda_op, cuda_p99_us, cuda_p50_us, tail_ratio, timeline JSON), sessions(id, started_at, stopped_at, gpu_model, gpu_driver, cpu_model, cpu_cores, mem_total, kernel, os_release, cuda_ver, python_ver, ingero_ver, pid_filter, flags), sources(id, name, description), ops(source_id, op_id, name, description), process_names(pid, name, seen_at), event_aggregates(bucket, source, op, pid, count, stored, sum_dur, min_dur, max_dur, sum_arg0), stack_traces(hash, ips TEXT JSON, frames TEXT JSON resolved symbols), cgroup_metadata(cgroup_id PK, container_id TEXT, cgroup_path TEXT), cgroup_schedstat(cgroup_id PK, p99_off_cpu_ns, total_off_cpu_ns, event_count, window_start, window_end), schema_info(key, value). JOINs: events.source=sources.id, events.(source,op)=ops.(source_id,op_id), events.stack_hash=stack_traces.hash, events.cgroup_id=cgroup_metadata.cgroup_id (K8s container context). Sources: 1=CUDA, 3=HOST, 4=DRIVER, 5=IO, 6=TCP, 7=NET. CUDA ops: 1=cudaMalloc, 2=cudaFree, 3=cudaLaunchKernel, 4=cudaMemcpy, 5=cudaStreamSync, 6=cudaDeviceSync, 7=cudaMemcpyAsync, 8=cudaMallocManaged. HOST ops: 1=sched_switch, 2=sched_wakeup, 3=mm_page_alloc, 4=oom_kill, 5=process_exec, 6=process_exit, 7=process_fork, 10=pod_restart, 11=pod_eviction, 12=pod_oom_kill. DRIVER ops: 1=cuLaunchKernel, 2=cuMemcpy, 3=cuMemcpyAsync, 4=cuCtxSynchronize, 5=cuMemAlloc, 6=cuMemAllocManaged. IO ops: 1=block_read, 2=block_write, 3=block_discard. TCP ops: 1=tcp_retransmit. NET ops: 1=net_send, 2=net_recv. arg0/arg1 per op: cudaMalloc/cudaMallocManaged arg0=size_bytes, cudaFree arg0=devPtr, cudaLaunchKernel arg0=kernel_func_ptr, cudaMemcpy/cudaMemcpyAsync arg0=bytes arg1=direction(0=H2H,1=H2D,2=D2H,3=D2D,4=default), cudaStreamSync arg0=stream_handle, mm_page_alloc arg0=page_order(size=4KB<<order), cuMemAlloc/cuMemAllocManaged arg0=size_bytes, block_read/block_write arg0=nr_sectors, net_send/net_recv arg0=bytes. sum_arg0 in event_aggregates = sum of arg0 across bucket (skipped for pointer-valued ops: cudaFree, cudaLaunchKernel, cuLaunchKernel). Timestamps: unix nanos. Duration: nanos (÷1e3=µs, ÷1e6=ms). Performance: events can have millions of rows. For large DBs, query event_aggregates (per-minute stats, always small) or stack_traces (deduplicated, always small) instead of scanning events. Use get_stacks tool for call stack analysis instead of manual SQL JOINs. |
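To illustrate the kind of read-only query `run_sql` is meant for, here is a minimal sketch in Python against a toy in-memory copy of two documented tables (`sources` and `event_aggregates`, columns as listed above). The sample rows and PID are invented, not from a real capture; the query averages `cudaMemcpy` (source=1, op=4) duration per PID and converts nanoseconds to microseconds (÷1e3 per the docs):

```python
import sqlite3

# Toy in-memory copy of two tables from the documented schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sources(id INTEGER, name TEXT, description TEXT);
CREATE TABLE event_aggregates(
  bucket INTEGER, source INTEGER, op INTEGER, pid INTEGER,
  count INTEGER, stored INTEGER, sum_dur INTEGER,
  min_dur INTEGER, max_dur INTEGER, sum_arg0 INTEGER);
""")
conn.execute("INSERT INTO sources VALUES (1, 'CUDA', 'CUDA runtime events')")
# Two per-minute buckets of cudaMemcpy (source=1, op=4); durations in nanos.
conn.executemany(
    "INSERT INTO event_aggregates VALUES (?,?,?,?,?,?,?,?,?,?)",
    [(0, 1, 4, 1234, 10, 10, 5_000_000, 100_000, 900_000, 40_960),
     (1, 1, 4, 1234, 20, 20, 8_000_000, 100_000, 700_000, 81_920)])

# Average cudaMemcpy duration per PID in microseconds (duration ÷ 1e3 = µs).
row = conn.execute("""
    SELECT s.name, a.pid,
           SUM(a.sum_dur) * 1.0 / SUM(a.count) / 1e3 AS avg_us
    FROM event_aggregates a JOIN sources s ON a.source = s.id
    WHERE a.source = 1 AND a.op = 4
    GROUP BY a.pid
""").fetchone()
print(row)
```

Note the query scans `event_aggregates` rather than `events`, following the performance guidance above for large databases.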
## Prompts

Interactive templates invoked by user choice.
| Name | Description |
|---|---|
| No prompts | |
## Resources

Contextual data attached and managed by the client.
| Name | Description |
|---|---|
| No resources | |
## MCP directory API

We provide all the information about MCP servers via our MCP API.
```shell
curl -X GET 'https://glama.ai/api/mcp/v1/servers/ingero-io/ingero'
```
If you have feedback or need assistance with the MCP directory API, please join our Discord server.