Schema | infra-advisor-mcp

Server Configuration

Describes the environment variables required to run the server.

Name	Required	Description	Default
No arguments

Capabilities

Features and capabilities supported by this server

Capability	Details
`tools`	{ "listChanged": true }
`logging`	{}
`prompts`	{ "listChanged": false }
`resources`	{ "subscribe": false, "listChanged": false }
`extensions`	{ "io.modelcontextprotocol/ui": {} }
`experimental`	{}
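
As a rough illustration of how a client might read the capability table above, the sketch below copies the advertised values into a Python dict and derives a few yes/no decisions from them. The helper name `summarize_capabilities` and its output keys are hypothetical and not part of the server; only the capability values come from the table. In MCP, `listChanged: true` under `tools` means the server may send `notifications/tools/list_changed`, so a client would want to refresh its tool list when that arrives.

```python
# Capability values copied from the table above.
SERVER_CAPABILITIES = {
    "tools": {"listChanged": True},
    "logging": {},
    "prompts": {"listChanged": False},
    "resources": {"subscribe": False, "listChanged": False},
    "extensions": {"io.modelcontextprotocol/ui": {}},
    "experimental": {},
}


def summarize_capabilities(caps: dict) -> dict:
    """Hypothetical helper: reduce a capability object to simple client decisions."""
    tools = caps.get("tools", {})
    prompts = caps.get("prompts", {})
    resources = caps.get("resources", {})
    return {
        # Refresh the tool list when the server announces a change.
        "watch_tool_list": bool(tools.get("listChanged", False)),
        # Same idea for prompts; this server reports False.
        "watch_prompt_list": bool(prompts.get("listChanged", False)),
        # Resource subscriptions are only usable if the server advertises them.
        "can_subscribe_resources": bool(resources.get("subscribe", False)),
        # An empty object still signals that the capability is present.
        "supports_logging": "logging" in caps,
        "extensions": sorted(caps.get("extensions", {})),
    }


summary = summarize_capabilities(SERVER_CAPABILITIES)
```

With the values listed here, a client would watch for tool-list changes but would not attempt resource subscriptions.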

Tools

Functions exposed to the LLM to take actions

Name	Description
analyze_task	Parse a free-text task description into structured parameters. Use this first to understand what the user needs before calling other tools. Returns scale, use_case, domain, latency requirements, and estimated token volumes.
recommend_model	Recommend ranked open-source and closed-source models for a task. Pass parameters from analyze_task output for best results. Returns up to 8 ranked models with pricing, strengths, and caveats.
estimate_training_cost	Estimate GPU-hours, wall-clock time, cost, and sharding strategy for a training run. Covers pre-training, continual pre-training, full SFT, parameter-efficient fine-tuning (LoRA / QLoRA), and RL. Uses Chinchilla scaling laws for pre-training compute estimates. LoRA/QLoRA train only small adapters, so they need far less VRAM and fewer GPUs than full fine-tuning (QLoRA quantizes the base to 4-bit). Also returns a recommended parallelism strategy (DDP / FSDP-ZeRO-3 / tensor+pipeline parallel) based on model footprint, GPU VRAM, and interconnect.
estimate_inference_cost	Compare cloud API and self-hosted inference costs for a given token volume. Returns monthly cost for all major API providers and self-hosted options, with break-even analysis. Self-hosted sizing accounts for two levers: quantization (fp8/int8/int4), which shrinks model VRAM (so fewer GPUs per replica) and lifts throughput at a small quality cost; and the latency target, which sizes how many replicas are needed to serve the daily output volume at peak load, so an option is only "cheaper" if it can actually keep up. Each self-hosted option reports per-replica topology, replicas_needed, and total GPUs.
compare_cloud_vs_onprem	Compare total cost of ownership: cloud vs on-prem over 1/3/5 year horizons. Returns cumulative costs, break-even month, and a recommendation.
estimate_maintenance_cost	Estimate all ongoing on-prem operational costs for a GPU cluster. Includes power, cooling, rack/colocation, networking, labor, depreciation, and recommended ML infra headcount.
generate_full_report	Generate a comprehensive markdown infrastructure report for any task. This is the main entry point. Runs all tools in sequence and returns a complete report covering: task analysis, model recommendations, inference costs, training costs (if relevant), cloud vs on-prem TCO, and maintenance costs.
generate_followup_answer	Answer a specific follow-up question with calculator-backed data and an inline glossary. Use this instead of generate_full_report when the user asks a focused follow-up (e.g. "what's the training cost?", "cloud vs on-prem for this?", "which GPU?"). Returns a concise answer: direct response, data table, recommendation, and jargon glossary.
list_available_gpus	List all GPU types in the database with specs and pricing.
get_data_freshness_info	Return last_updated timestamps for all data entries. Use this to check if pricing data is stale before relying on estimates.
reload_data	Reload all YAML data files from disk without restarting the server. Call this after running sync scripts to pick up updated pricing.
save_report	Save the final report (and any follow-ups) to .md and .html files. Call this when the user is satisfied with the report; this is the explicit finalize action. Pass all follow-up answers accumulated during the session.
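
The tool descriptions suggest a call order: `analyze_task` first, then `recommend_model` with parameters taken from its output. A minimal sketch of that chain as MCP `tools/call` JSON-RPC requests follows. The request envelope is standard MCP; the argument names (`description`, `scale`, `use_case`, `domain`) and the placeholder analysis values are assumptions for illustration, since this page does not list the tools' input schemas.

```python
import itertools
import json

_request_ids = itertools.count(1)


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: parse the free-text task.
# ASSUMPTION: the argument key "description" is hypothetical.
analyze_req = tool_call(
    "analyze_task",
    {"description": "Summarize internal support tickets every night"},
)

# Placeholder standing in for the server's analyze_task result.
# Key names follow the tool description; the values are made up.
analysis = {"scale": "placeholder", "use_case": "placeholder", "domain": "placeholder"}

# Step 2: forward the structured parameters to recommend_model.
# ASSUMPTION: recommend_model accepts these keys unchanged.
recommend_req = tool_call(
    "recommend_model",
    {key: analysis[key] for key in ("scale", "use_case", "domain")},
)

wire_message = json.dumps(recommend_req)  # what would be sent over the transport
```

A real client would read the actual argument names from the server's `tools/list` response rather than hard-coding them.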
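
The `estimate_training_cost` description mentions Chinchilla scaling laws for pre-training compute. The server's actual formula is not shown on this page, so the sketch below uses the commonly quoted approximations instead: roughly 20 training tokens per parameter for a compute-optimal run, and about 6 FLOPs per parameter per token. The function names, the peak-throughput figure, and the 40% utilization value are all illustrative assumptions.

```python
def chinchilla_tokens(n_params: float) -> float:
    """Approximate compute-optimal token count (rule of thumb: ~20 tokens per parameter)."""
    return 20.0 * n_params


def pretraining_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute with the common C ~= 6 * N * D estimate."""
    return 6.0 * n_params * n_tokens


def gpu_hours(total_flops: float, peak_flops_per_gpu: float, mfu: float) -> float:
    """Convert total FLOPs to GPU-hours at a given model FLOPs utilization (MFU)."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    seconds = total_flops / (peak_flops_per_gpu * mfu)
    return seconds / 3600.0


# Illustrative inputs only: a 1B-parameter model, a round 1e15 FLOP/s peak
# (not the spec of any particular GPU), and an assumed 40% utilization.
n_params = 1e9
n_tokens = chinchilla_tokens(n_params)           # about 2e10 tokens
flops = pretraining_flops(n_params, n_tokens)    # about 1.2e20 FLOPs
hours = gpu_hours(flops, peak_flops_per_gpu=1e15, mfu=0.4)
```

Dividing the resulting GPU-hours by the number of GPUs gives a first-order wall-clock estimate; it ignores communication overhead and restarts.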

Prompts

Interactive templates invoked by user choice

Name	Description
No prompts

Resources

Contextual data attached and managed by the client

Name	Description
No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/c3-yang-song/LLM-Infra-Advisor-MCP'
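
For scripted use, the same request can be sketched with Python's standard library. The URL is the one from the curl command above; the helper names are hypothetical, and the assumption that the endpoint returns a JSON body is not confirmed on this page.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, repo: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one server's directory entry."""
    url = f"{API_BASE}/{owner}/{repo}"
    return urllib.request.Request(
        url, method="GET", headers={"Accept": "application/json"}
    )


def fetch_server(owner: str, repo: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the body. ASSUMPTION: the response is JSON."""
    request = build_server_request(owner, repo)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Same target as the curl example; nothing is sent until fetch_server is called.
req = build_server_request("c3-yang-song", "LLM-Infra-Advisor-MCP")
```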

If you have feedback or need assistance with the MCP directory API, please join our Discord server.