infra-advisor-mcp
Server Configuration
Describes the environment variables required to run the server.
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
Capabilities
Features and capabilities supported by this server
| Capability | Details |
|---|---|
| tools | {"listChanged": true} |
| logging | {} |
| prompts | {"listChanged": false} |
| resources | {"subscribe": false, "listChanged": false} |
| extensions | {"io.modelcontextprotocol/ui": {}} |
| experimental | {} |
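To show how a client might read the capabilities table above, here is a minimal Python sketch. It assumes the usual MCP meaning of `listChanged` (the server may notify clients when that list changes) and copies the capability object verbatim from the table. The helper name `emits_list_changed` is hypothetical and not part of any SDK.

```python
import json

# Capability object copied from the table above.
CAPABILITIES = json.loads("""
{
  "tools": {"listChanged": true},
  "logging": {},
  "prompts": {"listChanged": false},
  "resources": {"subscribe": false, "listChanged": false},
  "extensions": {"io.modelcontextprotocol/ui": {}},
  "experimental": {}
}
""")


def emits_list_changed(caps: dict, feature: str) -> bool:
    """Return True if the server declares listChanged for this feature.

    Hypothetical helper: a missing feature or missing flag counts as False.
    """
    return bool(caps.get(feature, {}).get("listChanged", False))


# Features whose listings a client may want to refresh on notification.
watched = [name for name in CAPABILITIES if emits_list_changed(CAPABILITIES, name)]
```

Under these assumptions only the tool list is declared as changeable at runtime, which fits the description of reload_data further down: tools may change after a reload, while prompts and resources stay fixed.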
Tools
Functions exposed to the LLM to take actions
| Name | Description |
|---|---|
| analyze_task | Parse a free-text task description into structured parameters. Use this first to understand what the user needs before calling other tools. Returns scale, use_case, domain, latency requirements, and estimated token volumes. |
| recommend_model | Recommend ranked open-source and closed-source models for a task. Pass parameters from analyze_task output for best results. Returns up to 8 ranked models with pricing, strengths, and caveats. |
| estimate_training_cost | Estimate GPU-hours, wall-clock time, cost, and sharding strategy for a training run. Covers pre-training, continual pre-training, full SFT, parameter-efficient fine-tuning (LoRA / QLoRA), and RL. Uses Chinchilla scaling laws for pre-training compute estimates. LoRA/QLoRA train only small adapters, so they need far less VRAM and fewer GPUs than full fine-tuning (QLoRA quantizes the base to 4-bit). Also returns a recommended parallelism strategy (DDP / FSDP-ZeRO-3 / tensor+pipeline parallel) based on model footprint, GPU VRAM, and interconnect. |
| estimate_inference_cost | Compare cloud API and self-hosted inference costs for a given token volume. Returns monthly cost for all major API providers and self-hosted options, with break-even analysis. Self-hosted sizing accounts for two levers: |
| compare_cloud_vs_onprem | Compare total cost of ownership: cloud vs on-prem over 1/3/5 year horizons. Returns cumulative costs, break-even month, and a recommendation. |
| estimate_maintenance_cost | Estimate all ongoing on-prem operational costs for a GPU cluster. Includes power, cooling, rack/colocation, networking, labor, depreciation, and recommended ML infra headcount. |
| generate_full_report | Generate a comprehensive markdown infrastructure report for any task. This is the main entry point. Runs all tools in sequence and returns a complete report covering: task analysis, model recommendations, inference costs, training costs (if relevant), cloud vs on-prem TCO, and maintenance costs. |
| generate_followup_answer | Answer a specific follow-up question with calculator-backed data and an inline glossary. Use this instead of generate_full_report when the user asks a focused follow-up (e.g. "what's the training cost?", "cloud vs on-prem for this?", "which GPU?"). Returns a concise answer: direct response, data table, recommendation, and jargon glossary. |
| list_available_gpus | List all GPU types in the database with specs and pricing. |
| get_data_freshness_info | Return last_updated timestamps for all data entries. Use this to check if pricing data is stale before relying on estimates. |
| reload_data | Reload all YAML data files from disk without restarting the server. Call this after running sync scripts to pick up updated pricing. |
| save_report | Save the final report (and any follow-ups) to .md and .html files. Call this when the user is satisfied with the report; this is the explicit finalize action. Pass all follow-up answers accumulated during the session. |
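The table mentions Chinchilla scaling laws for pre-training compute and a break-even month for cloud vs on-prem. The Python sketch below illustrates the kind of arithmetic involved. It is not the server's implementation. It assumes the common approximations of about 20 training tokens per parameter and about 6 FLOPs per parameter per token. All function names, the default utilization (`mfu`), and the cost inputs are hypothetical.

```python
import math
from typing import Optional


def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token count (assumed rule of thumb: ~20 tokens per parameter)."""
    return tokens_per_param * n_params


def pretraining_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C ~= 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens


def gpu_hours(flops: float, peak_flops_per_gpu: float, mfu: float = 0.4) -> float:
    """Convert total FLOPs to GPU-hours at an assumed model FLOPs utilization (mfu)."""
    if not 0.0 < mfu <= 1.0:
        raise ValueError("mfu must be in (0, 1]")
    seconds = flops / (peak_flops_per_gpu * mfu)
    return seconds / 3600.0


def break_even_month(upfront_cost: float, onprem_monthly: float,
                     cloud_monthly: float) -> Optional[int]:
    """First month in which cumulative on-prem cost is no higher than cloud cost.

    Returns None when on-prem never catches up (cloud is not more expensive per month).
    """
    monthly_saving = cloud_monthly - onprem_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(upfront_cost / monthly_saving)
```

For example, `gpu_hours(pretraining_flops(7e9, chinchilla_optimal_tokens(7e9)), 1e15)` estimates GPU-hours for a 7B-parameter model on a hypothetical accelerator with 1e15 peak FLOP/s. The real tools also factor in GPU pricing, interconnect, and sharding strategy, which this sketch leaves out.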
Prompts
Interactive templates invoked by user choice
| Name | Description |
|---|---|
No prompts | |
Resources
Contextual data attached and managed by the client
| Name | Description |
|---|---|
No resources | |
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/c3-yang-song/LLM-Infra-Advisor-MCP'
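The same request can be made from Python with only the standard library. This is a minimal sketch: the URL pattern is taken from the curl command above, while the assumption that the endpoint returns JSON, and the helper names `server_url` and `fetch_server_info`, are mine rather than documented here.

```python
import json
import urllib.request

# Base path taken from the curl command above.
API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server entry and parse it, assuming the response body is JSON."""
    request = urllib.request.Request(
        server_url(owner, name),
        method="GET",
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info("c3-yang-song", "LLM-Infra-Advisor-MCP")` performs the network request. The shape of the returned object is not described on this page, so inspect it before relying on specific keys.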
If you have feedback or need assistance with the MCP directory API, please join our Discord server.