infra-advisor-mcp

by c3-yang-song

Server Configuration

Describes the environment variables required to run the server.

Name | Required | Description | Default

No arguments

Capabilities

Features and capabilities supported by this server

Capability | Details
tools
{
  "listChanged": true
}
logging
{}
prompts
{
  "listChanged": false
}
resources
{
  "subscribe": false,
  "listChanged": false
}
extensions
{
  "io.modelcontextprotocol/ui": {}
}
experimental
{}

Tools

Functions exposed to the LLM to take actions

Name | Description
analyze_task

Parse a free-text task description into structured parameters.

Use this first to understand what the user needs before calling other tools. Returns scale, use_case, domain, latency requirements, and estimated token volumes.

recommend_model

Recommend ranked open-source and closed-source models for a task.

Pass parameters from analyze_task output for best results. Returns up to 8 ranked models with pricing, strengths, and caveats.
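
As a rough illustration of the two-step flow described above, the sketch below builds the JSON-RPC tools/call requests an MCP client would send: analyze_task first, then recommend_model. The request envelope follows the MCP specification. The argument names (description, and the keys forwarded from the analysis) and the sample analysis values are assumptions, since this listing does not show either tool's input schema.

```python
import json

def tool_call(request_id, name, arguments):
    """Build an MCP tools/call JSON-RPC request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

# Step 1: parse the free-text task.
# The "description" argument name is hypothetical.
analyze_request = tool_call(
    1, "analyze_task",
    {"description": "Customer-support chatbot, about 50k chats per day"},
)

# Hypothetical analyze_task result. The listing names these fields,
# but not their values or types.
analysis = {"scale": "medium", "use_case": "chatbot", "domain": "customer_support"}

# Step 2: forward the parsed parameters to recommend_model.
recommend_request = tool_call(2, "recommend_model", analysis)

print(json.dumps(recommend_request, indent=2))
```

In a real client the analysis dict would come from the analyze_task response rather than being written by hand.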

estimate_training_cost

Estimate GPU-hours, wall-clock time, cost, and sharding strategy for a training run.

Covers pre-training, continual pre-training, full SFT, parameter-efficient fine-tuning (LoRA / QLoRA), and RL. Uses Chinchilla scaling laws for pre-training compute estimates. LoRA/QLoRA train only small adapters, so they need far less VRAM and fewer GPUs than full fine-tuning (QLoRA quantizes the base to 4-bit). Also returns a recommended parallelism strategy (DDP / FSDP-ZeRO-3 / tensor+pipeline parallel) based on model footprint, GPU VRAM, and interconnect.
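
To make the pre-training estimate more concrete, here is a minimal sketch of a Chinchilla-style calculation. It uses the common approximation that training compute is about 6 x parameters x tokens, and the rule of thumb of roughly 20 training tokens per parameter. The function names, the peak throughput figure, the utilization factor, and the hourly price are illustrative assumptions, not values taken from this server.

```python
def pretraining_gpu_hours(params, tokens=None, peak_flops=1e15, mfu=0.4):
    """Estimate GPU-hours for a pre-training run.

    params:     model parameter count
    tokens:     training tokens; defaults to ~20 tokens per parameter
    peak_flops: assumed peak FLOP/s of one GPU (placeholder value)
    mfu:        assumed model FLOPs utilization, between 0 and 1
    """
    if tokens is None:
        tokens = 20 * params              # Chinchilla rule of thumb
    total_flops = 6 * params * tokens     # standard training-compute approximation
    effective_flops_per_sec = peak_flops * mfu
    return total_flops / effective_flops_per_sec / 3600

def wall_clock_and_cost(gpu_hours, num_gpus, usd_per_gpu_hour):
    """Convert GPU-hours to wall-clock hours and cost.

    Assumes perfect linear scaling across GPUs, which real runs rarely reach.
    """
    return gpu_hours / num_gpus, gpu_hours * usd_per_gpu_hour

hours = pretraining_gpu_hours(7e9)        # 7B-parameter model
wall_hours, cost = wall_clock_and_cost(hours, num_gpus=64, usd_per_gpu_hour=2.0)
print(round(hours), round(wall_hours, 1), round(cost))
```

The tool's own estimate may use different constants and also accounts for sharding strategy, which this sketch ignores.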

estimate_inference_cost

Compare cloud API and self-hosted inference costs for a given token volume.

Returns monthly cost for all major API providers and self-hosted options, with break-even analysis. Self-hosted sizing accounts for two levers:

  • quantization (fp8/int8/int4) shrinks model VRAM (so fewer GPUs per replica) and lifts throughput, at a small quality cost.

  • the latency target sizes how many replicas are needed to serve the daily output volume at peak load — so an option is only "cheaper" if it can actually keep up. Each self-hosted option reports per-replica topology, replicas_needed, and total GPUs.
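
The two levers above can be sketched as follows. This is a simplified model under stated assumptions, not the server's actual sizing logic: the bytes-per-parameter table, the 20% memory overhead, the peak-to-average ratio, and the per-replica throughput (which in practice depends on the latency target) are all hypothetical inputs.

```python
import math

# Approximate weight storage per parameter for each precision.
BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}

def gpus_per_replica(params_billion, precision, gpu_vram_gb, overhead=1.2):
    """GPUs needed to hold one copy of the model.

    overhead is an assumed multiplier for KV cache and activations.
    """
    weight_gb = params_billion * BYTES_PER_PARAM[precision]
    return math.ceil(weight_gb * overhead / gpu_vram_gb)

def replicas_needed(daily_output_tokens, tokens_per_sec_per_replica, peak_to_avg=3.0):
    """Replicas needed to serve the daily output volume at peak load.

    tokens_per_sec_per_replica should be the throughput one replica
    sustains while still meeting the latency target.
    """
    avg_tokens_per_sec = daily_output_tokens / 86_400
    peak_tokens_per_sec = avg_tokens_per_sec * peak_to_avg
    return max(1, math.ceil(peak_tokens_per_sec / tokens_per_sec_per_replica))

# A 70B model on 80 GB GPUs: quantization reduces GPUs per replica.
fp16_gpus = gpus_per_replica(70, "fp16", 80)
int4_gpus = gpus_per_replica(70, "int4", 80)
replicas = replicas_needed(86_400_000, tokens_per_sec_per_replica=1500)
print(fp16_gpus, int4_gpus, replicas, replicas * int4_gpus)
```

Total GPUs is simply replicas times GPUs per replica, which is why a quantized option can come out cheaper only if its replicas still keep up with peak demand.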

compare_cloud_vs_onprem

Compare total cost of ownership: cloud vs on-prem over 1/3/5 year horizons.

Returns cumulative costs, break-even month, and a recommendation.
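
A break-even month can be found by comparing cumulative costs month by month. The sketch below assumes a one-time on-prem capital cost plus flat monthly costs on both sides; the function name and all figures are illustrative, and the tool's own model likely includes more cost components.

```python
def break_even_month(onprem_capex, onprem_monthly_opex, cloud_monthly, horizon_months):
    """Return the first month in which cumulative on-prem cost is no higher
    than cumulative cloud cost, or None if that never happens in the horizon."""
    for month in range(1, horizon_months + 1):
        onprem_total = onprem_capex + onprem_monthly_opex * month
        cloud_total = cloud_monthly * month
        if onprem_total <= cloud_total:
            return month
    return None

# Evaluate the same scenario over 1, 3, and 5 year horizons.
results = {
    years: break_even_month(1_200_000, 20_000, 70_000, years * 12)
    for years in (1, 3, 5)
}
print(results)
```

With these example numbers on-prem does not pay off within one year, so a recommendation would depend on the horizon chosen.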

estimate_maintenance_cost

Estimate all ongoing on-prem operational costs for a GPU cluster.

Includes power, cooling, rack/colocation, networking, labor, depreciation, and recommended ML infra headcount.
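
Two of these components lend themselves to a short worked example. The sketch below estimates annual power-plus-cooling cost using PUE (total facility power divided by IT power) and straight-line depreciation. The function names and all input values are assumptions for illustration; the listing does not say which formulas the server uses.

```python
def annual_power_cost(num_gpus, watts_per_gpu, pue, usd_per_kwh):
    """Annual electricity cost for GPUs, with cooling folded in via PUE."""
    it_kw = num_gpus * watts_per_gpu / 1000
    facility_kw = it_kw * pue          # PUE scales IT load to total facility load
    hours_per_year = 24 * 365
    return facility_kw * hours_per_year * usd_per_kwh

def annual_depreciation(hardware_cost, useful_life_years):
    """Straight-line depreciation with no residual value (an assumption)."""
    return hardware_cost / useful_life_years

power = annual_power_cost(num_gpus=64, watts_per_gpu=700, pue=1.5, usd_per_kwh=0.10)
depreciation = annual_depreciation(hardware_cost=2_000_000, useful_life_years=4)
print(round(power, 2), depreciation)
```

This ignores CPU, storage, and network power draw, as well as rack, labor, and networking costs, which the tool reports separately.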

generate_full_report

Generate a comprehensive markdown infrastructure report for any task.

This is the main entry point. Runs all tools in sequence and returns a complete report covering: task analysis, model recommendations, inference costs, training costs (if relevant), cloud vs on-prem TCO, and maintenance costs.

generate_followup_answer

Answer a specific follow-up question with calculator-backed data and an inline glossary.

Use this instead of generate_full_report when the user asks a focused follow-up (e.g. "what's the training cost?", "cloud vs on-prem for this?", "which GPU?"). Returns a concise answer: direct response, data table, recommendation, and jargon glossary.

list_available_gpus

List all GPU types in the database with specs and pricing.

get_data_freshness_info

Return last_updated timestamps for all data entries.

Use this to check if pricing data is stale before relying on estimates.
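
A client could apply a staleness check along these lines. The response shape (a mapping from entry name to an ISO 8601 last_updated string), the entry names, and the 30-day threshold are all assumptions, since the listing does not document the tool's output format.

```python
from datetime import datetime, timedelta, timezone

def stale_entries(last_updated, now, max_age_days=30):
    """Return the sorted names of entries older than max_age_days.

    last_updated maps an entry name to an ISO 8601 timestamp string
    that includes a UTC offset.
    """
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        name
        for name, timestamp in last_updated.items()
        if datetime.fromisoformat(timestamp) < cutoff
    )

# Hypothetical freshness data, not real output from the server.
freshness = {
    "gpu_pricing": "2025-01-15T00:00:00+00:00",
    "api_pricing": "2025-02-20T00:00:00+00:00",
}
now = datetime(2025, 3, 1, tzinfo=timezone.utc)
print(stale_entries(freshness, now))
```

If any entry comes back stale, running the sync scripts and then calling reload_data would be the natural next step.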

reload_data

Reload all YAML data files from disk without restarting the server.

Call this after running sync scripts to pick up updated pricing.

save_report

Save the final report (and any follow-ups) to .md and .html files.

Call this when the user is satisfied with the report — this is the explicit finalize action. Pass all follow-up answers accumulated during the session.

Prompts

Interactive templates invoked by user choice

NameDescription

No prompts

Resources

Contextual data attached and managed by the client

NameDescription

No resources

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/c3-yang-song/LLM-Infra-Advisor-MCP'
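
The same request can be made from Python with only the standard library. This is a minimal sketch: the helper names are hypothetical, and it assumes the endpoint returns a JSON body, which the curl example above does not confirm.

```python
import json
import urllib.request

BASE_URL = "https://glama.ai/api/mcp/v1/servers"

def server_url(owner, name):
    """Build the directory API URL for one server."""
    return f"{BASE_URL}/{owner}/{name}"

def fetch_server(owner, name, timeout=10):
    """GET the server record and decode it, assuming a JSON response."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)

url = server_url("c3-yang-song", "LLM-Infra-Advisor-MCP")
print(url)

# Uncomment to perform the network request:
# record = fetch_server("c3-yang-song", "LLM-Infra-Advisor-MCP")
```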

If you have feedback or need assistance with the MCP directory API, please join our Discord server.