Glama

volt_recommend_route

Compare AI model providers on cost, latency, and reliability, and get a recommendation tailored to your constraints that reduces compute spend.

Instructions

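Get the recommended provider for a model, optimized for cost, latency, reliability, or a balanced mix. The result shows the estimated savings against your current cost. If you do not supply a current cost and at least two offerings match the model, the most expensive matching offering is used as the comparison baseline.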

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| model | Yes | Model name or partial match to filter offerings (e.g. "llama-70b", "gpt-4o") | |
| optimize | No | What to optimize for (default: balanced) | balanced |
| current_cost_per_million | No | What you currently pay per million tokens (avg of input+output), for savings estimate | |
| min_quality | No | Minimum acceptable quality score 0-1 (default: 0.7) | |
| max_latency_ms | No | Maximum acceptable P95 latency in ms (default: 5000) | |
| blocked_providers | No | Provider IDs to exclude from recommendations | |
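
As an illustration of the parameters listed above, the arguments for a call might look like the following once omitted fields are filled in. This is only a sketch: the `RecommendRouteArgs` type and the `withDefaults` helper are hypothetical names used here and are not part of the server, and the provider ID is made up. The default values are the ones stated in the parameter descriptions and in the Zod schema under Implementation Reference.

```typescript
// Hypothetical type mirroring the documented parameters; not exported by the server.
type Optimize = 'cost' | 'latency' | 'reliability' | 'balanced';

interface RecommendRouteArgs {
  model: string;
  optimize?: Optimize;
  current_cost_per_million?: number | null;
  min_quality?: number;
  max_latency_ms?: number;
  blocked_providers?: string[];
}

// Fill in the documented default for every optional field that was omitted.
function withDefaults(args: RecommendRouteArgs): Required<RecommendRouteArgs> {
  return {
    model: args.model,
    optimize: args.optimize ?? 'balanced',
    current_cost_per_million: args.current_cost_per_million ?? null,
    min_quality: args.min_quality ?? 0.7,
    max_latency_ms: args.max_latency_ms ?? 5000,
    blocked_providers: args.blocked_providers ?? [],
  };
}

// Only `model` is required; 'example-provider' is a made-up provider ID.
const args = withDefaults({
  model: 'llama-70b',
  optimize: 'cost',
  blocked_providers: ['example-provider'],
});
console.log(JSON.stringify(args, null, 2));
```

In this example `min_quality` and `max_latency_ms` fall back to 0.7 and 5000, and `current_cost_per_million` stays `null`, which leaves the savings baseline to be derived from the matching offerings.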

Implementation Reference

  • Handler for the `volt_recommend_route` tool. It retrieves offerings, filters by model, calculates optimal routing based on the provided profile, and formats the recommendation.
    export function handleRecommendRoute(input: RecommendRouteInput, feedCache: FeedCache) {
      const allOfferings = feedCache.getOfferings();
    
      if (allOfferings.length === 0) {
        return {
          content: [
            {
              type: 'text' as const,
              text: 'No pricing data available. The feed may still be loading — try again in a moment.',
            },
          ],
        };
      }
    
      // Filter to offerings matching the model query
      const query = input.model.toLowerCase();
      const modelOfferings = allOfferings.filter(
        (o) =>
          o.model.toLowerCase().includes(query) || o.modelShort.toLowerCase().includes(query),
      );
    
      if (modelOfferings.length === 0) {
        const available = [...new Set(allOfferings.map((o) => o.modelShort))].slice(0, 10).join(', ');
        return {
          content: [
            {
              type: 'text' as const,
              text: `No offerings found matching "${input.model}". Available models: ${available}.`,
            },
          ],
        };
      }
    
      // Auto-calculate comparison cost from most expensive offering if user didn't provide one
      let currentCost = input.current_cost_per_million;
      let comparisonContext: { autoCalculated: boolean; providerName: string; avgCost: number } = {
        autoCalculated: false,
        providerName: '',
        avgCost: 0,
      };
    
      if (currentCost == null && modelOfferings.length >= 2) {
        let maxAvg = 0;
        let maxProvider = '';
        for (const o of modelOfferings) {
          const avg = (o.priceInputPerMillion + o.priceOutputPerMillion) / 2;
          if (avg > maxAvg) {
            maxAvg = avg;
            maxProvider = o.providerName;
          }
        }
        currentCost = maxAvg;
        comparisonContext = { autoCalculated: true, providerName: maxProvider, avgCost: maxAvg };
      }
    
      const profile: RoutingProfile = {
        optimize: input.optimize as OptimizeTarget,
        minQuality: input.min_quality,
        maxLatencyMs: input.max_latency_ms,
        maxCostPerMillionTokens: Infinity,
        preferredProviders: [],
        blockedProviders: input.blocked_providers,
      };
    
      const rec = generateRecommendation(modelOfferings, profile, currentCost);
    
      if (!rec) {
        return {
          content: [
            {
              type: 'text' as const,
              text: `No eligible offerings for "${input.model}" with your constraints. Try lowering min_quality or raising max_latency_ms.`,
            },
          ],
        };
      }
    
      return {
        content: [
          {
            type: 'text' as const,
            text: formatRecommendation(rec, input.optimize, comparisonContext),
          },
        ],
      };
    }
  • Zod schema for validating the input to the `volt_recommend_route` tool.
    export const recommendRouteSchema = z.object({
      model: z
        .string()
        .describe('Model name or partial match to filter offerings (e.g. "llama-70b", "gpt-4o")'),
      optimize: z
        .enum(['cost', 'latency', 'reliability', 'balanced'])
        .default('balanced')
        .describe('What to optimize for (default: balanced)'),
      current_cost_per_million: z
        .number()
        .nullable()
        .default(null)
        .describe('What you currently pay per million tokens (avg of input+output), for savings estimate'),
      min_quality: z
        .number()
        .min(0)
        .max(1)
        .default(0.7)
        .describe('Minimum acceptable quality score 0-1 (default: 0.7)'),
      max_latency_ms: z
        .number()
        .int()
        .min(100)
        .default(5000)
        .describe('Maximum acceptable P95 latency in ms (default: 5000)'),
      blocked_providers: z
        .array(z.string())
        .default([])
        .describe('Provider IDs to exclude from recommendations'),
    });
  • Registration of the `volt_recommend_route` tool in the MCP server index.
    'volt_recommend_route',
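
The two data-handling steps in the handler above, the model filter and the automatic comparison baseline, can be sketched in isolation as follows. This is a simplified illustration, not the server's code: the `Offering` interface is a hypothetical type containing only the fields the handler excerpt reads, `matchOfferings` and `autoBaseline` are names invented for this sketch, and the sample offerings and prices are made up.

```typescript
// Hypothetical offering shape: only the fields the handler excerpt reads.
interface Offering {
  providerName: string;
  model: string;
  modelShort: string;
  priceInputPerMillion: number;
  priceOutputPerMillion: number;
}

// Case-insensitive substring match on either the full or the short model name.
function matchOfferings(all: Offering[], query: string): Offering[] {
  const q = query.toLowerCase();
  return all.filter(
    (o) => o.model.toLowerCase().includes(q) || o.modelShort.toLowerCase().includes(q),
  );
}

// Baseline used when no current cost is supplied: the highest average of
// input and output price among the matches. Returns null for fewer than two
// matches, mirroring the handler's `length >= 2` guard.
function autoBaseline(matches: Offering[]): { providerName: string; avgCost: number } | null {
  if (matches.length < 2) return null;
  let best = { providerName: '', avgCost: 0 };
  for (const o of matches) {
    const avg = (o.priceInputPerMillion + o.priceOutputPerMillion) / 2;
    if (avg > best.avgCost) best = { providerName: o.providerName, avgCost: avg };
  }
  return best;
}

// Made-up sample data for illustration only.
const sample: Offering[] = [
  { providerName: 'Provider A', model: 'meta/llama-3-70b', modelShort: 'llama-70b', priceInputPerMillion: 0.5, priceOutputPerMillion: 1.5 },
  { providerName: 'Provider B', model: 'meta/llama-3-70b', modelShort: 'llama-70b', priceInputPerMillion: 1, priceOutputPerMillion: 3 },
  { providerName: 'Provider C', model: 'openai/gpt-4o', modelShort: 'gpt-4o', priceInputPerMillion: 2.5, priceOutputPerMillion: 10 },
];

const matches = matchOfferings(sample, 'LLAMA');
const baseline = autoBaseline(matches);
console.log(matches.length, baseline);
```

With this sample, the query `LLAMA` matches the two llama offerings regardless of case, and the baseline is Provider B at an average of 2 per million tokens. Note that, as in the handler excerpt, the baseline is taken over all matching offerings before any `blocked_providers` or quality and latency constraints are applied.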

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/newageflyfish-max/volthq'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.