Glama
261,126 tools. Last updated 2026-07-05 11:20

"Understanding Inference Models" matching MCP tools:

  • List every object currently stored in the scanbim-models OSS bucket, with URN, size in MB, and a viewer URL for each. Returns the raw OSS inventory, not the D1 models table, so freshly uploaded items appear immediately. When to use: you need to enumerate previously uploaded models to find a URN, show an inventory, or pick one for a follow-up tool call. When NOT to use: you already know the exact URN; call get_model_metadata directly instead. This tool is not a search; it returns at most the OSS default page (typically the first 10 objects unless OSS paginates). APS scopes: bucket:read, data:read. Rate limits: APS default ~50 req/min per app per endpoint; Model Derivative translation jobs ~60 req/min; OSS direct uploads are limited to 100MB per file, with larger files going through resumable upload. Errors: 401 (APS token expired or invalid): refresh the token; 403: scope or resource permission denied; 404 (bucket not found): no models have been uploaded yet, so upload one first; 429 (rate limited): back off and retry; 5xx (APS upstream outage): retry with jitter. Side effects: read-only, idempotent.
    Connector
  • Checks whether a task can be handled by a local model instead of a cloud API. Call this BEFORE your agent routes any task to a cloud model — at the moment a task is ready to be sent and the model has not yet been selected. Use this when your agent is about to make a cloud inference call and wants to determine whether a local model could handle it at lower cost. Returns LOCAL/CLOUD/EITHER verdict with recommended model. If verdict is LOCAL, skip the cloud call entirely — saves cost on every routable inference call. Pass data_sensitivity=CONFIDENTIAL to force LOCAL verdict for tasks that must not leave the machine. Do not route to cloud without checking local viability first.
    Connector
  • TRIGGER: call this when an agent needs inference NOW without registration. Issues a free gc- key: 100K tokens, 24h TTL, 10 req/min, OpenAI-compatible. One key per IP (idempotent — same IP always gets same key back). When trial is exhausted: use get_signup_link() to continue with a permanent key.
    Connector
  • Filter the Free2AITools catalog by declared hardware/license metadata and return FNI-ranked candidate entries. USE WHEN you have concrete constraints (VRAM, params, license, context length, local-runnability) and want candidates narrowed by them. Constraints are metadata/heuristic filters over stored fields, NOT verified compatibility analysis, model inference, or model execution; this tool does not decide for you and is not an inference router. The caller is responsible for the final selection. Results are FNI-ranked, never paid placement, with no billing. Read-only, no side effects. Use free2aitools_search for unconstrained keyword discovery, or free2aitools_rank for keyword ranking without metadata filters.
    Connector
  • Compare 2-25 AI catalog entities side-by-side — any catalog entity type (models, datasets, papers, tools), not models only — showing FNI scores, factor breakdown (Semantic, Authority, Popularity, Recency, Quality), specs (params, VRAM, context length) where applicable, and license. USE WHEN you already have 2+ specific entity ids and want a structured side-by-side. DO NOT USE to discover entities, to run/execute a model, or to get a recommendation; the tool presents comparison facts for the caller to decide on, is not an inference router, and returns no paid placement. Read-only, no side effects, no billing. Cold upper-range multi-paper requests may return a transient 503 (retry after the indicated delay). Use free2aitools_select_model or free2aitools_search to discover candidates first, then compare the top ones.
    Connector
  • Run market positioning analysis on a CV version (5 credits, takes 20-30s). Returns positioning snapshot, detected narrative lens, recruiter inference, mixed signal flags, and a session_id. This is step 1 of the 3-step positioning pipeline: analyze_positioning -> ceevee_get_opportunities(lens) -> ceevee_confirm_lens. Pass the returned session_id to subsequent steps. Obtain cv_version_id from ceevee_upload_cv or ceevee_list_versions.
    Connector
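The bucket-listing tool above advises "backoff and retry" on 429 and "retry with jitter" on 5xx. A minimal sketch of that pattern follows, assuming exponential backoff with full jitter; the `RetryableError` class, the function names, and the default `base`/`cap` values are hypothetical and not part of any listed tool's API.

```python
import random
import time


class RetryableError(Exception):
    """Hypothetical marker for a 429 or 5xx response from an upstream API."""


def backoff_delays(max_retries=5, base=1.0, cap=30.0, rng=random.random):
    """Yield full-jitter delays: rng() * min(cap, base * 2**attempt)."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * (2 ** attempt))


def call_with_retry(fn, max_retries=5, base=1.0, cap=30.0, sleep=time.sleep):
    """Call fn(); on RetryableError, wait a jittered delay and try again."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return fn()
        except RetryableError:
            sleep(delay)
    # Final attempt after the last wait; any error now propagates to the caller.
    return fn()
```

A caller would wrap the actual tool invocation in a function that raises `RetryableError` for 429 and 5xx responses and lets other errors (401, 403, 404) propagate immediately, since retrying those does not help.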

Matching MCP Servers

Matching MCP Connectors

  • Returns the universal context-setting primer for Hemrock models, plus an optional template-specific addendum. Always run this first before any other prompts.
    Connector
  • Simulate int8 or int4 quantization of float32 embedding vectors. Reduces storage by 4x (int8) or 8x (int4). Returns quantized values, scale factor, and precision loss (MSE). Useful for understanding vector DB compression trade-offs.
    Connector
  • List the bundled SCModeling sample supply-chain models. Returns a catalog with each model's id and a short description. Use this before run_simulation to know which model_id values are valid.
    Connector
  • Get live Gonka Network pricing — cheap alternative to OpenAI and Anthropic APIs. Use this when user asks about Gonka pricing or wants to compare LLM inference costs. Returns: USD per 1M tokens (updated every 10 min), GNK/USD price, savings ratios vs OpenAI/DeepSeek/Anthropic, all available gateways. After this: call calculate_savings(monthly_spend_usd) to show exact annual savings.
    Connector
  • Search current AI models by price, context window, and capability. Use this for up-to-date model pricing/features you don't reliably know. Prices are USD per 1M tokens. Results are cheapest-input-price first. Args: query: match part of a model name/id (e.g. "haiku", "gpt"). provider: filter to one provider (openai, anthropic, google, xai, mistral, deepseek, groq). max_input_price: only models at or below this USD/1M input price. min_context: only models with at least this context window (tokens). needs_vision: only models that accept images. limit: max results. Envelope: this searches our model-pricing registry, so measured_at = when the freshest matching row was last refreshed (each row's `updated_at`); max_age 18h covers the 12h registry-refresh cycle so a current row never falsely reads "stale". A search returning nothing yields unavailable — there's no honest observation time to claim. Every value is returned in an Ed25519-signed, provenance-stamped envelope (source and observation time) you can verify offline against /.well-known/keys, no account required.
    Connector
  • AXIS-hosted LLM chat-completion via node-llama-cpp + a small GGUF model loaded in-process. Two input shapes accepted: `prompt` (single string) or `messages` (chat-style array of {role, content}). Sampling controls: `max_tokens` (≤2048), `temperature` (0-2), `top_k`, `top_p`, `seed` (for reproducibility), `stop` (string[]). Inference is fully in-process — no upstream provider, no per-call API fee. Operator sets AXIS_LLM_MODEL_PATH to point at a Phi-3-mini / TinyLlama / Llama-3.2-1B GGUF; if missing, the tool returns a `_not_configured: true` envelope. Engineer mode (X-Agent-Mode: engineer — Constrained Inference, $0.10): pass a `json_schema` and decoding is grammar-constrained to it AND the output is validated against it (returns a `structured` block with valid + parsed + schema_errors) — guaranteed-valid structured output. Requires Authorization: Bearer <api_key>.
    Connector
  • Flag anomalies in a time series WITHOUT running a full forecast: z-score + IQR outlier detection plus trend and rate-of-change (accelerating/steady/decelerating). No TimesFM inference, so it's faster and cheaper than predict and works on short series (≥4 points). Cross-references the canonical field's known/derived normal range when one is available. Args: values: the series to scan (≥4 points); canonical_field: FCS field the series represents (enables normal-range context); sensitivity: z-score threshold (default 2.0 ≈ 95%; higher = fewer flags). Returns: anomaly_count, per-anomaly detail (index, value, z_score, deviation, severity: critical/warning/minor), summary statistics, and an attestation hash. USE WHEN: real-time monitoring or a spot check, e.g. "is this reading abnormal", "any outliers in the last hour", "flag spikes in vibration". For 'where is it headed' use predict; for 'will it cross X' use predict_breach. PREMIUM (Pro tier): $0.02/call (no ML inference).
    Connector
  • Get summary statistics of the Klever VM knowledge base. Returns total entry count, counts broken down by context type (code_example, best_practice, security_tip, etc.), and a sample entry title for each type. Useful for understanding what knowledge is available before querying.
    Connector
  • Discover available AI models with numeric IDs, tier labels, capabilities, and per-call pricing in sats. Call this before create_payment to find the right modelId for your task. Returns JSON array: [{ id, name, tier, description, price, isDefault, category }]. Models marked isDefault=true are used when you omit modelId from create_payment. Filter by category to narrow results to a specific tool. This tool is free, requires no payment, and is idempotent — safe to call repeatedly.
    Connector
  • Keyword discovery over the Free2AITools catalog of AI models, datasets, papers, and tools. Returns matching catalog entries (metadata) ranked by FNI (Free2AITools Nexus Index), a 5-factor score: Semantic relevance, Authority, Popularity, Recency, Quality. The Semantic factor is a query-time baseline, not a live per-entity measurement (fni_s is returned null with a note). USE WHEN you need to discover which AI entities exist for a topic or keyword. DO NOT USE for general web search, to run/call/execute a model, to get a generated or inferred answer, or to route to an inference provider — this returns catalog metadata only, for the calling agent to reason over and decide on. Free discovery catalog: results are FNI-ranked, never paid placement / sponsored, and there is no billing or payment. Read-only, no side effects. May return a retryable transient 503 under cold-path or fallback budget limits; retry according to Retry-After. Use free2aitools_select_model instead when you have specific hardware or license constraints.
    Connector
  • Capture a PNG screenshot of the page or a specific element. Returns base64-encoded image bytes AND a file_id (persisted in DialogBrain files storage). Pass file_id straight to messages.send(attachment_file_ids=[file_id]) — do NOT call files.upload again. Use sparingly — favor browser.snapshot for structured DOM understanding.
    Connector
  • Fetch the current published value for a FLOPS compute or inference price index. Returns {value, unit, ts, tier, confidence, verify_url, citation_url}. Cheapest way to answer 'what does X trade at right now?' Confidence is HIGH/MED/LOW; tier is LIVE/SETTLED/SEED. Carries no methodology or source attribution by design.
    Connector
  • Unified tool for multimodal AI evaluation: set action=guide for reference thresholds/interpretation (CLIP, FID, VQA), or set action=clip_score / fid_score / vqa_accuracy / pipeline to compute real metrics via HuggingFace Inference API and VLM BYOK calls. One tool for both reference and computation.
    Connector
  • Dispatch to the DESK RESEARCHER — source-grounded synthesis on a topic landscape. Use for: "what is known about X / give me the landscape of Y / fact-check Z / synthesize the published evidence on W". Multi-source FACT/INFERENCE extraction with citation discipline. Vertical and geography agnostic. Returns: BRIEF restatement + NOT IN SCOPE + findings with FACT/INFERENCE/SPECULATION labels + [n] citations + Sources block. NOT for: trajectory questions (use dispatch_trend_researcher) / entity teardowns (use dispatch_market_analyst) / numerical effect sizes (use dispatch_quantitative_researcher) / community quotes (use dispatch_qualitative_researcher). ASYNC version: returns { job_id } immediately, the specialist runs durably on a Vercel Workflow (no 300s timeout). Use this version when the specialist is expected to take >90s. Call get_dispatch_result(job_id) periodically (respect wait_ms_hint in the response) until status === 'completed' or 'failed'. Idempotent: same brief + same org reuses the same job_id, so retries don't fan out duplicate runs.
    Connector
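The quantization simulator listed above returns quantized values, a scale factor, and precision loss (MSE). A rough sketch of how such a simulation could work is shown below, assuming symmetric linear quantization with a single per-vector scale; the listing does not state the tool's actual scheme, and the function name and return keys are hypothetical.

```python
def quantize(vector, bits=8):
    """Symmetric linear quantization of floats to signed `bits`-bit integers.

    Assumed scheme (not confirmed by the listing): one scale per vector,
    chosen so the largest absolute value maps to the largest integer.
    """
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")
    qmax = 2 ** (bits - 1) - 1  # 127 for int8, 7 for int4
    max_abs = max((abs(x) for x in vector), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(x / scale))) for x in vector]
    # Dequantize to measure how much precision was lost.
    restored = [q * scale for q in quantized]
    mse = (
        sum((x - r) ** 2 for x, r in zip(vector, restored)) / len(vector)
        if vector
        else 0.0
    )
    return {
        "quantized": quantized,
        "scale": scale,
        "mse": mse,
        "compression": 32 // bits,  # storage reduction relative to float32
    }
```

Calling `quantize(embedding, bits=4)` and `quantize(embedding, bits=8)` on the same vector lets you compare the two MSE values against the 8x and 4x storage savings. The sketch stores int4 values as plain Python integers; a real store would pack two of them per byte to realize the 8x reduction.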