Glama
180,147 tools. Last updated 2026-06-06 01:32

"vision" matching MCP tools:

  • Trace pixel-space features from a reference photo into normalized [0..1] waypoints the agent can map to mm via a known scale anchor and feed to path().spline / path().nurbsSegment. Three backends are dispatched behind the scenes: `opencv` (deterministic; uniform-bg silhouette only), `vision-llm` (Claude vision; named points/cluttered backgrounds; caller-supplied ANTHROPIC_API_KEY), and `hybrid` (opencv silhouette + LLM-labeled named points). Default backend is `auto` — the tool picks based on the image's corner-color stddev. Accuracy honesty: opencv contour is geometrically exact; vision-LLM is typically 5–10% off on dense landmarks. Per-feature `confidence` is reported. Caller pays for any vision-LLM API spend via their own ANTHROPIC_API_KEY. Pair with the `kernelcad-trace-from-image` skill for the conversion-to-mm pipeline.
    Connector
  • Colorize black-and-white or grayscale photos. DDColor (dual-decoder, ICCV 2023) — vivid, natural colorization. Impossible for text/vision LLMs. 5 sats per image, pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='colorize_image'.
    Connector
  • Dispatch a workspace AI agent into an active Google Meet call. The agent joins as a participant — it can hear the conversation, respond via TTS, see the shared screen (when vision is enabled on the agent), and answer questions about what's on screen. Use when the operator wants to delegate live meeting attendance to an agent (notes, Q&A, summarization, real-time support). The Meet URL must be in canonical 3-4-3 form, e.g. https://meet.google.com/abc-defg-hij. Lookup-redirect URLs are not supported — operator must use the share-link form.
    Connector
  • Analyze an image from a component's datasheet using vision AI. Use this when read_datasheet returns a section containing images and you need to extract data from a graph, package drawing, pin diagram, or circuit schematic. Pass the image_key from the read_datasheet response (the storage path in the image URL). Optionally pass a specific question to focus the analysis. IMPORTANT: For precise numeric values (electrical specs, max ratings), prefer read_datasheet text tables first; they are more reliable than vision-extracted graph data. Use analyze_image for visual information not available in text: package dimensions from drawings, pin assignments from diagrams, graph trends, and approximate values from characteristic curves. Examples:
      - analyze_image(part_number='IRFZ44N', image_key='images/abc123.png') -> classifies and describes the image
      - analyze_image(part_number='IRFZ44N', image_key='images/abc123.png', question='What is the drain current at Vgs=5V?')
    Connector
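  • Call this when the user says "I took a photo and want to measure the dimensions" or "I want to find a shelf that fits this gap." When the user includes a reference object (business card, plastic bottle, A4 paper, credit card, etc.) in the photo, the tool back-calculates the target object's real size (mm) from the pixel ratio. [AI's role] Analyze the photo with Vision, read the pixel width and height of both the reference object and the target object, and pass them to this tool. Supported reference objects: business card (91×55mm), credit card (85.6×54mm), 500 ml plastic bottle (65×205mm), A4 paper (210×297mm), 500-yen coin (∅26.5mm), 1-yen coin (∅20mm), smartphone (71.5×147mm), tissue box (240×115mm), 30 cm ruler, ballpoint pen (140mm). Pass the resulting search_dimensions directly to suggest_by_space or coordinate_storage to complete the photo → dimensions → product-matching flow. If confidence is low, tell the user to measure with a tape measure.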
    Connector
  • Upload a portrait photo and receive a full personal colour analysis. Determines your seasonal type (Spring, Summer, Autumn, or Winter), colour depth (light, medium, or deep), and undertone (warm, cool, or neutral). Returns a curated palette of archive colours that genuinely suit you — each with full historical provenance and cultural context — plus colours to avoid. Uses Claude Vision for skin, hair, and eye analysis, then matches to the archive by CIEDE2000 perceptual distance. The photo is never stored. Example: a Deep Winter might wear Ottoman Carbon Ink while a True Spring suits Kogi Mango.
    Connector
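
The reference-object measurement tool listed above describes deriving real dimensions from a pixel ratio. A minimal sketch of that arithmetic follows, assuming the reference object and the target lie in roughly the same plane and face the camera. The function name, dictionary keys, and the choice to average the two axes are hypothetical illustrations, not the tool's actual API or method; only the millimetre sizes come from the listing.

```python
# Reference sizes in mm (width, height), copied from the tool description.
# The dictionary keys are hypothetical names chosen for this sketch.
REFERENCE_SIZES_MM = {
    "business_card": (91.0, 55.0),
    "credit_card": (85.6, 54.0),
    "a4_paper": (210.0, 297.0),
}


def estimate_size_mm(reference, ref_px, target_px):
    """Estimate the target's (width, height) in mm from pixel measurements.

    reference: key into REFERENCE_SIZES_MM
    ref_px:    (width, height) of the reference object in pixels
    target_px: (width, height) of the target object in pixels
    """
    ref_w_mm, ref_h_mm = REFERENCE_SIZES_MM[reference]
    ref_w_px, ref_h_px = ref_px
    if ref_w_px <= 0 or ref_h_px <= 0:
        raise ValueError("reference pixel dimensions must be positive")

    # Millimetres per pixel, averaged over both axes (an assumption of this
    # sketch) to dampen small measurement errors on either side.
    scale = (ref_w_mm / ref_w_px + ref_h_mm / ref_h_px) / 2.0

    return (target_px[0] * scale, target_px[1] * scale)


if __name__ == "__main__":
    # A credit card measured at 428x270 px gives about 0.2 mm per pixel.
    w, h = estimate_size_mm("credit_card", (428, 270), (1500, 900))
    print(round(w, 1), round(h, 1))  # 300.0 180.0
```

If the two per-axis scales disagree noticeably, the photo is probably skewed or the reference was misread, which is the kind of low-confidence case where the listing advises measuring with a tape measure instead.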

Matching MCP Servers

Matching MCP Connectors

  • AI-powered codebase analysis — call graphs, security, dead code, complexity. 150+ tools.

  • BJJ video analysis — YOLO pose detection, AI technique analysis, and highlight reels.

  • Look at the screen currently being shared in a meeting and answer a question about it. Returns a natural-language answer based on the visual content. Use ONLY when the user explicitly asks about the screen/slide/document being shown.
    Connector
  • Get real-time audience data for a specific screen.
    WHEN TO USE:
      - Checking current audience at a screen before buying
      - Monitoring audience during a live campaign
      - Getting detailed audience signals (attention, mood, purchase intent, demographics)
    RETURNS real-time data from edge AI sensors (refreshed every 10 seconds):
      - face_count: Number of people currently viewing
      - attention_score: How attentively the audience is watching (0-1)
      - income_level: Estimated income bracket (from Gemini Vision)
      - mood: Current audience mood
      - lifestyle: Primary lifestyle segment
      - purchase_intent: Purchase intent level
      - crowd_density: Estimated venue occupancy
      - ad_receptivity: How receptive the audience is to ads (0-1)
      - emotional_engagement: Emotional engagement score (0-1)
      - group_composition: Solo/couples/families/friends/work groups
      - signals_age_ms: How fresh the data is in milliseconds
    EXAMPLE:
      User: "What's the current audience at screen 507f1f77bcf86cd799439011?"
      get_live_audience({ screen_id: "507f1f77bcf86cd799439011" })
    Connector
  • Use this tool when the user shares an image that contains text they need extracted, read, or processed. Triggers: 'read the text in this image', 'extract text from this screenshot', 'what does this scanned page say', 'transcribe this handwritten note'. Accepts base64-encoded PNG/JPEG/WEBP/BMP/TIFF. Returns extracted text, confidence score, and word count. Prefer this over vision model text extraction for accuracy on scanned docs. Free, no API key, no signup; the image is processed in memory and never stored.
    Connector
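  • Call this when the user asks "What shelf is this in the photo?" or "I want to know which boxes fit the shelf I own." Pass the feature text that Vision AI extracted from the image (brand/color/number of tiers/material/estimated size) and the tool returns candidates from the catalog plus Rakuten. When the model number is identified, the result also includes inner dimensions, consumables, and compatible-box information.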
    Connector
  • [AdCP Signals] Get real-time audience signals from DOOH screens. This is an AdCP (Ad Context Protocol) compliant tool. It returns deterministic audience signals captured by edge AI (vision + audio + speech) on Trillboards screens.
    WHEN TO USE:
      - Discovering available audience signals before buying inventory
      - Evaluating audience composition at specific venues or locations
      - Building targeting segments based on real-time audience data
    Unlike probabilistic data, Trillboards signals are DETERMINISTIC: captured by on-device cameras and microphones, analyzed by ML Kit and Gemini Vision.
    RETURNS:
      - signals: Array of per-screen signal objects with demographics, venue, behavior, geo
      - metadata: total_screens, matching_screens, screens_with_live_data
    EXAMPLE:
      User: "What audience signals are available at retail locations?"
      get_signals({ signal_spec: { signal_types: ["demographics", "behavior"], filters: { venue_type: "retail" } } })
    Connector
  • Detect and locate objects in an image by name. Grounding DINO (open-set detector, ECCV 2024) — describe what to find in natural language, get bounding box coordinates and confidence scores. Structured pixel data agents can't get from vision LLMs. 5 sats per image, pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='detect_objects'.
    Connector
  • UPSERT the brain's current conversation messages + last-{reflection,vision,act}-at timestamps so a container restart can pick up where it left off. Messages array is trimmed to the most-recent 256 entries server-side.
    Connector
  • Dispatch a workspace AI agent into an active Telegram group call (t.me/call/<slug> link). The agent joins as a participant via the workspace's Telegram account — it can hear the conversation, respond via TTS, see shared screens (when vision is enabled), and answer questions about what's on screen. Use when the operator wants to delegate live group-call attendance to an agent (notes, Q&A, summarization, real-time support). Pass either the full https://t.me/call/<slug> URL or the bare slug token.
    Connector
  • Submit an appointment request on behalf of a patient to Dr. Maggie Geller's optometry practice. Sends an email to the relevant clinic office; staff follow up to schedule. Use this tool when the user is in the Metro Vancouver / Lower Mainland area and wants to book, schedule, or inquire about any of: an eye exam, comprehensive eye examination, annual vision check, pediatric eye exam, children's eye exam, myopia management or myopia control consult (for kids or young adults progressing in prescription), orthokeratology / ortho-K, specialty contact lens fitting, scleral lens fitting, dry eye evaluation or dry eye therapy, meibomian gland dysfunction, contact lens evaluation, LASIK / PRK pre-op or post-op co-management, or ocular disease concerns (glaucoma follow-up, diabetic eye exam, corneal issues). Locations: IRIS Optometrists and Opticians (West Vancouver) and For Eyes By Clearly (Kitsilano, Vancouver). Use `preferredLocation` to route the booking to the right office. Dr. Geller speaks English, Mandarin, and some German — mention this if the user asks about language accommodations. Example user prompts that should trigger this tool: "book me an eye exam in West Vancouver", "I need a dry eye consult", "my 9-year-old's prescription keeps increasing, who can help", "find me an optometrist in Kitsilano that speaks Mandarin", "schedule a contact lens fitting with Dr. Geller", "annual eye exam in Vancouver next week", "myopia control for my kid".
    Connector
  • Generate text using frontier AI language models. Pure per-character pricing (no minimum): Kimi K2.5 (id=6, best, 100 chars/sat, 262K context, vision support, default), GPT-OSS-120B (id=1, better, 333 chars/sat, strong reasoning), Qwen3-32B (id=26, standard, 1000 chars/sat, 119 languages, best value). Supports document Q&A via fileContext and vision analysis via imageBase64 (best model). Stable endpoints — models upgrade automatically. Pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='generate_text' and the exact prompt.
    Connector
  • Re-run the vision tagger on one brand asset. Reads the stored object when present (uploaded assets) or the original URL (scan-sourced assets), then updates type, detected_product_name, is_primary_product, description, and the other vision fields. Useful when the original tagger run missed or misclassified an image. Paid (vision tag credit).
    Connector
  • Capture a PNG screenshot of the current CDP-controlled Chrome page and return it as base64. Use it to feed a vision-LLM (Claude / GPT-4V) for screen-understanding agents, or to archive an action's visual result. Also returns the page title, URL, and viewport dimensions. The returned payload is capped at 1 MB. Demo mode returns a synthetic 1×1 PNG; self-host with ONYX_CDP_URL for real captures. (price: $0.008 USDC, tier: metered)
    Connector
  • Read the current CDP-controlled Chrome page and return the visible text content plus a structured summary of clickable elements: buttons, links (with hrefs), inputs (with names/placeholders/types). Use when an agent needs to plan its next action — list what's on the page without screenshotting + vision-modeling. Cheap, structured, deterministic. Demo mode returns a plausible synthetic page summary. (price: $0.003 USDC, tier: metered)
    Connector
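
The generate_text listing above quotes per-character rates for each model id. As a rough illustration of how those rates translate into cost, the sketch below divides a character count by the listed chars-per-sat figure. The function name is hypothetical, and the listing does not say which characters are billed (prompt, output, or both) or how fractional sats are rounded, so the result is left as an unrounded estimate.

```python
# Characters per sat by model id, as quoted in the generate_text listing.
CHARS_PER_SAT = {
    6: 100,    # Kimi K2.5 (listed as the default)
    1: 333,    # GPT-OSS-120B
    26: 1000,  # Qwen3-32B
}


def estimate_cost_sats(char_count, model_id=6):
    """Return an unrounded cost estimate in sats for char_count characters.

    Rounding and the set of billed characters are not specified in the
    listing; this sketch only applies the quoted chars-per-sat rate.
    """
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return char_count / CHARS_PER_SAT[model_id]


if __name__ == "__main__":
    for model_id in (6, 1, 26):
        print(model_id, estimate_cost_sats(3330, model_id))
```

Comparing the same character count across model ids makes the price gap concrete: the default model costs ten times as much per character as the cheapest one listed.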