197,998 tools. Last updated 2026-06-13 03:02

"image to text" matching MCP tools:

create_flow
Avocado AI
Create a new Avocado AI Flow pre-built with a node-graph pipeline, and return its id and direct URL so the user can open it on the canvas. You design the whole pipeline: pass the nodes and edges and the server validates socket compatibility, aligns video models to the input shape, lays the graph out left-to-right, and adds a caption per step. Edges reference nodes by 0-based index in the `nodes` array. This creates (does not run) the flow — the user runs it from the editor. Use the capability map below to choose node types, models, and handles:

You are Avo, a senior creative-workflow designer inside Avocado AI's Flow editor. The user describes a creative goal; you respond with a node-graph proposal that the editor previews on the canvas. Think like a production director: design the FULL pipeline needed to get a polished result, not the minimum number of nodes.

DESIGN PRINCIPLES — build capable, complete pipelines:
- Match the pipeline's ambition to the request. A throwaway test is 2-3 nodes; a real deliverable (an ad, a UGC video, a product shot, a music video) is usually 5-12 nodes. Use up to 24 when it genuinely helps.
- Prefer multi-stage quality: generate → refine (imageEditor) → upscale → animate, rather than a single generate node. Add an upscale step before any final image/video deliverable.
- Use BRANCHING and FAN-OUT. One output can feed many nodes: e.g. one hero image → three different video models for variations the user can pick from; one script → both a voiceover and the video prompt.
- Use PARALLEL TRACKS that converge: e.g. a voice track and an image track both feeding a lip-sync video; or a music track plus a visuals track.
- Use the `llm` node to do creative thinking inside the graph — write or expand a script, brainstorm a prompt, turn a rough idea into a detailed image/video prompt — then wire its text output into the next node.
- Pick the BEST model for each step (see the menus below). Don't leave everything on defaults — choosing models is a big part of the value.
- Set per-node settings (aspect ratio, resolution, duration, voice, variations) when the request implies them (e.g. 'vertical' → 9:16, 'short' → duration 5, '3 options' → variations 3 or three branches).

HARD RULES:
- Use only the node types listed below. Never invent new ones.
- Every edge must connect compatible socket types (text→text, image→image, audio→audio, video→video).
- Give every runnable node a short `stepLabel` ('Step N — …') — it renders as a caption beneath that node.
- `stickyNote` is only for standalone notes; never use it to caption a node (use `stepLabel`). Optionally add ONE stickyNote describing the workflow.
- Any schema field you don't need must be `null` (numbers like `variations` too).

MODEL MENUS (set the node's `model` to one of these ids):

image (text-to-image) — `model` ids:
• fal-ai/nano-banana-2 — fast, strong all-rounder (default)
• fal-ai/gpt-image-2 — best instruction-following & legible text
• fal-ai/bytedance/seedream/v5/lite/text-to-image — photoreal
• fal-ai/flux-pro/v1.1-ultra — high detail / fidelity
• fal-ai/nano-banana-pro — premium quality
• fal-ai/recraft/v4/text-to-image — design, brand, vector-style
• fal-ai/ideogram/v3 — posters & typography

imageEditor (image + prompt → edited image) — `model` ids:
• fal-ai/nano-banana-2/edit — default, multi-image (up to 14 inputs)
• openai/gpt-image-2/edit — precise instruction edits
• fal-ai/bytedance/seedream/v5/lite/edit — photoreal edits
• fal-ai/flux-pro/kontext/max/text-to-image — style / context transfer
• fal-ai/gemini-25-flash-image/edit — fast edits
(the `image` input accepts MULTIPLE connections for compositing/restyle)

imageUpscale (image → larger image) — `model` ids:
• fal-ai/topaz/upscale/image — best quality (default)
• fal-ai/recraft-crisp-upscale, fal-ai/clarity-upscaler, fal-ai/crystal-upscaler

llm (text → text) — `model` ids: claude-haiku (default), gpt-4o-mini, kimi-k2, seed-1.8. Put the instruction in `prompt`.

voice (text → speech) — pick a `voice` by name: Sarah (cheerful), Roger (deep), Laura (soft), Charlie (warm), George (bold), Callum (energetic), River (calm), Liam (reliable). The script comes from an upstream text/llm node wired into `in` — do NOT put the script in the voice node's prompt.

music (text → music) — set `duration` to one of 30,60,90,120,180,240,300 (seconds). Put the music description in `prompt`.

videoUpscale (video → sharper video) — add after a video node for final deliverables. No model field.

VIDEO node — choose `model` to match the input shape (it drives which input handles the node renders):
• Text → video: `kling3-pro`, `sora-2`, `veo3-1-fast`, `seedance-2.0-t2v`. Wire text to `prompt`.
• Image → video (I2V): `veo3-1-fast`, `kling3-pro`, `seedance-2.0-i2v`, `hailuo-pro`. Wire the image to `image`. For keyframe models (`kling-o1`, `veo3-1`) wire `start-frame` + `end-frame`.
• Lip-sync / talking-head: `fabric` (image + audio, NO prompt — never wire text into Fabric) or `infinitalk` (prompt + image + audio). Wire audio to `audio`. Audio-over-stills narration: `ltx2-audio`.
• Multi-image reference / character consistency: `vidu` (≤7), `veo3-1-ref` (≤10), `kling-elements` (2-4 ordered frames), `happy-horse-ref` (≤9). Wire EACH image to the SAME `ref-images` handle (it accepts multiple connections). Never use the plain `image` handle.
• Seedance reference (image + video + audio refs): `seedance-2.0-ref` / `seedance-2.0-ref-fast`. Wire to `ref-images` / `ref-videos` / `ref-audio`.
• Motion control (drive a character with a motion video): `kling3-motion-control`. Wire character to `image`, motion clip (videoUpload) to `motion-video`.

Edge handle hints:
- When the target has multiple typed inputs (Video, Image Editor), set `toHandle` explicitly (`prompt`, `image`, `audio`, `ref-images`, `start-frame`, `end-frame`, `motion-video`). The editor otherwise picks the first type-compatible handle, which may be the wrong slot.
- Never wire text into Fabric. Never wire a single image into a multi-ref model's `image` slot — use `ref-images`.

Available node types (id — purpose — inputs / outputs):
- text — Prompt — in: in<text> | out: out<text>
- llm — LLM — in: in<text> | out: out<text>
- upload — Upload — in: — | out: out<image>
- videoUpload — Video Upload — in: — | out: out<video>
- image — Image — in: in<text> | out: out<image>
- imageEditor — Image Editor — in: prompt<text>, image<image> | out: out<image>
- imageUpscale — Image Upscale — in: image<image> | out: out<image>
- video — Video — in: prompt<text>, image<image>, start-frame<image>, end-frame<image>, ref-images<image>, ref-videos<video>, ref-audio<audio>, audio<audio>, motion-video<video> | out: out<video>
- videoUpscale — Video Upscale — in: video<video> | out: out<video>
- voice — Voice — in: in<text> | out: out<audio>
- music — Music — in: in<text> | out: out<audio>
- stickyNote — Sticky Note — in: in<annotation> | out: out<annotation>

Edges reference nodes by index in the `nodes` array (0-based). In the examples below, any field not shown is `null`.

EXAMPLES — study the PATTERNS (multi-stage, fan-out, parallel tracks), copy the handle names exactly:

Example 1 — UGC talking-head with scripted voice + final upscale:
nodes=[
  {type:"llm",stepLabel:"Step 1 — Write a punchy 15s script",prompt:"Write a 15-second energetic UGC script for the product.",model:"claude-haiku"},
  {type:"voice",stepLabel:"Step 2 — Voiceover",voice:"George"},
  {type:"upload",stepLabel:"Step 3 — Upload character photo"},
  {type:"video",stepLabel:"Step 4 — Lip-sync video",model:"fabric"},
  {type:"videoUpscale",stepLabel:"Step 5 — Upscale to deliver"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"audio"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"video"}
]

Example 2 — Text → image → refine → upscale (quality chain):
nodes=[
  {type:"text",stepLabel:"Step 1 — Prompt",prompt:"A cinematic product shot of a matte-black bottle on wet stone, golden hour"},
  {type:"image",stepLabel:"Step 2 — Generate hero",model:"fal-ai/flux-pro/v1.1-ultra",aspectRatio:"4:3"},
  {type:"imageEditor",stepLabel:"Step 3 — Add brand label",prompt:"Add a minimal embossed logo on the bottle",model:"fal-ai/nano-banana-2/edit"},
  {type:"imageUpscale",stepLabel:"Step 4 — Upscale",model:"fal-ai/topaz/upscale/image"}
]
edges=[
  {fromIndex:0,toIndex:1,fromHandle:"out",toHandle:"in"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"}
]

Example 3 — Fan-out: one image → three video variations (different models):
nodes=[
  {type:"upload",stepLabel:"Step 1 — Source image"},
  {type:"text",stepLabel:"Step 2 — Motion brief",prompt:"Slow cinematic push-in, gentle parallax"},
  {type:"video",stepLabel:"Variation A — Veo",model:"veo3-1-fast",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation B — Kling",model:"kling3-pro",aspectRatio:"9:16",duration:"5"},
  {type:"video",stepLabel:"Variation C — Seedance",model:"seedance-2.0-i2v",aspectRatio:"9:16",duration:"5"}
]
edges=[
  {fromIndex:0,toIndex:2,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"image"},
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:3,fromHandle:"out",toHandle:"prompt"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]

Example 4 — Multi-image reference video (character consistency):
nodes=[
  {type:"upload",stepLabel:"Ref 1 — Character front"},
  {type:"upload",stepLabel:"Ref 2 — Character side"},
  {type:"upload",stepLabel:"Ref 3 — Outfit detail"},
  {type:"text",stepLabel:"Scene prompt",prompt:"The character walks through a neon market at night"},
  {type:"video",stepLabel:"Generate with refs",model:"veo3-1-ref",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:0,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:1,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:2,toIndex:4,fromHandle:"out",toHandle:"ref-images"},
  {fromIndex:3,toIndex:4,fromHandle:"out",toHandle:"prompt"}
]

Example 5 — Music video: parallel music + visuals tracks converging:
nodes=[
  {type:"music",stepLabel:"Track 1 — Score",prompt:"Dreamy lo-fi beat, 90 BPM",duration:"60"},
  {type:"text",stepLabel:"Track 2 — Scene",prompt:"A lone astronaut drifting past a glowing planet"},
  {type:"image",stepLabel:"Keyframe",model:"fal-ai/nano-banana-pro",aspectRatio:"16:9"},
  {type:"video",stepLabel:"Animate",model:"ltx2-audio",aspectRatio:"16:9"}
]
edges=[
  {fromIndex:1,toIndex:2,fromHandle:"out",toHandle:"in"},
  {fromIndex:2,toIndex:3,fromHandle:"out",toHandle:"image"},
  {fromIndex:0,toIndex:3,fromHandle:"out",toHandle:"audio"}
]

Return only the structured object — no prose, no markdown.
Connector
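The socket table and Example 1 in the create_flow description can be checked locally before the tool is called. The following is a minimal Python sketch, not the server's validator: `SOCKETS` and `check_edges` are hypothetical names, the table is transcribed from the description, and the real server-side checks may be stricter or differ in detail.

```python
# Socket table transcribed from the create_flow description:
# node type -> (input handles with their socket types, output socket type).
SOCKETS = {
    "text": ({"in": "text"}, "text"),
    "llm": ({"in": "text"}, "text"),
    "upload": ({}, "image"),
    "videoUpload": ({}, "video"),
    "image": ({"in": "text"}, "image"),
    "imageEditor": ({"prompt": "text", "image": "image"}, "image"),
    "imageUpscale": ({"image": "image"}, "image"),
    "video": ({
        "prompt": "text", "image": "image", "start-frame": "image",
        "end-frame": "image", "ref-images": "image", "ref-videos": "video",
        "ref-audio": "audio", "audio": "audio", "motion-video": "video",
    }, "video"),
    "videoUpscale": ({"video": "video"}, "video"),
    "voice": ({"in": "text"}, "audio"),
    "music": ({"in": "text"}, "audio"),
    "stickyNote": ({"in": "annotation"}, "annotation"),
}


def check_edges(nodes, edges):
    """Return a list of problems; an empty list means every edge is type-compatible."""
    problems = []
    for n, edge in enumerate(edges):
        src, dst = edge["fromIndex"], edge["toIndex"]
        if not (0 <= src < len(nodes) and 0 <= dst < len(nodes)):
            problems.append(f"edge {n}: index out of range")
            continue
        src_type, dst_type = nodes[src]["type"], nodes[dst]["type"]
        if src_type not in SOCKETS or dst_type not in SOCKETS:
            problems.append(f"edge {n}: unknown node type")
            continue
        out_type = SOCKETS[src_type][1]
        inputs = SOCKETS[dst_type][0]
        handle = edge["toHandle"]
        if handle not in inputs:
            problems.append(f"edge {n}: {dst_type} has no handle {handle!r}")
        elif inputs[handle] != out_type:
            problems.append(f"edge {n}: {out_type} -> {inputs[handle]} mismatch")
        # Rule from the description: Fabric takes image + audio, never text.
        if nodes[dst].get("model") == "fabric" and out_type == "text":
            problems.append(f"edge {n}: never wire text into fabric")
    return problems


# Example 1 from the description (UGC talking head), unrelated fields trimmed.
nodes = [{"type": "llm"}, {"type": "voice"}, {"type": "upload"},
         {"type": "video", "model": "fabric"}, {"type": "videoUpscale"}]
edges = [
    {"fromIndex": 0, "toIndex": 1, "fromHandle": "out", "toHandle": "in"},
    {"fromIndex": 1, "toIndex": 3, "fromHandle": "out", "toHandle": "audio"},
    {"fromIndex": 2, "toIndex": 3, "fromHandle": "out", "toHandle": "image"},
    {"fromIndex": 3, "toIndex": 4, "fromHandle": "out", "toHandle": "video"},
]
print(check_edges(nodes, edges))  # []
```

Setting `toHandle` explicitly on every edge, as the description recommends for multi-input targets, is what makes this check possible without guessing which slot the editor would pick.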
edit_image
Avocado AI
Modify an existing image. REQUIRED input: exactly one of file_id OR image_url. base64 is NOT accepted — do not try to pass image bytes as a tool argument, the call will be rejected. For chat-attached images you MUST first call prepare_image_upload to get a signed PUT URL, upload the bytes there (via the inline widget on Claude.ai, or via curl on Claude Desktop / Claude Code), then call this tool with the returned file_id. For URLs the user has pasted, use image_url directly. Returns a jobId immediately; call check_job with the jobId to retrieve the edited image inline. Models (both 1 credit/image): 'nano-banana-2' (fast, default) and 'gpt-image-2' (higher quality).
Connector
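The "exactly one of file_id OR image_url" rule for edit_image can be enforced before the call is made. This is a small Python sketch under stated assumptions: `image_source_args` is a hypothetical helper, and the argument name `model` is assumed, since the description lists the model ids but not the parameter name.

```python
MODELS = ("nano-banana-2", "gpt-image-2")  # ids listed in the edit_image description


def image_source_args(file_id=None, image_url=None, model="nano-banana-2"):
    """Build the image-source part of an edit_image call.

    Raises ValueError unless exactly one of file_id / image_url is given.
    """
    if (file_id is None) == (image_url is None):
        raise ValueError("pass exactly one of file_id or image_url")
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}")
    args = {"model": model}  # parameter name assumed, not confirmed by the listing
    if file_id is not None:
        args["file_id"] = file_id      # returned by prepare_image_upload
    else:
        args["image_url"] = image_url  # a URL the user pasted
    return args


print(image_source_args(image_url="https://example.com/cat.png"))
```

For a chat attachment, the `file_id` would come from prepare_image_upload after the bytes have been uploaded to the signed URL; the returned jobId is then passed to check_job.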
createImageAsset
NotFair-GoogleAds
Upload a PNG/JPEG image asset from an HTTPS URL. Pick the field type by SERVING SLOT, not by aspect ratio: MARKETING_IMAGE (Display/PMax 1.91:1, min 600x314) | SQUARE_MARKETING_IMAGE (Display/PMax 1:1, min 300x300) | AD_IMAGE (Search/Display 'image extension' on RSAs — accepts either 1.91:1 OR 1:1 source, campaign/ad_group link levels only). Optionally link it to serving targets via `targets`. Returns changeId, assetId, and link resource names. To attach an existing image to more targets later, call `linkAsset`.
Connector
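Once the field type has been chosen by serving slot, the source image can be checked against the ratio and minimum size that the createImageAsset description gives for that type. The sketch below is illustrative only: `RULES` and `fits_slot` are hypothetical names, the 2% ratio tolerance is an assumption, and no minimum is applied to AD_IMAGE because the description states none.

```python
# Ratio and minimum-size rules transcribed from the createImageAsset description.
RULES = {
    "MARKETING_IMAGE": {"ratios": (1.91,), "min": (600, 314)},
    "SQUARE_MARKETING_IMAGE": {"ratios": (1.0,), "min": (300, 300)},
    "AD_IMAGE": {"ratios": (1.91, 1.0), "min": None},  # no minimum stated
}


def fits_slot(field_type, width, height, tolerance=0.02):
    """Return True if a width x height image satisfies the rules for field_type.

    tolerance is a relative allowance on the aspect ratio (an assumption).
    """
    rule = RULES[field_type]
    ratio = width / height
    ratio_ok = any(abs(ratio - r) <= tolerance * r for r in rule["ratios"])
    size_ok = rule["min"] is None or (
        width >= rule["min"][0] and height >= rule["min"][1]
    )
    return ratio_ok and size_ok


print(fits_slot("MARKETING_IMAGE", 1200, 628))       # True
print(fits_slot("SQUARE_MARKETING_IMAGE", 200, 200))  # False (below 300x300)
```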
generate_video_to_storyboard
Avocado AI
Generate an AI video and place it directly on a user's Avocado AI storyboard. Drops a 'Generating...' placeholder on the board immediately, then the storyboard's recovery hook swaps it for the final video when generation completes (2-10 minutes). Use list_storyboards or create_storyboard first to obtain the storyboard_id. If the user has the storyboard tab open, they may need to refresh once for the video to appear (the canvas does not yet support live realtime swap from MCP). Eight models supported: seedance-2.0-t2v / -t2v-fast (text only), seedance-2.0-i2v / -i2v-fast (REQUIRE an image), kling3-standard (720p, 5-10s), kling3-pro (1080p, 5-10s), kling3-4k & kling-o3-4k (4K, 3-15s; all four Kling 3.x variants support BOTH text-to-video and image-to-video). For image-to-video: call prepare_image_upload first, then pass the returned file_id here. Pricing is per-second, varies by model and resolution.
Connector
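The eight models in the generate_video_to_storyboard description differ in whether they take an image and in the durations they accept. A rough Python lookup table makes those constraints checkable before a request is sent. `MODELS` and `check_video_request` are hypothetical names; the `-fast` ids are expanded from the description's shorthand, and Seedance durations are left unchecked because the description gives none.

```python
# "image": "none" = text only, "required" = needs an image, "optional" = either.
# "seconds": allowed duration range, or None where the description states none.
MODELS = {
    "seedance-2.0-t2v":      {"image": "none",     "seconds": None},
    "seedance-2.0-t2v-fast": {"image": "none",     "seconds": None},
    "seedance-2.0-i2v":      {"image": "required", "seconds": None},
    "seedance-2.0-i2v-fast": {"image": "required", "seconds": None},
    "kling3-standard":       {"image": "optional", "seconds": (5, 10)},
    "kling3-pro":            {"image": "optional", "seconds": (5, 10)},
    "kling3-4k":             {"image": "optional", "seconds": (3, 15)},
    "kling-o3-4k":           {"image": "optional", "seconds": (3, 15)},
}


def check_video_request(model, has_image, seconds=None):
    """Return a list of problems with the request; empty means it looks valid."""
    if model not in MODELS:
        return [f"unknown model {model!r}"]
    spec = MODELS[model]
    problems = []
    if spec["image"] == "required" and not has_image:
        problems.append(f"{model} requires an image")
    if spec["image"] == "none" and has_image:
        problems.append(f"{model} is text only")
    if seconds is not None and spec["seconds"] is not None:
        low, high = spec["seconds"]
        if not low <= seconds <= high:
            problems.append(f"{model} supports {low}-{high}s, got {seconds}s")
    return problems


print(check_video_request("seedance-2.0-i2v", has_image=False))
print(check_video_request("kling3-pro", has_image=False, seconds=12))
```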
set_board_mode
cnvs.app
Choose whether this board is a freeform whiteboard ('draw', the default) or a kanban task board ('todo'). Mode is switchable WHENEVER the board is empty of real content: drawings (text/strokes/images) and tasks. Empty or seeded columns DON'T count (switching to 'draw' clears them), so a cleared board can be switched again, and you can flip draw<->todo freely until the first stroke/text/image or task lands. Setting 'todo' auto-seeds three starter columns (To do / In progress / Done). Returns `{ mode, columns }`. Use the task/column tools (`create_task`, `create_column`, …) once the board is in 'todo' mode.
Connector
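The switching rule for set_board_mode amounts to a small state check: only real content locks the mode, and columns never do. The sketch below models that rule locally in Python. It is not the cnvs.app implementation; the `board` dictionary shape and the `tasks` key are assumptions, and reseeding columns on every switch to 'todo' is a simplification.

```python
STARTER_COLUMNS = ["To do", "In progress", "Done"]
CONTENT_KEYS = ("texts", "lines", "images", "tasks")  # "tasks" key is assumed


def set_mode(board, mode):
    """Switch a local board model between 'draw' and 'todo'.

    Raises RuntimeError once the board holds real content; columns alone
    (empty or seeded) never block a switch.
    """
    if mode not in ("draw", "todo"):
        raise ValueError(f"unknown mode {mode!r}")
    if any(board.get(key) for key in CONTENT_KEYS):
        raise RuntimeError("board has real content; mode can no longer change")
    board["mode"] = mode
    # 'todo' seeds three starter columns; 'draw' clears any columns.
    board["columns"] = list(STARTER_COLUMNS) if mode == "todo" else []
    return {"mode": board["mode"], "columns": board["columns"]}


board = {"mode": "draw", "columns": []}
print(set_mode(board, "todo"))  # seeded with the three starter columns
print(set_mode(board, "draw"))  # columns cleared, still switchable
```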
wsdot_search_cameras
wsdot-mcp-server
Returns WSDOT highway camera locations, descriptions, and image URLs. Camera images are copyright WSDOT — only metadata and image URLs are returned, not image bytes. Filter by state route (e.g. "090" for I-90), WSDOT region, or milepost range. Omit all filters to list all cameras statewide (potentially hundreds).
Connector

Matching MCP Servers

Zhipu Text-to-Image MCP Server
Image & Video Processing Multimedia Processing
2716025420
license: A | quality: - | maintenance: D
Enables text-to-image generation using Zhipu AI's CogView-4 API. Supports generating images from text prompts with configurable size and quality parameters through MCP-compatible clients like Claude Desktop and Cline.
Last updated 2025-12-07
7
MIT
Video to Text MCP Server
Multimedia Processing Audio Processing Speech Processing
strzhao
license: A | quality: B | maintenance: D
Enables downloading videos from platforms like YouTube and converting them to text using OpenAI Whisper and ffmpeg. It supports multiple output formats including TXT, JSON, SRT, and VTT for transcriptions.
Last updated 2026-01-13
2
3
ISC

Matching MCP Connectors

Content to Social
Transform any blog post or article URL into ready-to-post social media content for Twitter/X threads, LinkedIn posts, Instagram captions, Facebook posts, and email newsletters. Pay-per-event: $0.07 for all 5 platforms, $0.03 for single platform.
sms-mcp
The Mobile Text Alerts SMS MCP server enables your AI to send SMS messages & manage contacts

generate_video
switch
Generate Switch video across the real provider lineup (Kling, Seedance, Switch Video/WAN 2.7, Switch Video Edit, Topaz upscale) and modes (text-to-video, image-to-video, frame-to-frame, motion, omni, reference-to-video, video-edit, upscale). ALWAYS call list_video_models first to pick the right model + mode and see its required inputs. Pass one shot, or shots:[...] for a storyboard (max 4 by default, hard max 10) where EACH shot is DIFFERENT — never repeat one prompt to get copies. Renders async (~30-90s); a background job delivers each clip to the library. Returns a task_id per shot — poll get_video_status or list_my_videos.
Connector
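The storyboard limits in the generate_video description (4 shots by default, 10 at most, every shot different) can be checked before the call. This is a minimal Python sketch: `validate_shots` is a hypothetical helper, and the shot shape `{"prompt": ...}` is an assumption, since the description does not spell out the fields of a shot.

```python
HARD_MAX_SHOTS = 10      # hard max from the description
DEFAULT_MAX_SHOTS = 4    # default max from the description


def validate_shots(shots, limit=DEFAULT_MAX_SHOTS):
    """Return shots unchanged if they respect the storyboard rules, else raise."""
    if not 1 <= limit <= HARD_MAX_SHOTS:
        raise ValueError(f"limit must be between 1 and {HARD_MAX_SHOTS}")
    if not shots:
        raise ValueError("at least one shot is required")
    if len(shots) > limit:
        raise ValueError(f"{len(shots)} shots exceeds the limit of {limit}")
    # Compare prompts case-insensitively so trivial copies are caught too.
    prompts = [shot["prompt"].strip().lower() for shot in shots]
    if len(set(prompts)) != len(prompts):
        raise ValueError("each shot must be different; do not repeat a prompt")
    return shots


storyboard = [
    {"prompt": "Wide shot of a harbor at dawn"},
    {"prompt": "Close-up of a fisherman coiling rope"},
]
print(len(validate_shots(storyboard)))  # 2
```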
prepare_image_upload
Avocado AI
MANDATORY first step whenever the user attached an image in chat (or pointed at a local file on disk) and wants edit_image or image-to-video generation. Returns a signed PUT URL plus a file_id. After this tool: either (a) the inline upload widget will let the user drop the file and auto-continue (Claude.ai web), or (b) you run a curl PUT yourself if you have shell access (Claude Desktop / Claude Code) — the response text contains a ready-to-run curl command. Then call edit_image or generate_video with file_id=<returned id>. edit_image and generate_video do NOT accept base64 — calling them with raw image bytes WILL fail. This tool is the only working path for chat attachments. Set `purpose` to 'edit' or 'video' so the upload widget points the user at the right downstream tool.
Connector
UploadImageToWixSite
mcp
Upload one or more images to a Wix site's Media Manager. Returns wixstatic.com URL and media ID. Do NOT use ExecuteWixAPI or code execution for image uploads — use this tool directly. Parameters — choose ONE image input: • image (array): each item is an object with download_url (required) and optional file_id. Pass ALL images in one call. • imageBase64 (string): base64-encoded image + mimeType. One image at a time.
Connector
list_categories
MemeStack
List all available image categories. Use this to discover what categories exist before calling browse_by_category.
Connector
bulk_schedule
SendIt
Schedule multiple posts at once from CSV content.
USE THIS WHEN:
• User has a spreadsheet or list of posts to schedule
• Planning a content calendar for a month
• Migrating content from another tool
CSV FORMAT (required columns):
• platform: linkedin, instagram, x, tiktok, threads
• scheduled_time: ISO 8601 format (e.g., 2024-02-15T10:00:00Z)
• text: Post content/caption
OPTIONAL COLUMNS:
• media_url: Image or video URL
• first_comment: First comment to add (Instagram/LinkedIn)
• hashtags: Additional hashtags to append
PROCESS:
1. First call with validate_only: true to check for errors
2. Review validation report with user
3. Call again with validate_only: false to execute import
Connector
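The CSV format required by bulk_schedule can be pre-checked locally before the `validate_only: true` call. The following Python sketch uses only the standard library; `validate_csv` is a hypothetical helper, and it checks only the three required columns named in the description, so the tool's own validation report may flag more.

```python
import csv
import io
from datetime import datetime

REQUIRED = ("platform", "scheduled_time", "text")
PLATFORMS = {"linkedin", "instagram", "x", "tiktok", "threads"}


def validate_csv(content):
    """Return a list of error strings for the CSV text; empty means it looks valid."""
    reader = csv.DictReader(io.StringIO(content))
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing column: {c}" for c in missing]
    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for line, row in enumerate(reader, start=2):
        platform = (row.get("platform") or "").strip()
        when = (row.get("scheduled_time") or "").strip()
        text = (row.get("text") or "").strip()
        if platform not in PLATFORMS:
            errors.append(f"line {line}: unknown platform {platform!r}")
        try:
            # fromisoformat on older Pythons rejects a trailing "Z", so map it to UTC.
            datetime.fromisoformat(when.replace("Z", "+00:00"))
        except ValueError:
            errors.append(f"line {line}: scheduled_time is not ISO 8601")
        if not text:
            errors.append(f"line {line}: text is empty")
    return errors


sample = (
    "platform,scheduled_time,text\n"
    "linkedin,2024-02-15T10:00:00Z,Launch day!\n"
    "myspace,tomorrow,Hello\n"
)
print(validate_csv(sample))
```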
restore_face
Image Tools - Background Removal, Upscaling & Face Restoration
Restore and enhance faces in an image using GFPGAN. Detects all faces via RetinaFace, restores quality (fixes blur, noise, compression artifacts), and pastes them back. Optionally enhances the background using Real-ESRGAN. GPU-accelerated, sub-3s latency.
Args:
  image_base64: Base64-encoded image data containing faces (PNG, JPEG, WebP).
  upscale: Output upscale factor -- 1 to 4 (default: 2).
  enhance_background: Whether to enhance background with Real-ESRGAN (default: true).
Returns:
  dict with keys:
  - image (str): Base64-encoded restored image
  - format (str): Output image format
  - width (int): Output width
  - height (int): Output height
  - upscale (int): Scale factor applied
  - processing_time_ms (float): Processing time in milliseconds
Connector
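Because restore_face takes and returns base64 strings, a caller has to encode the image on the way in and decode it on the way out. A minimal Python sketch of both steps follows. `restore_face_args` and `decode_restored` are hypothetical helpers; the argument and result keys are taken from the description, and treating `upscale` as an integer is an assumption based on the `upscale (int)` return field.

```python
import base64


def restore_face_args(image_bytes, upscale=2, enhance_background=True):
    """Build the arguments for a restore_face call from raw image bytes."""
    if not isinstance(upscale, int) or not 1 <= upscale <= 4:
        raise ValueError("upscale must be an integer from 1 to 4")
    return {
        "image_base64": base64.b64encode(image_bytes).decode("ascii"),
        "upscale": upscale,
        "enhance_background": enhance_background,
    }


def decode_restored(result):
    """Decode the 'image' field of a restore_face result back to raw bytes."""
    return base64.b64decode(result["image"])


# Stand-in bytes; a real call would read a PNG, JPEG, or WebP file.
args = restore_face_args(b"\x89PNG fake image bytes", upscale=4)
print(sorted(args))  # ['enhance_background', 'image_base64', 'upscale']
```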
erase
cnvs.app
Delete a single item by id. `kind` MUST match the item type: 'text' for text nodes, 'line' for freehand strokes, 'image' for images — the wrong kind silently targets the wrong table and is a common mistake. Get the id + type from `get_board` (texts[], lines[], images[]). There is no bulk/erase-all tool: loop if you need to delete multiple items.
Connector
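Since a wrong `kind` silently targets the wrong table and there is no bulk erase, one safe pattern is to derive each `kind` from the `get_board` collection the item was found in, then loop. The Python sketch below only builds the argument dictionaries. `erase_calls` is a hypothetical helper, and it assumes each item in `texts[]`, `lines[]`, and `images[]` carries an `id` field.

```python
# Collection name in the get_board result -> the `kind` value erase expects.
KIND_BY_COLLECTION = {"texts": "text", "lines": "line", "images": "image"}


def erase_calls(board, ids):
    """Return one erase-argument dict per requested id, with the matching kind."""
    wanted = set(ids)
    calls = []
    for collection, kind in KIND_BY_COLLECTION.items():
        for item in board.get(collection, []):
            if item["id"] in wanted:
                calls.append({"id": item["id"], "kind": kind})
    return calls


board = {
    "texts": [{"id": "t1"}, {"id": "t2"}],
    "lines": [{"id": "l1"}],
    "images": [{"id": "i1"}],
}
for call in erase_calls(board, ["t2", "i1"]):
    print(call)  # each dict would be passed to one erase call
```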
nausika_get_place
Nausika
Fetch full detail for a single place given its 'id'. Accepts either a full UUID or the 8-char [xxxxxxxx] short-id shown by nausika_search_places. Returns canonical attributes (name/coords/category/type), localized i18n names+descriptions, wiki image URLs, ratings aggregates, plus extras only this tool provides: the raw OpenStreetMap tags of the primary OSM feature, and direct links to OSM, Wikidata, and Wikipedia. Use this after nausika_search_places returns a result you want to drill into. For proximity / text search, use nausika_search_places.
Connector
analyze_image
sheetsdata-mcp
Analyze an image from a component's datasheet using vision AI. Use this when read_datasheet returns a section containing images and you need to extract data from a graph, package drawing, pin diagram, or circuit schematic. Pass the image_key from the read_datasheet response (the storage path in the image URL). Optionally pass a specific question to focus the analysis.
IMPORTANT: For precise numeric values (electrical specs, max ratings), prefer read_datasheet text tables first — they are more reliable than vision-extracted graph data. Use analyze_image for visual information not available in text: package dimensions from drawings, pin assignments from diagrams, graph trends, and approximate values from characteristic curves.
Examples:
- analyze_image(part_number='IRFZ44N', image_key='images/abc123.png') -> classifies and describes the image
- analyze_image(part_number='IRFZ44N', image_key='images/abc123.png', question='What is the drain current at Vgs=5V?')
Connector
colorize_image
Sats4AI - Bitcoin-Powered AI Tools
Colorize black-and-white or grayscale photos. DDColor (dual-decoder, ICCV 2023) — vivid, natural colorization. Impossible for text/vision LLMs. 5 sats per image, pay per request with Bitcoin Lightning — no API key or signup needed. Requires create_payment with toolName='colorize_image'.
Connector
generate_image
Frenchie
Generate a single image from a text prompt through Frenchie. Required: prompt. Optional: style (free-text style direction), size, quality, format, background. stdio mode auto-saves the image to .frenchie/<slug>/generated.<ext>; HTTP mode returns a presigned imageUrl that the agent should download for the user.
Connector
get_element
Webcake Landing
Returns detailed usage for one element type — or for many in a single call (BATCH MODE): summary, when to use it, key `specials` fields, a SPARSE skeleton node (the exact shape to emit — the server hydrates omitted boilerplate), and (for common types) a filled example. Pass `types: [...]` to fetch a whole section's worth of element types at once (e.g. ['section','text-block','image-block','button']) — returns { elements: { [type]: details } } and saves a round-trip per type. `type` (single) returns the doc directly for backward compatibility.
Connector
render_mermaid
Diagrams MCP
Render a Mermaid diagram definition and return the image with metadata. The definition should be valid Mermaid syntax (e.g. flowchart, sequence, class, ER, state, or Gantt diagram). Returns a list of content blocks: the rendered image plus a JSON text block with metadata including a mermaid.live edit link for opening the diagram in a browser editor.
Args:
  definition: Mermaid diagram definition text.
  filename: Output filename without extension.
  format: Output format — ``"png"`` (default), ``"svg"``, or ``"pdf"``.
  download_link: If True, return a temporary download URL path (/images/{token}) that expires after 15 minutes; if False, return inline image bytes. Defaults to True (URL) — set ``DIAGRAMS_INLINE_DEFAULT=true`` on the server to flip the default. SVG/PDF and PNGs larger than the inline limit always use a download link.
Connector
render
ShotAPI MCP Server
Render HTML/CSS code as an image. Turn any markup into a visual preview. Useful for: previewing UI code, checking CSS layouts, turning design mockups into shareable images. Supports <style> tags, inline CSS, and common HTML features. Output is auto-cropped to content — no wasted blank space below.
Args:
  html: The HTML/CSS code to render
  width: Viewport width in pixels (default: 1280)
  height: Viewport height in pixels — output auto-cropped to content (default: 720)
  format: Image format — "jpeg" saves tokens, "png" for crisp text, "webp" smallest (default: "jpeg")
Connector