Glama
261,889 tools. Last updated 2026-07-05 14:00

"Exploring text-to-image generation techniques" matching MCP tools:

  • Generate an AI image using Avocado AI. Returns a jobId immediately; image generation completes in 10-60 seconds. After calling, use the check_job tool with the returned jobId to retrieve the result. Once generation is complete, check_job returns the image inline so it renders directly in chat. Run models_list to see available models. Costs 1-4 credits per image depending on model and quality.
    Connector
  • Generate an AI image and place it directly on a user's Avocado AI flow (the Flows Director). Drops a 'Generating...' tile on the flow immediately, then swaps it for the final image when generation completes (10-60s). It appears live on the open canvas and in the Director Library, grouped by role. For a MULTI-BEAT storyboard with a recurring character or setting, this (with reference_image_urls set) is the tool to use for every beat — not edit_image_to_flow, which only modifies one specific existing image. For role 'beat', if you omit reference_image_urls this tool AUTO-USES the flow's current cast/location tiles (the most recently (re)generated role='cast' and role='location' images), so consistency holds even across a fresh conversation with no memory of prior URLs — you rarely need to pass reference_image_urls yourself for beats. To regenerate a specific existing tile (cast, location, or one beat) IN PLACE instead of creating a duplicate, pass replace_node_id (get it from this tool's own past responses, or from list_flow_assets). Costs match generate_image (1-4 credits per image depending on model and quality).
    Connector
  • Execute a single call that `consult` handed you, and bill on success. Used for any external capability (image/video/audio generation, web search, scraping, email, document parsing, code sandbox, browser automation, embeddings, etc.). The server validates params against a registered schema and proxies to the upstream — you never pass URLs or API keys. Always get the exact (service, action, params, max_cost_cents) from `consult` first — don't guess them.
    Connector
  • Convert text to speech by cloning the voice from an audio sample you provide (voice-cloning text-to-speech). Both text and sample are required; the text is limited to 1000 characters and the sample is supplied as a URL or base64 audio that must be at most 15MB, with violations returning HTTP 400. Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. Credits are charged on success. Use this when you have a reference voice sample to clone; use createSpeechPreset to speak with a built-in named preset voice instead, and createVoice to design a brand-new voice from a text description rather than cloning one. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 1 credit per call.
    Connector
  • Look up a MITRE ATLAS technique — the AI/ML adversarial attack catalog. ATLAS catalogues TTPs targeting machine learning systems: prompt injection, model evasion, training data poisoning, model theft, etc. Roughly 80% of ATLAS techniques are AI/ML-specific (no ATT&CK bridge); 20% mirror an enterprise ATT&CK technique via attack_reference_id — use that to pivot to D3FEND defenses (d3fend_defense_for_attack) and CVE search. Sub-techniques inherit `tactics` from the parent (inherited_tactics=true flag) when ATLAS upstream leaves them empty. Use this tool when the user asks about AI/ML threats, LLM red-teaming, or adversarial ML; for multiple techniques in one call (e.g. drilling into a case study's techniques_used), prefer bulk_atlas_technique_lookup. Returns 404 when the id is not in the synced ATLAS catalog. Free: 30/hr, Pro: 500/hr. Returns {technique_id, name, description, tactics, inherited_tactics, maturity (demonstrated|feasible|realized), attack_reference_id, attack_reference_url, subtechnique_of, created_date, modified_date, next_calls}.
    Connector
  • Bulk ATLAS technique lookup — retrieve full records for up to 50 techniques in a single request instead of N separate atlas_technique_lookup calls. Designed as the natural follow-up to atlas_case_study_lookup, whose techniques_used array can be passed directly. Each item is the same shape as atlas_technique_lookup, including parent-tactics inheritance for sub-techniques (inherited_tactics=true flag) and per-item next_calls (D3FEND bridge when attack_reference_id present, sibling-technique search by tactic, parent lookup for sub-techniques). Free: 30/hr (1 per item), Pro: 500/hr. Returns {results [{technique_id, status (ok|not_found|invalid_format), technique, error}], total, successful, failed, partial, summary}.
    Connector
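
The 50-technique limit on the bulk ATLAS lookup above suggests batching longer ID lists on the client side. The following is a minimal Python sketch under stated assumptions: `lookup` is a hypothetical callable standing in for the bulk_atlas_technique_lookup call, `fake_lookup` is a stub used only for illustration, and the helper names `chunked` and `lookup_all` are not part of any listed tool. Only the batch limit and the `results[].status` / `technique_id` fields come from the description.

```python
from typing import Callable, Dict, Iterator, List

MAX_BATCH = 50  # per the tool description: up to 50 techniques per request


def chunked(ids: List[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def lookup_all(ids: List[str], lookup: Callable[[List[str]], Dict]) -> Dict[str, List[str]]:
    """Call `lookup` once per batch and group technique ids by item status."""
    by_status: Dict[str, List[str]] = {"ok": [], "not_found": [], "invalid_format": []}
    for batch in chunked(ids):
        response = lookup(batch)
        for item in response["results"]:
            by_status.setdefault(item["status"], []).append(item["technique_id"])
    return by_status


def fake_lookup(batch: List[str]) -> Dict:
    """Stub for the real tool call (assumption): ids starting with 'AML.T' count as ok."""
    results = [
        {"technique_id": t, "status": "ok" if t.startswith("AML.T") else "invalid_format"}
        for t in batch
    ]
    return {"results": results, "total": len(results)}


# Illustrative ids only; 121 ids are split into batches of 50, 50 and 21.
ids = [f"AML.T{n:04d}" for n in range(120)] + ["bogus"]
grouped = lookup_all(ids, fake_lookup)
print(len(list(chunked(ids))), len(grouped["ok"]), grouped["invalid_format"])
```

Grouping by `status` keeps partial failures visible, which matters because the description says a single response can mix `ok`, `not_found`, and `invalid_format` items.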

Matching MCP Servers

Matching MCP Connectors

  • Render HTML and CSS to PNG images over HTTP. Send HTML and CSS and get a PNG back.

  • An MCP server for generating images from HTML & CSS or screenshots of URLs using htmlcsstoimage.com.

  • Current & trending AI MODELS from the open-model ecosystem (Hugging Face) — name, org, task, popularity (likes/downloads) and release date. Use for "what AI models are trending / newest / what's the latest <X> model". This is the OPEN side (Llama, Qwen, DeepSeek, Mistral, Gemma, Phi…); for the closed flagships (GPT, Claude, Gemini, Grok) with pricing & versions use search_ai_models. Args: query: search a model name (e.g. llama, qwen, whisper). org: filter by org/author (e.g. meta-llama, deepseek-ai, Qwen, mistralai, google). task: text-generation (default), text-to-image, automatic-speech-recognition, … or 'any'. sort: trending (default) | newest | downloads. limit: max results. Every value is returned in an Ed25519-signed, provenance-stamped envelope (source and observation time) you can verify offline against /.well-known/keys, no account required.
    Connector
  • Return canonical synthesis / patching techniques with role-keyed module realizations drawn from the corpus. Use this when the user asks "how do I do X?" with X being a recognisable technique (low-pass-gate plucks, pinged-filter percussion, parallel multiband processing, complex-oscillator FM, karplus-strong pluck, clocked-delay feedback, modal-resonator excitation, wavefolder harmonics, envelope-follower ducking, Maths-style function-generator omnibus). It's also the right tool when the user has a module and asks "what's this good for?" — pass filter.module_id to retrieve every technique that references the module via its role_realizations. Each technique declares role_definitions (the roles the technique uses, each with required and optional affordances) and role_realizations (concrete modules that fill each role, with the affordances they provide). The model substitutes modules from the user's rack into roles by affordance match — DO NOT treat the realization list as exhaustive or as a recipe. Args: - filter (optional): { capability?, module_id?, text? } - capability: kebab-case capability id (see search_modules _meta.taxonomy). Returns techniques whose required *or* optional capability list includes this id. - module_id: "<manufacturer>/<module-slug>". Returns techniques that have a role_realization referencing this module. - text: free-text phrase. Substring-matches against technique id/label/description AND a curated alias table (technique_aliases) — that's the right surface when a user types evocative prose like "stuttering delay", "plucked string", "source of uncertainty" that doesn't grep against any kebab-case id. Two-way alias match: long alias ("source of uncertainty") matches short query ("uncertainty"), and vice versa. - When multiple filters supplied, AND-intersects. - Omit filter entirely to list all techniques. 
Returns: { "techniques": [ { "id": "low-pass-gate-pluck", "label": "Low-Pass Gate Pluck", "description": "Send a short envelope...", "required_capabilities": ["lowpass-gate"], "optional_capabilities": ["envelope-generator", "function-generator"], "role_definitions": [ { "role_id": "lpg", "description": "The vactrol-based or vactrol-emulating element. Strictly required...", "required_affordances": ["lowpass-gate"], "optional_affordances": [] }, ... ], "role_realizations": [ { "role_id": "lpg", "module_id": "make-noise/optomix", "affordances_provided": ["lowpass-gate"], "notes": "Two-channel vactrol-based LPG..." }, ... ], "canonical_instance": { "rationale": "...", "lineage": [ { "position": 1, "label": "Buchla 292 (1970)", "module_id": null, "notes": "..." }, { "position": 2, "label": "Tiptop Audio Buchla 292t", "module_id": "tiptop-audio/buchla-292t" }, ... ] }, "counter_canonical_notes": [ { "claim_pushed_back_against": "Optomix is the canonical pairing with Plaits...", "evidence": "The corpus catalogs 19 LPG-capable modules..." } ], "coverage": [ { "role_id": "voice", "realizations_count": 3 }, { "role_id": "lpg", "realizations_count": 19 }, { "role_id": "env", "realizations_count": 6 }, { "role_id": "clock", "realizations_count": 2 } ] } ], "_meta": { "filter": {...}, "feedback_hint"?: string } } How to use role data: - role_realizations are CURATORIAL SAMPLES, not exhaustive lists. The coverage[].realizations_count tells you how many are documented; other modules may fill the same role. - To find modules in the user's rack that can fill a role, use find_role_realizations(technique_id, role_id, available_modules). - canonical_instance is opt-in and sparse. Most techniques don't have one; that absence is information. When present, it documents a documented historical lineage (e.g., Buchla 292 → 292t → MMG → Optomix for low-pass-gate-pluck) — NOT a prescription. - counter_canonical_notes push back on likely training-data priors. 
When the user invokes a canonical-sounding claim that has a counter_canonical_note, surface the pushback. Errors: - "Module not found: <id>" if filter.module_id is supplied and unknown. - Empty techniques[] with a feedback_hint when filters produce no matches — call report_gap if the user expected coverage.
    Connector
  • Generate one or more images from a text prompt, billed to the caller's credits. Requires authentication. Anonymous image generation is available only via the REST API (``POST /v1/image-generators/{id}/runs``); the MCP transport always authenticates. Resolution order for the generator (highest priority first): 1. A deployed ``generator`` ref (``uuid@version`` or bare UUID): pins the deployed version config. 2. The ``model`` control path (authenticated one-off, ephemeral). Not usable from published templates. 3. A tier ``generator`` ref (``system:<tier>``): resolves to the tier's current best model (auto-upgrade). Available tiers: ``system:image-standard`` (default), ``system:image-premium``, ``system:image-edit`` (image-to-image, requires ``reference_image_url``). 4. Default: ``system:image-standard`` when no generator or model is given. ``generator`` and ``model`` are mutually exclusive. For ``image_to_image`` generators, ``reference_image_url`` is required and must be a public HTTP or HTTPS URL. For ``text_to_image`` generators, providing ``reference_image_url`` is rejected. Billing: spend is deducted from the caller's monthly credit balance. ``BudgetExhausted`` (402) and ``AccountSuspended`` (403) propagate if the balance is zero or the account is suspended. ``visibility`` sets the access level of the hosted copy of each image: ``public`` (default) returns a link that opens in any browser; ``private`` returns a link only you can open and forward to people you choose, while the plain URL stays locked. Returns: ``{run_id, model_tier_or_model, image_url, image_urls, width, height, num_images, cost_usd, duration_ms, status, created_at, error_code, error_message, hosted_images}``. ``hosted_images`` carries the durable Goodeye-hosted copy of each image with its ``url`` (the browser-viewable link) and ``visibility``. The prompt is never stored; only its hash is persisted on the run row.
    Connector
  • Generates one or more images from a text prompt (T2I) or a text prompt + reference image(s) (I2I). Submits the job, polls until terminal, and returns the final image URLs. Default model is 'grok-imagine-t2i' (fast, 6 images per generation, 5 credits). Use list_image_models to see the full lineup with pricing. For I2I, pass `referenceImages` as an array of public image URLs and pick a model with I2I support (e.g. 'grok-imagine-i2i', 'wan-2.5-spicy-i2i'). ## Model selection guide (when the user does not specify a model) Default: `grok-imagine-t2i` (5 cr, 6 outputs per call, fast, general purpose). **Strong recommendation: when a single high-quality output is what's wanted** (most agent / one-shot workflows), prefer `gpt-image-2-t2i` (9 cr @ 1K / higher @ 2K, single deterministic image, best general quality across realism, illustration, typography, and composition; supports up to 2K resolution and most aspect ratios including auto). This is the front-runner for serious creative output where you don't need to pick from 6 variations. 
Pick a different model when the prompt has these signals: - "single best result" / "one image" / production / no time to pick from variations -> `gpt-image-2-t2i` (9 cr, 1 output, top general quality) - "photoreal" / "photo of" / "realistic" -> `gpt-image-2-t2i` (9 cr, best general realism) or `imagen-4` (12 cr, very high quality) or `z-image-turbo` (3 cr, fastest) - "highest quality" / "premium" / no budget -> `gpt-image-2-t2i` at 2K, or `grok-imagine-quality-t2i` (16 cr @ 1K, 22 cr @ 2K), or `imagen-4-ultra` - Text inside the image (signs, posters, typography) -> `ideogram-v3-t2i` (best in class) or `gpt-image-2-t2i` (also strong) - Artistic / painterly / stylized -> `midjourney-t2i` - Album art / cover art -> `gpt-image-2-t2i` for one strong image; `grok-imagine-t2i` for 6 variations to choose from; `seedream-v4-t2i` if 4K wanted - Logo or design with embedded text -> `ideogram-v3-t2i` - NSFW / adult / explicit -> `wan-2.5-spicy-t2i` (auto-tags creation as 18+; routes to adult gallery) - Cheapest possible / quick test -> `z-image-turbo` (3 cr) - Multiple variations to compare -> keep `grok-imagine-t2i` (6 outputs default) or use `numImages` on a multi-output model For I2I (reference image provided): prefer the dedicated `aetherwave_edit_image` tool for "change something in this image" intent. Use `aetherwave_generate_image` with I2I models only when you specifically want style transfer (`midjourney-i2i`), premium quality (`grok-imagine-quality-i2i`), or adult content (`wan-2.5-spicy-i2i`). Always pass an explicit `aspectRatio` (e.g. "1:1" for square album art, "16:9" for video thumbnails, "9:16" for shorts/reels). Some upstream providers reject submissions with no aspect ratio. Ask the user only when: - The prompt contradicts itself (e.g., "highest quality but cheapest") - The user requested "the best model" with no context (in that case, surface 2-3 options with tradeoffs) - A single generation would cost more than 20 credits and the user has not confirmed
    Connector
  • Edits an existing image guided by a text prompt. Pass a public `imageUrl` plus a `prompt` describing the change ("add a moon to the sky", "swap the background for a neon city", "make it look like a comic panel"). Submits, polls, and returns the edited image URL(s). Default model is 'grok-imagine-i2i' (6 cr per call, returns 2 variations, ~30s, best cost-to-quality on standard edits). Other I2I-capable models: 'seedream-v4-edit', 'wan-2.5-spicy-i2i', 'flux-kontext-pro', 'qwen-image-edit', 'gpt-image-1.5-i2i' (slow, ~5min). Use list_image_models for full lineup. Note: source URLs with spaces or parentheses may fail upstream; prefer clean URLs. ## Model selection guide for edits Default: `grok-imagine-i2i` (6 cr per call, returns 2 variations = 3 cr/image effective, fast ~30s, strong general-purpose edit quality). Pick a different model when: - Need a single deterministic output, or 4K resolution -> `seedream-v4-edit` (7 cr per image, supports 1K/2K/4K, multi-image up to 6) - Subtle edits / preserve composition / character consistency -> `flux-kontext-pro` or `flux-kontext-max` - NSFW edits -> `wan-2.5-spicy-i2i` - Highest quality, time is not a concern (~5 min OK) -> `gpt-image-1.5-i2i` or `grok-imagine-quality-i2i` (16 cr @ 1K, 22 cr @ 2K) - Stylized / artistic transformation -> `midjourney-i2i` If the user simply says "edit this image" with no other signal, default to `grok-imagine-i2i`.
    Connector
  • Generate a NEW image from a text prompt via the platform's allowlisted image-gen provider (currently OpenAI gpt-image-1) and return an asset_id ready to attach to create_post / add_comment / send_dm. Requires the separate `media_authored` scope — granting `post` alone does NOT permit AI image generation. The user must have ticked the box on caulo.ai/settings/agents. Pipeline: caulo.ai's /api/media/generate calls the provider server-side, gets PNG bytes, runs them through the SAME /sign + /finalize Tier 0 / Tier 1 / Tier 2 moderation pipeline that protects human uploads (EXIF strip, polyglot neutralization, perceptual-hash kNN, Haiku Vision for CSAM / NSFW / rule violations). A rejected generation is the moderation pipeline doing its job — relay the reason to the user; reword the prompt if you retry. Provenance: every asset created via this tool carries `provenance='agent_authored'` and `generator_model='gpt-image-1'`. Attaching it to a post or comment forces that row's badge to `agent_authored` too — enforced by a database trigger (db/62), not by convention, regardless of any authorship declaration on create_post. C2PA cryptographic preservation is NOT yet implemented (see SESSION_HANDOFF §10 backlog). Returns { asset_id, status: 'approved' | 'rejected', nsfw_level?, generator_model }.
    Connector
  • Convert text to speech using a named built-in preset voice, with optional emotion and language settings. Both text and voice_preset_id are required and the text is limited to 1000 characters; invalid input returns HTTP 400. Synchronous: the call blocks until generation finishes and returns a single audio result containing a URL; there is no separate polling step. Credits are charged on success. Use this when you want a ready-made catalog voice and do not need to supply your own sample; use createSpeech to clone a voice from an audio sample instead, and createVoice to design a new voice from a text description. Pass an optional request_id to tag the result so you can locate it later via getAudioResults. Requires an API key (user scope). Credits: This endpoint consumes 1 credit per call.
    Connector
  • Render a Mermaid diagram definition and return the image with metadata. The definition should be valid Mermaid syntax (e.g. flowchart, sequence, class, ER, state, or Gantt diagram). Returns a list of content blocks: the rendered image plus a JSON text block with metadata including a mermaid.live edit link for opening the diagram in a browser editor. Args: definition: Mermaid diagram definition text. filename: Output filename without extension. format: Output format — ``"png"`` (default), ``"svg"``, or ``"pdf"``. download_link: If True, return a temporary download URL path (/images/{token}) that expires after 15 minutes; if False, return inline image bytes. Defaults to True (URL) — set ``DIAGRAMS_INLINE_DEFAULT=true`` on the server to flip the default. SVG/PDF and PNGs larger than the inline limit always use a download link.
    Connector
  • MANDATORY first step whenever the user attached an image in chat (or pointed at a local file on disk) and wants edit_image or image-to-video generation. Returns a signed PUT URL plus a file_id. How the bytes get uploaded depends on WHERE you run, and the discriminator is network access to the URL, not shell access: (a) Claude.ai (web, desktop, or mobile app): this tool renders an inline upload widget. The user drops the image into it; it uploads from their browser and pushes the file_id back automatically. Your code-execution sandbox has NO network route to the signed URL — a chat attachment sitting on the sandbox filesystem does NOT mean you can upload it. NEVER attempt curl/fetch/Python uploads from a sandbox and never investigate domain allowlists; just ask the user (one short sentence) to drop the image into the widget, then stop and wait. (b) Claude Code / a CLI with a real shell on the user's machine: run the ready-made curl PUT from the response text. Then call edit_image or generate_video with file_id=<returned id>. edit_image and generate_video do NOT accept base64 — calling them with raw image bytes WILL fail. This tool is the only working path for chat attachments. Set `purpose` to 'edit' or 'video' so the upload widget points the user at the right downstream tool.
    Connector
  • Verify that an AI-generated image actually used the colours specified in an agent_brief call. Supply the generated image (URL or base64) and the target palette from agent_brief colour_tokens. Returns a fidelity score 0-100, dE2000 distance per colour, match quality per colour (accurate/acceptable/drifted/ignored), and an overall verdict. Use after agent_brief + image generation to close the colour loop.
    Connector
  • Deploy a reusable image generator that workflows reference to produce images from a chosen model: creates it or appends a version. An image generator is a named, versioned configuration that routes image generation calls to a specific model. Generators are private and owner-scoped. Workflows reference them by UUID or ``uuid@version``. You cannot deploy a new generator whose ``name`` matches an active platform ``scope=system`` generator (those are tier-level configs that are run-only and not listed or fetched). Versioning: the first deploy with a given ``name`` creates the generator at version 1. Re-deploying the same ``name`` appends a new version and requires ``expected_version_token`` from the latest known version (returned by deploy/list/get). A new generator must omit the token; an existing one without a token returns Conflict. Deploy-time validation: the ``model`` is checked against the pricing layer. A model that does not resolve to a known image endpoint with an authoritative price is rejected before any row is written. Returns: ``{generator_id, name, description, current_version, version, version_token, status, scope, provider, model, generation_contract, config_hash, created_at}``. Persist ``version_token`` for the next re-deploy.
    Connector
  • Upload an image and return the hosted image record. ``data`` must be the image bytes encoded as standard base64 (RFC 4648). Accepted image formats are PNG, JPEG, WebP, and GIF. ``visibility`` controls who can access the served URL: ``"public"`` makes it accessible to anyone with the link; ``"private"`` (default) requires the owner's credentials. Accepted values: ``"public"``, ``"private"``. ``ttl_seconds`` sets an expiry relative to now (positive integer). Omit to create a permanent image. Returns: ``{id, token, url, visibility, expires_at, size_bytes, content_type}``.
    Connector
  • Analyze an image from a component's datasheet using vision AI. Use this when read_datasheet returns a section containing images and you need to extract data from a graph, package drawing, pin diagram, or circuit schematic. Pass the image_key from the read_datasheet response (the storage path in the image URL). Optionally pass a specific question to focus the analysis. IMPORTANT: For precise numeric values (electrical specs, max ratings), prefer read_datasheet text tables first — they are more reliable than vision-extracted graph data. Use analyze_image for visual information not available in text: package dimensions from drawings, pin assignments from diagrams, graph trends, and approximate values from characteristic curves. Examples: - analyze_image(part_number='IRFZ44N', image_key='images/abc123.png') -> classifies and describes the image - analyze_image(part_number='IRFZ44N', image_key='images/abc123.png', question='What is the drain current at Vgs=5V?')
    Connector
  • Browse the 100 most popular meme templates. Returns template name, image URL, dimensions, and text box coordinates. Use template IDs with caption_image to create memes.
    Connector
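
The generator resolution order described for the text-to-image tool above (deployed ref, then `model`, then tier ref, then the default tier) can be sketched in Python as follows. This is a client-side illustration of the documented rules, not the server's actual implementation. The function names `resolve_generator` and `validate_reference`, and the way a tier ref is told apart from a deployed ref (a `system:` prefix), are assumptions made for this example.

```python
from typing import Optional, Tuple

# Tier refs named in the tool description.
TIERS = ("system:image-standard", "system:image-premium", "system:image-edit")
DEFAULT_TIER = "system:image-standard"


def resolve_generator(generator: Optional[str] = None,
                      model: Optional[str] = None) -> Tuple[str, str]:
    """Return (kind, value) following the documented priority order."""
    if generator is not None and model is not None:
        raise ValueError("generator and model are mutually exclusive")
    if generator is not None and not generator.startswith("system:"):
        return ("deployed", generator)   # 1. uuid@version or bare UUID
    if model is not None:
        return ("model", model)          # 2. one-off model control path
    if generator is not None:
        if generator not in TIERS:
            raise ValueError(f"unknown tier ref: {generator}")
        return ("tier", generator)       # 3. tier ref
    return ("tier", DEFAULT_TIER)        # 4. default when nothing is given


def validate_reference(image_to_image: bool,
                       reference_image_url: Optional[str]) -> None:
    """Mirror the documented reference_image_url rules (assumed to be checked this way)."""
    if image_to_image:
        ok = bool(reference_image_url) and reference_image_url.startswith(("http://", "https://"))
        if not ok:
            raise ValueError("image_to_image requires an HTTP or HTTPS reference_image_url")
    elif reference_image_url is not None:
        raise ValueError("text_to_image generators reject reference_image_url")


print(resolve_generator())
print(resolve_generator(generator="system:image-premium"))
print(resolve_generator(model="some-model"))  # hypothetical model name
```

Because `generator` and `model` are mutually exclusive, the priority order mainly matters for telling a deployed ref from a tier ref and for choosing the default when neither argument is set.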