OpenRouter MCP Multimodal Server

analyze_image

Read-only

Analyze an image from a file path, URL, or data URL. Optionally ask a question about the image to get specific insights.

Instructions

Analyze an image using a vision model

Input Schema

Name        Required  Description                  Default
image_path  Yes       File path, URL, or data URL  -
question    No        Question about the image     -
model       No        -                            -
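
For illustration, a minimal call supplies only image_path; question and model are optional. The values below are made-up examples, not defaults shipped by the server:

    // Illustrative tool-call arguments (example values, not server defaults);
    // the shape matches the AnalyzeImageToolRequest interface shown below.
    const args = {
      image_path: 'https://example.com/photo.jpg',    // file path, URL, or data URL
      question: 'How many people are in this photo?', // optional
      model: 'anthropic/claude-3.5-sonnet',           // optional OpenRouter model id
    };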

Implementation Reference

  • Main handler function that validates image_path, fetches and optimizes the image via prepareImageUrl, sends it to a vision model via OpenAI chat completions, and returns the extracted text response.
    export async function handleAnalyzeImage(
      request: { params: { arguments: AnalyzeImageToolRequest } },
      openai: OpenAI,
      defaultModel?: string,
    ) {
      const { image_path, question, model } = request.params.arguments ?? { image_path: '' };
    
      if (!image_path) {
        return toolError(ErrorCode.INVALID_INPUT, 'image_path is required.');
      }
    
      let imageUrl: string;
      try {
        imageUrl = await prepareImageUrl(image_path);
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err);
        if (msg.includes('Blocked host')) return toolErrorFrom(ErrorCode.UPSTREAM_REFUSED, err);
        if (msg.toLowerCase().includes('too large')) {
          return toolErrorFrom(ErrorCode.RESOURCE_TOO_LARGE, err);
        }
        return toolErrorFrom(ErrorCode.INVALID_INPUT, err);
      }
    
      let completion: ChatCompletion;
      try {
        completion = await openai.chat.completions.create({
          model: model || defaultModel || DEFAULT_MODEL,
          messages: [
            {
              role: 'user',
              content: [
                { type: 'text', text: question || "What's in this image?" },
                { type: 'image_url', image_url: { url: imageUrl } },
              ],
            },
          ] as ChatCompletionMessageParam[],
        });
      } catch (err) {
        return classifyUpstreamError(err);
      }
    
      const extracted = extractCompletionText(completion);
      const cutoff = detectReasoningCutoff(extracted);
      if (cutoff) return cutoff;
    
      if (!extracted.text) {
        return toolError(ErrorCode.INTERNAL, 'Vision model returned no textual content.', {
          finish_reason: extracted.finishReason,
        });
      }
      return {
        content: [{ type: 'text' as const, text: extracted.text }],
        _meta: {
          finish_reason: extracted.finishReason,
          ...(toUsageMeta(extracted.usage) ?? {}),
        },
      };
    }
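  • The toolError and toolErrorFrom helpers used by the handler are not shown on this page. A hypothetical sketch of their likely shape, assuming an MCP-style error result (illustrative only, not the repository's code):
    // Hypothetical sketch -- the real helpers are not shown on this page.
    function toolError(code: string, message: string, extra?: Record<string, unknown>) {
      return {
        isError: true as const,
        content: [{ type: 'text' as const, text: message }],
        _meta: { error_code: code, ...(extra ?? {}) },
      };
    }

    function toolErrorFrom(code: string, err: unknown) {
      return toolError(code, err instanceof Error ? err.message : String(err));
    }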
  • Input schema/interface for the analyze_image tool: requires image_path, optional question and model.
    export interface AnalyzeImageToolRequest {
      image_path: string;
      question?: string;
      model?: string;
    }
  • Registration of the 'analyze_image' tool in the ListToolsRequestSchema handler, defining its name, description, annotations, and input schema.
    name: 'analyze_image',
    description: 'Analyze an image using a vision model',
    annotations: {
      readOnlyHint: true,
      destructiveHint: false,
      idempotentHint: false,
    },
    inputSchema: {
      type: 'object',
      properties: {
        image_path: { type: 'string', description: 'File path, URL, or data URL' },
        question: { type: 'string', description: 'Question about the image' },
        model: { type: 'string' },
      },
      required: ['image_path'],
    },
  • Dispatch in the CallToolRequestSchema handler: routes 'analyze_image' calls to the handleAnalyzeImage function.
    case 'analyze_image':
      return handleAnalyzeImage(
        wrapToolArgs(args as AnalyzeImageToolRequest | undefined),
        this.openai,
        this.defaultModel,
      );
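  • wrapToolArgs is not shown either; given handleAnalyzeImage's signature, it presumably re-wraps the raw call arguments into the { params: { arguments } } envelope. A minimal sketch under that assumption:
    // Minimal sketch, assuming wrapToolArgs only builds the request envelope.
    function wrapToolArgs<T>(args: T | undefined): { params: { arguments: T } } {
      return { params: { arguments: (args ?? {}) as T } };
    }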
  • Helper that fetches an image (HTTP, data URL, or local file), optionally optimizes it with sharp, and returns a data URL string for the vision model API.
    export async function prepareImageUrl(source: string): Promise<string> {
      if (source.startsWith('data:')) return source;
    
      const buffer = await fetchImage(source);
      const { base64, mime } = await optimizeImage(buffer);
      // When optimization succeeded, mime is 'image/jpeg'. When it failed, we
      // use the sniffed mime. For local files we prefer the extension-derived
      // mime (more specific) when optimization fell back.
      const finalMime =
        mime === 'image/jpeg' || source.startsWith('http') ? mime : getMimeType(source);
      return `data:${finalMime};base64,${base64}`;
    }
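  • fetchImage and optimizeImage are referenced above but not shown. A minimal sketch of the optimization step, assuming the sharp library and the JPEG re-encode described in the comment (hypothetical; the exact resize policy and the mime sniffing on fallback are assumptions):
    import sharp from 'sharp';

    // Sketch only: re-encode to JPEG; fall back to the original bytes on failure.
    async function optimizeImage(buffer: Buffer): Promise<{ base64: string; mime: string }> {
      try {
        const jpeg = await sharp(buffer)
          .rotate() // honor EXIF orientation
          .resize({ width: 1024, withoutEnlargement: true }) // cap size (assumed limit)
          .jpeg({ quality: 80 })
          .toBuffer();
        return { base64: jpeg.toString('base64'), mime: 'image/jpeg' };
      } catch {
        // The comment in prepareImageUrl says a sniffed mime is used on failure;
        // this placeholder just marks the fallback path.
        return { base64: buffer.toString('base64'), mime: 'application/octet-stream' };
      }
    }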
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds that it uses a vision model, but does not disclose performance characteristics, required permissions, or output format. It provides minimal behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no redundancy. Every word contributes to clarity, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description should hint at the return type (e.g., text, structured data). It also omits limitations like image size or supported formats. The tool is simple, but the description is too sparse for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67% (2 of 3 parameters have descriptions). The tool description adds no additional meaning beyond the schema, so it meets the baseline expectation without improvement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Analyze an image using a vision model' clearly specifies a verb (analyze) and resource (image), and distinguishes it from sibling tools like analyze_audio and analyze_video.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as generate_image or when not to use it. The description lacks any usage context or conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
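
Taken together, the lower-scoring dimensions above suggest a richer description string. A hypothetical revision, not taken from the repository, that folds in the review's suggestions about output format, limits, and alternatives:

    // Hypothetical rewrite of the registered description -- illustrative only.
    description:
      'Analyze an existing image (local file path, URL, or data URL) with a vision ' +
      'model and return a plain-text answer. Images are re-encoded before upload and ' +
      'oversized files are rejected. Read-only: to create images, use generate_image instead.',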


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/stabgan/openrouter-mcp-multimodal'
