image_analysis

Analyze images by sending a query to vision models. Customize analysis with optional system prompts for detailed descriptions.

Instructions

Analyze an image using OpenRouter's vision capabilities.

The tool sends the image to one of OpenRouter's vision models. You provide a query to
guide the analysis and can optionally override the system prompt for more control over
the model's behavior.

Args:
    image: The image as a base64-encoded string, URL, or local file path
    query: Text prompt to guide the image analysis. For best results, provide context
           about why you're analyzing the image and what specific information you need.
           Including details about your purpose and required focus areas leads to more
           relevant and useful responses.
    system_prompt: Instructions for the model defining its role and behavior
    model: The vision model to use (defaults to the value set by OPENROUTER_DEFAULT_MODEL)
    max_tokens: Maximum number of tokens in the response (100-4000)
    temperature: Temperature parameter for generation (0.0-1.0)
    top_p: Optional nucleus sampling parameter (0.0-1.0)
    presence_penalty: Optional penalty for new tokens based on presence in text so far (0.0-2.0)
    frequency_penalty: Optional penalty for new tokens based on frequency in text so far (0.0-2.0)
    project_root: Optional root directory to resolve relative image paths against

Returns:
    The analysis result as text

Examples:
    Basic usage with a file path:
        image_analysis(image="path/to/image.jpg", query="Describe this image in detail")

    Basic usage with an image URL:
        image_analysis(image="https://example.com/image.jpg", query="Describe this image in detail")

    Basic usage with a relative path and project root:
        image_analysis(image="examples/image.jpg", project_root="/path/to/project", query="Describe this image in detail")

    Usage with a detailed contextual query:
        image_analysis(
            image="path/to/image.jpg",
            query=(
                "Analyze this product packaging design for a fitness supplement. "
                "Identify all nutritional claims, certifications, and health icons. "
                "Assess the visual hierarchy and how the key selling points are "
                "communicated. This is for a competitive analysis project."
            ),
        )

    Usage with custom system prompt:
        image_analysis(
            image="path/to/image.jpg",
            query="What objects can you see in this image?",
            system_prompt="You are an expert at identifying objects in images. Focus on listing all visible objects."
        )
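
The `image` argument accepts three forms, and `project_root` only matters for relative paths. The following is a minimal Python sketch of how a caller might normalize those forms into a single URL or data URL. The helper name `resolve_image`, the detection order (URL prefix, then existing file, then base64 fallback), and the JPEG MIME type assumed for raw base64 input are all illustrative assumptions, not the server's actual implementation.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Optional


def resolve_image(image: str, project_root: Optional[str] = None) -> str:
    """Return a URL or data URL for the three accepted `image` forms.

    Hypothetical helper: the detection order is an assumption.
    """
    # 1. Remote URLs and existing data URLs pass through unchanged.
    if image.startswith(("http://", "https://", "data:")):
        return image

    # 2. Local file path; relative paths resolve against project_root if given.
    path = Path(image)
    if not path.is_absolute() and project_root:
        path = Path(project_root) / path
    try:
        is_file = path.is_file()
    except OSError:
        # A long base64 string is not a valid file name on most systems.
        is_file = False
    if is_file:
        mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        encoded = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"data:{mime};base64,{encoded}"

    # 3. Otherwise treat the input as raw base64 and check that it decodes.
    try:
        base64.b64decode(image, validate=True)
    except ValueError as exc:
        raise ValueError(
            f"not a URL, readable file, or base64 string: {image[:40]!r}"
        ) from exc
    # Raw base64 carries no MIME type, so JPEG is assumed here.
    return f"data:image/jpeg;base64,{image}"
```

Checking for a file before falling back to base64 keeps short relative paths such as `examples/image.jpg` from being misread as encoded data.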

Input Schema

Name | Required | Description | Default
image | Yes | |
query | No | | Describe this image in detail
system_prompt | No | | You are an expert vision analyzer with exceptional attention to detail. Your purpose is to provide accurate, comprehensive descriptions of images that help AI agents understand visual content they cannot directly perceive. Focus on describing all relevant elements in the image - objects, people, text, colors, spatial relationships, actions, and context. Be precise but concise, organizing information from most to least important. Avoid making assumptions beyond what's visible and clearly indicate any uncertainty. When text appears in images, transcribe it verbatim within quotes. Respond only with factual descriptions without subjective judgments or creative embellishments. Your descriptions should enable an agent to make informed decisions based solely on your analysis.
model | No | |
max_tokens | No | |
temperature | No | |
top_p | No | |
presence_penalty | No | |
frequency_penalty | No | |
project_root | No | |
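
The schema itself carries no descriptions, so the numeric ranges come only from the Args section of the instructions. A small sketch of client-side range checks follows. The helper name `check_ranges` is hypothetical, and treating the bounds as inclusive is an assumption, since the instructions do not say.

```python
from typing import Dict, Optional, Tuple

# Ranges as documented in the Args section; inclusive bounds are an assumption.
RANGES: Dict[str, Tuple[float, float]] = {
    "max_tokens": (100, 4000),
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (0.0, 2.0),
    "frequency_penalty": (0.0, 2.0),
}


def check_ranges(**params: Optional[float]) -> Dict[str, float]:
    """Drop unset (None) parameters and reject values outside the documented ranges."""
    checked: Dict[str, float] = {}
    for name, value in params.items():
        if value is None:
            continue  # optional parameter left for the server to default
        if name not in RANGES:
            raise KeyError(f"no documented range for {name!r}")
        low, high = RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        checked[name] = value
    return checked
```

For example, `check_ranges(max_tokens=500, temperature=0.2, top_p=None)` keeps the first two values and omits `top_p`, while `temperature=1.5` raises a `ValueError` before any request is made.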

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden of behavioral disclosure. It explains that the tool uses OpenRouter's vision models and returns text, and it lists default parameter values. However, it says nothing about latency, failure modes, or rate limits of the external API, all of which an agent needs to understand. The description is adequate but not comprehensive in this regard.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear introductory sentence, parameter list, return value, and examples. While it is verbose in parts (e.g., the query parameter explanation is lengthy), every sentence adds value. It could be slightly more concise, but it is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (10 parameters, no output schema, no annotations), the description is highly complete. It explains all parameters, specifies return type ('The analysis result as text'), and provides comprehensive examples covering various use cases. The default system prompt is also elaborated, which adds valuable context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate entirely. It does so excellently by providing detailed explanations for all 10 parameters, including their purpose, defaults, and constraints. For example, it explains that 'query' should include context for better results and provides examples. This enables correct parameter usage without relying on the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Analyze an image using OpenRouter's vision capabilities.' It specifies the action (analyze), resource (image), and technology (OpenRouter's vision), leaving no ambiguity. With no sibling tools, differentiation is not needed, but the purpose is specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance through detailed parameter descriptions and multiple examples covering file paths, URLs, contextual queries, and custom system prompts. However, it does not state when not to use the tool, and it mentions no alternatives (none exist, since there are no sibling tools). The rubric level 'Clear context, no exclusions' fits this.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Nazruden/mcp-openvision'
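
The same request can be made from Python with the standard library. This is a sketch only: the helper names `server_url` and `fetch_server` are hypothetical, and decoding the response as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the server endpoint URL used in the curl example."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0) -> dict:
    """GET the server record and decode it as JSON (assumed response format)."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return json.load(resp)
```

Calling `fetch_server("Nazruden", "mcp-openvision")` issues the same GET request as the curl command.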

If you have feedback or need assistance with the MCP directory API, please join our Discord server.