Glama

take_screenshot

Capture screen content as an image through a simple interface so that AI assistants can see and analyze what users are viewing.

Instructions

Take a screenshot of the user's screen and return it as an image

Input Schema

Name | Required | Description | Default

No arguments

Implementation Reference

  • Core handler function that implements the screenshot capture logic using pyautogui, processes the image into a JPEG buffer, and returns it as an MCP Image object.
    import io

    from mcp.server.fastmcp import Image  # assumed source of Image; the import is not shown in the original


    def take_screenshot() -> Image:
        """
        Take a screenshot of the user's screen and return it as an image. Use
        this tool anytime the user wants to look at something they're doing.
        """
        import pyautogui  # imported lazily, as in the original

        buffer = io.BytesIO()
        screenshot = pyautogui.screenshot()
        # if the file exceeds ~1MB, it will be rejected by Claude, so save as a compressed JPEG
        screenshot.convert("RGB").save(buffer, format="JPEG", quality=60, optimize=True)

        return Image(data=buffer.getvalue(), format="jpeg")
  • Registers the 'take_screenshot' MCP tool with the server, providing the name, description, and a thin wrapper function that delegates to the core implementation.
    @mcp_server.tool(
        name="take_screenshot",
        description="Take a screenshot of the user's screen and return it as an image",
    )
    def screenshot_tool() -> Image:
        """Wrapper around the screenshot tool implementation"""
        return take_screenshot()
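
The handler's comment notes that images over roughly 1MB are rejected, yet the code encodes at a fixed quality of 60 and never checks the resulting size. One possible way to enforce the limit is sketched below. The helper `encode_under_limit`, the `encode` callback, the `MAX_BYTES` value, and the list of fallback qualities are all hypothetical and are not part of the server.

```python
from typing import Callable, Iterable

# Assumption: approximates the "~1MB" limit mentioned in the handler comment.
MAX_BYTES = 1_000_000


def encode_under_limit(
    encode: Callable[[int], bytes],
    max_bytes: int = MAX_BYTES,
    qualities: Iterable[int] = (60, 45, 30, 15),
) -> bytes:
    """Try each quality in order and return the first encoding that fits.

    `encode` is a hypothetical callback mapping a JPEG quality to encoded bytes.
    Raises ValueError if none of the qualities produces a small enough payload.
    """
    for quality in qualities:
        data = encode(quality)
        if len(data) <= max_bytes:
            return data
    raise ValueError(f"image still exceeds {max_bytes} bytes at the lowest quality tried")
```

In the handler, `encode` could wrap the existing `screenshot.convert("RGB").save(buffer, format="JPEG", quality=quality, optimize=True)` call with a fresh `io.BytesIO()` per attempt and return `buffer.getvalue()`.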
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states the action and output format but lacks details on behavioral traits such as permissions required, whether it captures the entire screen or a region, privacy implications, or error handling. This is a significant gap for a tool with potential security and usability concerns.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
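
As a rough illustration of the gaps listed in this section, a fuller description might look like the sketch below. The wording is hypothetical and draws only on what the implementation reference shows (no arguments, JPEG output); the permission sentence is an assumption about typical operating-system behavior, and `readOnlyHint` is the MCP tool annotation name, used here on the assumption that a capture does not modify the user's environment.

```python
# Hypothetical expanded description; not the text the server currently registers.
DESCRIPTION = (
    "Take a screenshot of the user's screen and return it as a JPEG image. "
    "Takes no arguments, so no region can be selected. "
    "Whatever is visible in the captured area is included, which may expose sensitive content. "
    "May fail if the operating system has not granted screen-capture permission."
)

# Hypothetical annotations dict; how it is passed to the server depends on the SDK version.
ANNOTATIONS = {
    "readOnlyHint": True,  # assumption: capturing the screen changes nothing for the user
}
```

The extra sentences trade some of the conciseness score for the behavioral disclosure this dimension asks about.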

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action and outcome with zero wasted words. It is appropriately sized for the tool's simplicity and directly communicates its purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a potentially sensitive screen capture tool) and lack of annotations or output schema, the description is incomplete. It fails to address critical context like security permissions, scope of capture, or return format details, leaving gaps that could hinder safe and effective use by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so schema coverage is trivially 100% and no parameter documentation is needed. The description adds no parameter semantics, which is acceptable because there are no parameters to explain; this matches the baseline score for tools with zero parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('take a screenshot') and the resource ('user's screen'), with the outcome specified ('return it as an image'). It distinguishes this tool's purpose unambiguously, especially given no sibling tools exist for comparison.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (e.g., when a visual capture of the screen is needed) but does not provide explicit guidance on when to use it versus alternatives or any exclusions. With no sibling tools, this is adequate but lacks depth on prerequisites or constraints.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/codingthefuturewithai/screenshot_mcp_server'
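
The same request can be made from Python with the standard library. This is a minimal sketch: the helper names `server_url` and `fetch_server` are hypothetical, and it assumes the endpoint returns JSON, which the page does not state.

```python
import json
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server, matching the curl example."""
    return f"{API_BASE}/{owner}/{name}"


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """GET the server record and parse the body as JSON (assumed response format)."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server("codingthefuturewithai", "screenshot_mcp_server")` issues the same GET request as the curl command.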

If you have feedback or need assistance with the MCP directory API, please join our Discord server.