Skip to main content
Glama
omgwtfwow

MCP Server for Crawl4AI

by omgwtfwow

capture_screenshot

Capture webpage screenshots as base64 PNG data for web content analysis. Wait for page loading before taking screenshots and optionally save images locally.

Instructions

[STATELESS] Capture webpage screenshot. Returns base64-encoded PNG data. Creates new browser each time. Optionally saves screenshot to local directory. IMPORTANT: Chained calls (execute_js then capture_screenshot) will NOT work - the screenshot won't see JS changes! For JS changes + screenshot use create_session + crawl(session_id, js_code, screenshot:true) in ONE call.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
urlYesThe URL to capture
screenshot_wait_forNoSeconds to wait before taking screenshot (allows page loading/animations)
save_to_directoryNoDirectory path to save screenshot (e.g., ~/Desktop, /tmp). Do NOT include filename - it will be auto-generated. Large screenshots (>800KB) won't be returned inline when saved.

Implementation Reference

  • Primary MCP tool handler: Calls backend screenshot service, optionally saves to local file, returns base64 PNG image in MCP format or file path text if too large.
    async captureScreenshot(options: ScreenshotEndpointOptions) {
      try {
        const result: ScreenshotEndpointResponse = await this.service.captureScreenshot(options);
    
        // Response has { success: true, screenshot: "base64string" }
        if (!result.success || !result.screenshot) {
          throw new Error('Screenshot capture failed - no screenshot data in response');
        }
    
        let savedFilePath: string | undefined;
    
        // Save to local directory if requested
        if (options.save_to_directory) {
          try {
            // Resolve home directory path
            let resolvedPath = options.save_to_directory;
            if (resolvedPath.startsWith('~')) {
              const homedir = os.homedir();
              resolvedPath = path.join(homedir, resolvedPath.slice(1));
            }
    
            // Check if user provided a file path instead of directory
            if (resolvedPath.endsWith('.png') || resolvedPath.endsWith('.jpg')) {
              console.warn(
                `Warning: save_to_directory should be a directory path, not a file path. Using parent directory.`,
              );
              resolvedPath = path.dirname(resolvedPath);
            }
    
            // Ensure directory exists
            await fs.mkdir(resolvedPath, { recursive: true });
    
            // Generate filename from URL and timestamp
            const url = new URL(options.url);
            const hostname = url.hostname.replace(/[^a-z0-9]/gi, '-');
            const timestamp = new Date().toISOString().replace(/[:.]/g, '-').slice(0, -5);
            const filename = `${hostname}-${timestamp}.png`;
    
            savedFilePath = path.join(resolvedPath, filename);
    
            // Convert base64 to buffer and save
            const buffer = Buffer.from(result.screenshot, 'base64');
            await fs.writeFile(savedFilePath, buffer);
          } catch (saveError) {
            // Log error but don't fail the operation
            console.error('Failed to save screenshot locally:', saveError);
          }
        }
    
        const textContent = savedFilePath
          ? `Screenshot captured for: ${options.url}\nSaved to: ${savedFilePath}`
          : `Screenshot captured for: ${options.url}`;
    
        // If saved locally and screenshot is large (>800KB), don't return the base64 data
        const screenshotSize = Buffer.from(result.screenshot, 'base64').length;
        const shouldReturnImage = !savedFilePath || screenshotSize < 800 * 1024; // 800KB threshold
    
        const content = [];
    
        if (shouldReturnImage) {
          content.push({
            type: 'image',
            data: result.screenshot,
            mimeType: 'image/png',
          });
        }
    
        content.push({
          type: 'text',
          text: shouldReturnImage
            ? textContent
            : `${textContent}\n\nNote: Screenshot data not returned due to size (${Math.round(screenshotSize / 1024)}KB). View the saved file instead.`,
        });
    
        return { content };
      } catch (error) {
        throw this.formatError(error, 'capture screenshot');
      }
    }
  • Zod input schema for capture_screenshot tool validation.
    export const CaptureScreenshotSchema = createStatelessSchema(
      z.object({
        url: z.string().url(),
        screenshot_wait_for: z.number().optional(),
        save_to_directory: z.string().optional().describe('Local directory to save screenshot file'),
        // output_path not exposed as MCP needs base64 data
      }),
      'capture_screenshot',
    );
  • src/server.ts:829-835 (registration)
    MCP server switch case registration: Routes 'capture_screenshot' tool calls to contentHandlers.captureScreenshot with schema validation.
    case 'capture_screenshot':
      return await this.validateAndExecute(
        'capture_screenshot',
        args,
        CaptureScreenshotSchema,
        async (validatedArgs) => this.contentHandlers.captureScreenshot(validatedArgs),
      );
  • Service client helper: Proxies HTTP POST to backend /screenshot endpoint to capture and return base64 screenshot data.
    async captureScreenshot(options: ScreenshotEndpointOptions): Promise<ScreenshotEndpointResponse> {
      // Validate URL
      if (!validateURL(options.url)) {
        throw new Error('Invalid URL format');
      }
    
      try {
        const response = await this.axiosClient.post('/screenshot', {
          url: options.url,
          screenshot_wait_for: options.screenshot_wait_for,
          // output_path is omitted to get base64 response
        });
    
        return response.data;
      } catch (error) {
        return handleAxiosError(error);
      }
    }
  • src/server.ts:151-173 (registration)
    Tool metadata registration in MCP server.tools.list array, including name, description, and JSON input schema.
    name: 'capture_screenshot',
    description:
      "[STATELESS] Capture webpage screenshot. Returns base64-encoded PNG data. Creates new browser each time. Optionally saves screenshot to local directory. IMPORTANT: Chained calls (execute_js then capture_screenshot) will NOT work - the screenshot won't see JS changes! For JS changes + screenshot use create_session + crawl(session_id, js_code, screenshot:true) in ONE call.",
    inputSchema: {
      type: 'object',
      properties: {
        url: {
          type: 'string',
          description: 'The URL to capture',
        },
        screenshot_wait_for: {
          type: 'number',
          description: 'Seconds to wait before taking screenshot (allows page loading/animations)',
          default: 2,
        },
        save_to_directory: {
          type: 'string',
          description:
            "Directory path to save screenshot (e.g., ~/Desktop, /tmp). Do NOT include filename - it will be auto-generated. Large screenshots (>800KB) won't be returned inline when saved.",
        },
      },
      required: ['url'],
    },
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively communicates several key behaviors: stateless operation ('Creates new browser each time'), return format ('Returns base64-encoded PNG data'), performance consideration ('Large screenshots (>800KB) won't be returned inline when saved'), and a critical limitation about JavaScript changes. The only gap is lack of information about error conditions or timeout behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded with the core functionality. The first sentence establishes the main purpose, followed by important behavioral details. The warning about chained calls is crucial but could be slightly more concise. Overall, most sentences earn their place by conveying essential information that isn't in the structured fields.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a screenshot tool with browser isolation and JavaScript limitations, and with no annotations or output schema, the description does a good job covering key aspects: stateless operation, return format, sibling tool relationships, and important constraints. The main gap is the lack of information about what happens on failure (timeouts, invalid URLs, etc.), which prevents a perfect score.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds minimal parameter-specific information beyond what's in the schema - it mentions the save_to_directory behavior with large screenshots, but doesn't provide additional context about url validation or screenshot_wait_for usage. This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('capture webpage screenshot'), the resource ('webpage'), and distinguishes it from siblings by explicitly mentioning what it doesn't do (chained calls with execute_js) and pointing to the alternative (create_session + crawl). The verb+resource combination is precise and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when NOT to use this tool ('Chained calls... will NOT work') and specifies the alternative approach ('use create_session + crawl... in ONE call'). It also clarifies the stateless nature upfront, which helps set expectations about browser isolation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/omgwtfwow/mcp-crawl4ai-ts'

If you have feedback or need assistance with the MCP directory API, please join our Discord server