Skip to main content
Glama

capture_document_pages

Capture multiple document pages automatically by navigating through them with configurable keys and saving screenshots to a specified directory.

Instructions

Capture multiple pages from a document window with automatic navigation.

Args:
    window_id: Window ID containing the document
    page_count: Number of pages to capture
    output_dir: Directory to save captured pages
    navigation_key: Key to press for navigation (Page_Down, Right, space)
    delay_seconds: Delay between navigation and capture

Returns:
    JSON string with capture results and list of captured files.

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
window_idYes
page_countYes
output_dirNo
navigation_keyNoPage_Down
delay_secondsNo

Output Schema

TableJSON Schema
NameRequiredDescriptionDefault
resultYes

Implementation Reference

  • The handler function for the 'capture_document_pages' tool, registered via @mcp.tool(). It captures multiple pages from a document window, using the window manager's multi-page capture if available, or a fallback loop with individual captures and delays. Returns JSON results with captured file paths and metadata.
    @mcp.tool()
    async def capture_document_pages(
        window_id: str, 
        page_count: int,
        output_dir: Optional[str] = None,
        navigation_key: str = "Page_Down",
        delay_seconds: float = 1.0
    ) -> str:
        """
        Capture multiple pages from a document window with automatic navigation.
        
        Args:
            window_id: Window ID containing the document
            page_count: Number of pages to capture
            output_dir: Directory to save captured pages
            navigation_key: Key to press for navigation (Page_Down, Right, space)
            delay_seconds: Delay between navigation and capture
        
        Returns:
            JSON string with capture results and list of captured files.
        """
        try:
            # Get configured output directory (with backward compatibility)
            config = get_config()
            if output_dir is None:
                if config.should_use_legacy_mode():
                    output_dir = "captures"  # Legacy default
                else:
                    actual_output_dir = get_output_directory()
            else:
                actual_output_dir = get_output_directory(output_dir)
            
            # Convert to string for compatibility with existing manager interface
            output_dir_str = str(actual_output_dir)
            
            # For multi-page capture, use the underlying manager if it supports it
            wm = get_window_manager()
            if hasattr(wm.manager, 'capture_multiple_pages'):
                captured_files = wm.manager.capture_multiple_pages(
                    window_id=window_id,
                    page_count=page_count,
                    output_dir=output_dir_str,
                    navigation_key=navigation_key,
                    delay_seconds=delay_seconds
                )
            else:
                # Fallback: capture individual pages manually
                actual_output_dir.mkdir(parents=True, exist_ok=True)
                
                captured_files = []
                for page_num in range(1, page_count + 1):
                    filename = generate_page_filename(page_num)
                    output_path = actual_output_dir / filename
                    captured_path = wm.capture_window(window_id, str(output_path))
                    captured_files.append(captured_path)
                    
                    # Simple delay between captures (no navigation for Windows apps yet)
                    if page_num < page_count:
                        time.sleep(delay_seconds)
            
            import os
            total_size = sum(os.path.getsize(f) for f in captured_files if os.path.exists(f))
            
            result = {
                "status": "success",
                "window_id": window_id,
                "pages_captured": len(captured_files),
                "output_directory": output_dir_str,
                "captured_files": captured_files,
                "total_size_mb": round(total_size / (1024 * 1024), 2)
            }
            
            return json.dumps(result, indent=2)
            
        except Exception as e:
            logger.error(f"Failed to capture document pages: {e}")
            return json.dumps({
                "status": "error",
                "error": str(e),
                "window_id": window_id,
                "page_count": page_count,
                "captured_files": []
            })
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the automatic navigation behavior and mentions the return format (JSON string with results and file list), which is valuable. However, it doesn't disclose important behavioral traits like whether this requires specific permissions, what happens if navigation fails, whether it modifies the document, or any rate limits. The description adds some context but leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise: a clear purpose statement followed by well-organized Args and Returns sections. Every sentence earns its place, with no wasted words. The information is front-loaded with the core functionality, followed by parameter details and return information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, automatic navigation behavior) and the presence of an output schema (which handles return values), the description is reasonably complete. It covers the core functionality, all parameters, and mentions the return format. However, for a tool with no annotations and significant behavioral complexity, it could benefit from more disclosure about error conditions, permissions, or what 'automatic navigation' entails beyond key pressing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate for the lack of parameter documentation in the schema. It provides clear explanations for all 5 parameters in the Args section, adding meaningful context about what each parameter does (e.g., 'Key to press for navigation', 'Delay between navigation and capture'). This significantly enhances understanding beyond the bare schema, though it doesn't cover all possible edge cases or format details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('capture multiple pages', 'automatic navigation') and distinguishes it from siblings like capture_full_screen and capture_window by focusing on document pages with navigation. It explicitly mentions the resource (document window) and the multi-page capture functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (capturing multiple pages from a document with automatic navigation), but doesn't explicitly state when not to use it or name specific alternatives. It implies usage for document page capture with navigation, which is helpful but lacks explicit exclusions or comparisons to siblings like capture_window or full_document_workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Install Server

Other Tools

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PovedaAqui/auto-snap-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server