agentic_scrapper

Execute complex multi-step web scraping workflows with AI-powered automation.

This tool runs an intelligent agent that can navigate websites, interact with forms and buttons, follow multi-step workflows, and extract structured data. Ideal for complex scraping scenarios requiring user interaction simulation, form submissions, or multi-page navigation flows. Supports custom output schemas and step-by-step instructions. Variable credit cost based on complexity. Can perform actions on the website (non-read-only, non-idempotent).

The agent accepts flexible input formats for steps (list or JSON string) and output_schema (dict or JSON string) to accommodate different client implementations.
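The input coercion described above might work as in the minimal sketch below. It assumes the server parses string inputs with an ordinary JSON parser. `normalize_steps` and `normalize_output_schema` are hypothetical helper names and are not part of the ScrapeGraph API.

```python
import json
from typing import Any, Dict, List, Optional, Union


def normalize_steps(steps: Optional[Union[str, List[str]]]) -> Optional[List[str]]:
    """Accept steps as a list of strings or a JSON array string (hypothetical helper)."""
    if steps is None:
        return None
    if isinstance(steps, str):
        steps = json.loads(steps)  # e.g. '["Open menu", "Click Products"]'
    if not isinstance(steps, list) or not all(isinstance(s, str) for s in steps):
        raise ValueError("steps must be a list of strings or a JSON array of strings")
    return steps


def normalize_output_schema(
    schema: Optional[Union[str, Dict[str, Any]]]
) -> Optional[Dict[str, Any]]:
    """Accept output_schema as a dict or a JSON object string (hypothetical helper)."""
    if schema is None:
        return None
    if isinstance(schema, str):
        schema = json.loads(schema)
    if not isinstance(schema, dict):
        raise ValueError("output_schema must be a dict or a JSON object string")
    return schema


# Both input styles yield the same normalized list.
assert normalize_steps('["Open navigation menu", "Click on Products"]') == [
    "Open navigation menu",
    "Click on Products",
]
```

Malformed JSON raises `json.JSONDecodeError`. That class is a subclass of `ValueError`, so both helpers report all bad input through one exception type.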

Args:
url (str): The target website URL where the agentic scraping workflow should start.
    - Must include protocol (http:// or https://)
    - Should be the starting page for your automation workflow
    - The agent will begin its actions from this URL
    - Examples:
      * https://example.com/search (start at search page)
      * https://shop.example.com/login (begin with login flow)
      * https://app.example.com/dashboard (start at main interface)
      * https://forms.example.com/contact (begin at form page)
    - Considerations:
      * Choose a starting point that makes sense for your workflow
      * Ensure the page is publicly accessible or handle authentication
      * Consider the logical flow of actions from this starting point
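Malformed URLs are reported as a ValueError, so a caller may want to check the URL before spending credits on a run. The sketch below is a hypothetical client-side check built on the standard library's `urllib.parse.urlparse`. It requires an http or https scheme and a host, and it does not claim to reproduce the server's own validation.

```python
from urllib.parse import urlparse


def validate_start_url(url: str) -> str:
    """Reject URLs without an http(s) scheme or a host (hypothetical client-side check)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"URL must start with http:// or https://, got {url!r}")
    if not parsed.netloc:
        raise ValueError(f"URL has no host: {url!r}")
    return url


validate_start_url("https://example.com/search")  # passes
# validate_start_url("example.com/search") would raise ValueError (no scheme)
```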

user_prompt (Optional[str]): High-level instructions for what the agent should accomplish.
    - Describes the overall goal and desired outcome of the automation
    - Should be clear and specific about what you want to achieve
    - Works in conjunction with the steps parameter for detailed guidance
    - Examples:
      * "Navigate to the search page, search for laptops, and extract the top 5 results with prices"
      * "Fill out the contact form with sample data and submit it"
      * "Log in to the dashboard and extract all recent notifications"
      * "Browse the product catalog and collect information about all items"
      * "Navigate through the multi-step checkout process and capture each step"
    - Tips for better results:
      * Be specific about the end goal
      * Mention what data you want extracted
      * Include context about the expected workflow
      * Specify any particular elements or sections to focus on

output_schema (Optional[Union[str, Dict]]): Desired output structure for extracted data.
    - Can be provided as a dictionary or JSON string
    - Defines the format and structure of the final extracted data
    - Helps ensure consistent, predictable output format
    - IMPORTANT: The schema should include a "required" field (it can be an empty array [] if no fields are required)
    - Examples:
      * Simple object: {'type': 'object', 'properties': {'title': {'type': 'string'}, 'price': {'type': 'number'}}, 'required': []}
      * Array of objects: {'type': 'array', 'items': {'type': 'object', 'properties': {'name': {'type': 'string'}, 'value': {'type': 'string'}}, 'required': []}, 'required': []}
      * Complex nested: {'type': 'object', 'properties': {'products': {'type': 'array', 'items': {...}}, 'total_count': {'type': 'number'}}, 'required': []}
      * As JSON string: '{"type": "object", "properties": {"results": {"type": "array"}}, "required": []}'
      * With required fields: {'type': 'object', 'properties': {'id': {'type': 'string'}, 'name': {'type': 'string'}}, 'required': ['id']}
    - Note: If the "required" field is missing, it is added automatically as an empty array []
    - Default: None (agent will infer structure from prompt and steps)
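The automatic "required" fix-up might look like the sketch below. The description does not say whether nested object schemas are also patched, so this sketch assumes only the top level is changed. `ensure_required` is a hypothetical name. It returns a copy and leaves an existing "required" list untouched.

```python
import copy
from typing import Any, Dict


def ensure_required(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the schema with a top-level "required" list (hypothetical helper)."""
    fixed = copy.deepcopy(schema)
    fixed.setdefault("required", [])  # only added when absent
    return fixed


schema = {"type": "object", "properties": {"title": {"type": "string"}}}
print(ensure_required(schema))
# {'type': 'object', 'properties': {'title': {'type': 'string'}}, 'required': []}
```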

steps (Optional[Union[str, List[str]]]): Step-by-step instructions for the agent.
    - Can be provided as a list of strings or JSON array string
    - Provides detailed, sequential instructions for the automation workflow
    - Each step should be a clear, actionable instruction
    - Examples as list:
      * ['Click the search button', 'Enter "laptops" in the search box', 'Press Enter', 'Wait for results to load', 'Extract product information']
      * ['Fill in email field with test@example.com', 'Fill in password field', 'Click login button', 'Navigate to profile page']
    - Examples as JSON string:
      * '["Open navigation menu", "Click on Products", "Select category filters", "Extract all product data"]'
    - Best practices:
      * Break complex actions into simple steps
      * Be specific about UI elements (button text, field names, etc.)
      * Include waiting/loading steps when necessary
      * Specify extraction points clearly
      * Order steps logically for the workflow

ai_extraction (Optional[bool]): Enable AI-powered extraction mode for intelligent data parsing.
    - Default: true (recommended for most use cases)
    - Options:
      * true: Uses advanced AI to intelligently extract and structure data
        - Better at handling complex page layouts
        - Can adapt to different content structures
        - Provides more accurate data extraction
        - Recommended for most scenarios
      * false: Uses simpler extraction methods
        - Faster processing but less intelligent
        - May miss complex or nested data
        - Use when speed is more important than accuracy
    - Performance impact:
      * true: Higher processing time but better results
      * false: Faster execution but potentially less accurate extraction

persistent_session (Optional[bool]): Maintain session state between steps.
    - Default: false (each step starts fresh)
    - Options:
      * true: Keeps cookies, login state, and session data between steps
        - Essential for authenticated workflows
        - Maintains shopping cart contents, user preferences, etc.
        - Required for multi-step processes that depend on previous actions
        - Use for: Login flows, shopping processes, form wizards
      * false: Each step starts with a clean session
        - Faster and simpler for independent actions
        - No state carried between steps
        - Use for: Simple data extraction, public content scraping
    - Examples when to use true:
      * Login → Navigate to protected area → Extract data
      * Add items to cart → Proceed to checkout → Extract order details
      * Multi-step form completion with session dependencies
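Here is an example of a case where persistent_session should be true: the arguments for a log-in-then-extract workflow. The URL, step wording, and timeout are placeholders chosen for this sketch and do not refer to a real site.

```python
# Hypothetical arguments for a login-protected workflow (placeholder values).
login_flow_args = {
    "url": "https://app.example.com/login",
    "user_prompt": "Log in and extract all recent notifications",
    "steps": [
        "Fill in email field with test@example.com",
        "Fill in password field",
        "Click login button",
        "Navigate to the notifications page",
        "Extract notification titles and dates",
    ],
    "persistent_session": True,  # keep cookies and login state between steps
    "timeout_seconds": 180.0,  # multi-step form filling, per the ranges below
}
```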

timeout_seconds (Optional[float]): Maximum time to wait for the entire workflow.
    - Default: 120 seconds (2 minutes)
    - Recommended ranges:
      * 60-120: Simple workflows (2-5 steps)
      * 180-300: Medium complexity (5-10 steps)
      * 300-600: Complex workflows (10+ steps or slow sites)
      * 600+: Very complex or slow-loading workflows
    - Considerations:
      * Include time for page loads, form submissions, and processing
      * Factor in network latency and site response times
      * Allow extra time for AI processing and extraction
      * Balance between thoroughness and efficiency
    - Examples:
      * 60.0: Quick single-page data extraction
      * 180.0: Multi-step form filling and submission
      * 300.0: Complex navigation and comprehensive data extraction
      * 600.0: Extensive workflows with multiple page interactions
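The recommended ranges can serve as a starting heuristic. The sketch below is hypothetical and works as follows:

- It picks the upper end of each range based on the step count.
- It assigns exactly 5 steps to the simple tier, because the listed ranges overlap at 5.
- It treats slow-loading sites as needing at least 600 seconds.

```python
def suggest_timeout(num_steps: int, slow_site: bool = False) -> float:
    """Pick a timeout from the recommended ranges (hypothetical heuristic)."""
    if num_steps <= 5:
        timeout = 120.0  # simple workflows: 60-120 s
    elif num_steps <= 10:
        timeout = 300.0  # medium complexity: 180-300 s
    else:
        timeout = 600.0  # complex workflows: 300-600 s
    if slow_site:
        timeout = max(timeout, 600.0)  # slow-loading sites: 600+ s
    return timeout


print(suggest_timeout(7))  # 300.0
```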

Returns: Dictionary containing:
    - extracted_data: The structured data matching your prompt and optional schema
    - workflow_log: Detailed log of all actions performed by the agent
    - pages_visited: List of URLs visited during the workflow
    - actions_performed: Summary of interactions (clicks, form fills, navigations)
    - execution_time: Total time taken for the workflow
    - steps_completed: Number of steps successfully executed
    - final_page_url: The URL where the workflow ended
    - session_data: Session information if persistent_session was enabled
    - credits_used: Number of credits consumed (varies by complexity)
    - status: Success/failure status with any error details
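A caller will usually check the status before using extracted_data. The hypothetical helper below reads only the field names listed above. This description does not specify the exact type or values of status; here it is assumed to be a plain string such as "success".

```python
from typing import Any, Dict


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a one-line summary from the documented result fields (hypothetical helper)."""
    status = result.get("status", "unknown")
    steps = result.get("steps_completed", 0)
    pages = len(result.get("pages_visited", []))
    credits = result.get("credits_used", "n/a")
    return f"status={status}, steps={steps}, pages={pages}, credits={credits}"
```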

Raises:
    - ValueError: If URL is malformed or required parameters are missing
    - TimeoutError: If the workflow exceeds the specified timeout
    - NavigationError: If the agent cannot navigate to required pages
    - InteractionError: If the agent cannot interact with specified elements
    - ExtractionError: If data extraction fails or returns invalid results
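Of the errors listed, only a timeout is likely to be fixed by trying again with more time. The wrapper below is a sketch that rests on two assumptions:

- `run_agent` is a hypothetical callable that invokes the tool with keyword arguments.
- The client surfaces timeouts as Python's built-in `TimeoutError`.

On a timeout, the wrapper doubles timeout_seconds and tries again. Every other exception propagates unchanged.

```python
from typing import Any, Callable, Dict


def run_with_retry(
    run_agent: Callable[..., Dict[str, Any]],
    args: Dict[str, Any],
    max_attempts: int = 2,
) -> Dict[str, Any]:
    """Retry with a doubled timeout when the workflow times out (hypothetical wrapper).

    ValueError, navigation, interaction, and extraction errors are not caught,
    since more time is unlikely to fix them.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt_args = dict(args)  # do not mutate the caller's dict
    for attempt in range(1, max_attempts + 1):
        try:
            return run_agent(**attempt_args)
        except TimeoutError:
            if attempt == max_attempts:
                raise
            # 120 s is the documented default when no timeout was given
            attempt_args["timeout_seconds"] = attempt_args.get("timeout_seconds", 120.0) * 2
    raise AssertionError("unreachable")
```

The tool is non-idempotent, so a retry can repeat side effects such as a form submission. For workflows that submit or purchase, consider checking the site's state before retrying automatically.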

Use Cases:
    - Automated form filling and submission
    - Multi-step checkout processes
    - Login-protected content extraction
    - Interactive search and filtering workflows
    - Complex navigation scenarios requiring user simulation
    - Data collection from dynamic, JavaScript-heavy applications

Best Practices:
    - Start with simple workflows and gradually increase complexity
    - Use specific element identifiers in steps (button text, field labels)
    - Include appropriate wait times for page loads and dynamic content
    - Test with persistent_session=true for authentication-dependent workflows
    - Set realistic timeouts based on workflow complexity
    - Provide clear, sequential steps that build on each other
    - Use output_schema to ensure consistent data structure

Note:
    - This tool can perform actions on websites (non-read-only)
    - Results may vary between runs due to dynamic content (non-idempotent)
    - Credit cost varies based on workflow complexity and execution time
    - Some websites may have anti-automation measures that could affect success
    - Consider using simpler tools (smartscraper, markdownify) for basic extraction needs

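Name	Required	Description	Default
`url`	Yes	Starting URL for the workflow (must include http:// or https://)	
`user_prompt`	No	High-level instructions for what the agent should accomplish	
`output_schema`	No	Desired output structure (dict or JSON string)	None
`steps`	No	Step-by-step instructions (list or JSON array string)	
`ai_extraction`	No	Enable AI-powered extraction mode	true
`persistent_session`	No	Maintain session state between steps	false
`timeout_seconds`	No	Maximum time for the entire workflow, in seconds	120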
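Over the Model Context Protocol, a call to this tool is a JSON-RPC `tools/call` request. Its `params.arguments` object carries the parameters listed in the table. Most MCP clients build this message for you. The sketch below only shows its shape, and the argument values are placeholders. Only `url` is required, so any optional parameter left out falls back to its default.

```python
import json

# JSON-RPC 2.0 message an MCP client sends to invoke a tool via "tools/call".
# Argument values are illustrative placeholders.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "agentic_scrapper",
        "arguments": {
            "url": "https://example.com/search",
            "user_prompt": "Search for laptops and extract the top 5 results with prices",
            "steps": [
                'Enter "laptops" in the search box',
                "Press Enter",
                "Wait for results to load",
                "Extract product information",
            ],
            "output_schema": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "price": {"type": "number"},
                    },
                    "required": [],
                },
                "required": [],
            },
            "timeout_seconds": 180.0,
        },
    },
}

print(json.dumps(request, indent=2))
```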

ScrapeGraph MCP Server
