ScrapeGraphAI

ScrapeGraph MCP Server

Official

searchscraper

Read-only

Perform AI-powered web searches to extract structured data from search results for research, competitive analysis, and multi-source information gathering.

Instructions

Perform AI-powered web searches with structured data extraction.

This tool searches the web based on your query and uses AI to extract structured information from the search results. Ideal for research, competitive analysis, and gathering information from multiple sources. Each website searched costs 10 credits (default 3 websites = 30 credits). Read-only operation but results may vary over time (non-idempotent).

Args:

user_prompt (str): Search query or natural language instructions for information to find.
    - Can be a simple search query or detailed extraction instructions
    - The AI will search the web and extract relevant data from found pages
    - Be specific about what information you want extracted
    - Examples:
      * "Find latest AI research papers published in 2024 with author names and abstracts"
      * "Search for Python web scraping tutorials with ratings and difficulty levels"
      * "Get current cryptocurrency prices and market caps for top 10 coins"
      * "Find contact information for tech startups in San Francisco"
      * "Search for job openings for data scientists with salary information"
    - Tips for better results:
      * Include specific fields you want extracted
      * Mention timeframes or filters (e.g., "latest", "2024", "top 10")
      * Specify data types needed (prices, dates, ratings, etc.)

num_results (Optional[int]): Number of websites to search and extract data from.
    - Default: 3 websites (costs 30 credits total)
    - Range: 1-20 websites (recommended to stay under 10 for cost efficiency)
    - Each website costs 10 credits, so total cost = num_results × 10
    - Examples:
      * 1: Quick single-source lookup (10 credits)
      * 3: Standard research (30 credits) - good balance of coverage and cost
      * 5: Comprehensive research (50 credits)
      * 10: Extensive analysis (100 credits)
    - Note: More results provide broader coverage but increase costs and processing time
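The pricing rule above (10 credits per website, valid range 1-20) can be sketched as a small helper. `searchscraper_cost` is a hypothetical name for illustration, not part of the API:

```python
def searchscraper_cost(num_results: int = 3) -> int:
    """Total credit cost for one searchscraper call: 10 credits per website."""
    if not 1 <= num_results <= 20:
        raise ValueError("num_results must be between 1 and 20")
    return num_results * 10

# Default of 3 websites costs 30 credits; 10 websites costs 100.
print(searchscraper_cost())
print(searchscraper_cost(10))
```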

number_of_scrolls (Optional[int]): Number of infinite scrolls per searched webpage.
    - Default: 0 (no scrolling on search result pages)
    - Range: 0-10 scrolls per page
    - Useful when search results point to pages with dynamic content loading
    - Each scroll waits for content to load before continuing
    - Examples:
      * 0: Static content pages, news articles, documentation
      * 2: Social media pages, product listings with lazy loading
      * 5: Extensive feeds, long-form content with infinite scroll
    - Note: Increases processing time significantly (adds 5-10 seconds per scroll per page)
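The latency note above can be rough-estimated the same way. `scroll_time_overhead` is a hypothetical helper using the midpoint of the documented 5-10 seconds per scroll per page:

```python
def scroll_time_overhead(num_results: int, number_of_scrolls: int,
                         seconds_per_scroll: float = 7.5) -> float:
    """Approximate added latency from scrolling: pages x scrolls x seconds per scroll."""
    return num_results * number_of_scrolls * seconds_per_scroll

# e.g. 3 websites with 2 scrolls each at the 7.5 s midpoint
print(scroll_time_overhead(3, 2))
```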

Returns: Dictionary containing:
    - search_results: Array of extracted data from each website found
    - sources: List of URLs that were searched and processed
    - total_websites_processed: Number of websites successfully analyzed
    - credits_used: Total credits consumed (num_results × 10)
    - processing_time: Total time taken for search and extraction
    - search_query_used: The actual search query sent to search engines
    - metadata: Additional information about the search process

Raises:
    - ValueError: If user_prompt is empty or num_results is out of range
    - HTTPError: If search engines are unavailable or return errors
    - TimeoutError: If search or extraction process exceeds timeout limits
    - RateLimitError: If too many requests are made in a short time period

Note:
    - Results may vary between calls due to changing web content (non-idempotent)
    - Search engines may return different results over time
    - Some websites may be inaccessible or block automated access
    - Processing time increases with num_results and number_of_scrolls
    - Consider using smartscraper on specific URLs if you know the target sites

Input Schema

Name                 Required    Description    Default
user_prompt          Yes
num_results          No
number_of_scrolls    No

Output Schema


No arguments

Implementation Reference

  • MCP tool handler function for 'searchscraper'. Registers the tool, defines input schema via type hints and docstring, handles authentication via get_api_key, instantiates ScapeGraphClient, and delegates to the client's searchscraper method.
    @mcp.tool(annotations={"readOnlyHint": True, "destructiveHint": False, "idempotentHint": False})
    def searchscraper(
        user_prompt: str,
        ctx: Context,
        num_results: Optional[int] = None,
        number_of_scrolls: Optional[int] = None
    ) -> Dict[str, Any]:
        """
        Perform AI-powered web searches with structured data extraction.
    
        This tool searches the web based on your query and uses AI to extract structured information
        from the search results. Ideal for research, competitive analysis, and gathering information
        from multiple sources. Each website searched costs 10 credits (default 3 websites = 30 credits).
        Read-only operation but results may vary over time (non-idempotent).
    
        Args:
            user_prompt (str): Search query or natural language instructions for information to find.
                - Can be a simple search query or detailed extraction instructions
                - The AI will search the web and extract relevant data from found pages
                - Be specific about what information you want extracted
                - Examples:
                  * "Find latest AI research papers published in 2024 with author names and abstracts"
                  * "Search for Python web scraping tutorials with ratings and difficulty levels"
                  * "Get current cryptocurrency prices and market caps for top 10 coins"
                  * "Find contact information for tech startups in San Francisco"
                  * "Search for job openings for data scientists with salary information"
                - Tips for better results:
                  * Include specific fields you want extracted
                  * Mention timeframes or filters (e.g., "latest", "2024", "top 10")
                  * Specify data types needed (prices, dates, ratings, etc.)
    
            num_results (Optional[int]): Number of websites to search and extract data from.
                - Default: 3 websites (costs 30 credits total)
                - Range: 1-20 websites (recommended to stay under 10 for cost efficiency)
                - Each website costs 10 credits, so total cost = num_results × 10
                - Examples:
                  * 1: Quick single-source lookup (10 credits)
                  * 3: Standard research (30 credits) - good balance of coverage and cost
                  * 5: Comprehensive research (50 credits)
                  * 10: Extensive analysis (100 credits)
                - Note: More results provide broader coverage but increase costs and processing time
    
            number_of_scrolls (Optional[int]): Number of infinite scrolls per searched webpage.
                - Default: 0 (no scrolling on search result pages)
                - Range: 0-10 scrolls per page
                - Useful when search results point to pages with dynamic content loading
                - Each scroll waits for content to load before continuing
                - Examples:
                  * 0: Static content pages, news articles, documentation
                  * 2: Social media pages, product listings with lazy loading
                  * 5: Extensive feeds, long-form content with infinite scroll
                - Note: Increases processing time significantly (adds 5-10 seconds per scroll per page)
    
        Returns:
            Dictionary containing:
            - search_results: Array of extracted data from each website found
            - sources: List of URLs that were searched and processed
            - total_websites_processed: Number of websites successfully analyzed
            - credits_used: Total credits consumed (num_results × 10)
            - processing_time: Total time taken for search and extraction
            - search_query_used: The actual search query sent to search engines
            - metadata: Additional information about the search process
    
        Raises:
            ValueError: If user_prompt is empty or num_results is out of range
            HTTPError: If search engines are unavailable or return errors
            TimeoutError: If search or extraction process exceeds timeout limits
            RateLimitError: If too many requests are made in a short time period
    
        Note:
            - Results may vary between calls due to changing web content (non-idempotent)
            - Search engines may return different results over time
            - Some websites may be inaccessible or block automated access
            - Processing time increases with num_results and number_of_scrolls
            - Consider using smartscraper on specific URLs if you know the target sites
        """
        try:
            api_key = get_api_key(ctx)
            client = ScapeGraphClient(api_key)
            return client.searchscraper(user_prompt, num_results, number_of_scrolls)
        except Exception as e:
            return {"error": str(e)}
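The handler above is registered as an MCP tool and is reached through the protocol's standard tools/call method. A sketch of the JSON-RPC request an agent might send; the argument values are illustrative, not captured traffic:

```python
import json

# Hedged example of an MCP "tools/call" request for the searchscraper tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "searchscraper",
        "arguments": {
            "user_prompt": "Get current cryptocurrency prices and market caps for top 10 coins",
            "num_results": 3,        # 3 websites -> 30 credits
            "number_of_scrolls": 0,  # no scrolling on the result pages
        },
    },
}
print(json.dumps(request, indent=2))
```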
  • Core implementation of searchscraper in ScapeGraphClient class. Constructs POST request to API endpoint https://api.scrapegraphai.com/v1/searchscraper with user_prompt and optional num_results/number_of_scrolls, handles HTTP response and errors.
    def searchscraper(self, user_prompt: str, num_results: int = None, number_of_scrolls: int = None) -> Dict[str, Any]:
        """
        Perform AI-powered web searches with structured results.
    
        Args:
            user_prompt: Search query or instructions
            num_results: Number of websites to search (optional, default: 3 websites = 30 credits)
            number_of_scrolls: Number of infinite scrolls to perform on each website (optional)
    
        Returns:
            Dictionary containing search results and reference URLs
        """
        url = f"{self.BASE_URL}/searchscraper"
        data = {
            "user_prompt": user_prompt
        }
        
        # Add num_results to the request if provided
        if num_results is not None:
            data["num_results"] = num_results
            
        # Add number_of_scrolls to the request if provided
        if number_of_scrolls is not None:
            data["number_of_scrolls"] = number_of_scrolls
    
        response = self.client.post(url, headers=self.headers, json=data)
    
        if response.status_code != 200:
            error_msg = f"Error {response.status_code}: {response.text}"
            raise Exception(error_msg)
    
        return response.json()
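The request-body logic of the method above can be shown as a standalone pure function: optional parameters are omitted from the payload when left as None, so the API only sees fields the caller actually set. `build_searchscraper_payload` is a hypothetical name mirroring, not replacing, the client code:

```python
from typing import Any, Dict, Optional

def build_searchscraper_payload(
    user_prompt: str,
    num_results: Optional[int] = None,
    number_of_scrolls: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the JSON body POSTed to /v1/searchscraper, skipping unset options."""
    data: Dict[str, Any] = {"user_prompt": user_prompt}
    if num_results is not None:
        data["num_results"] = num_results
    if number_of_scrolls is not None:
        data["number_of_scrolls"] = number_of_scrolls
    return data

print(build_searchscraper_payload("find AI papers", num_results=5))
```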
  • FastMCP decorator that registers the searchscraper tool with annotations indicating it's read-only but non-idempotent.
    @mcp.tool(annotations={"readOnlyHint": True, "destructiveHint": False, "idempotentHint": False})
  • Input schema defined by function signature (user_prompt required str, optional num_results and number_of_scrolls ints) and comprehensive docstring describing parameters, constraints, examples, returns, and errors.
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds substantial behavioral context beyond annotations: cost structure (10 credits per website), non-idempotence details ('results may vary over time'), processing time implications, accessibility constraints ('some websites may be inaccessible'), and specific error conditions. While annotations cover read-only and non-idempotent hints, the description provides operational details that help the agent make informed decisions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (Args, Returns, Raises, Note) and front-loaded key information. While comprehensive, some sections could be more concise (e.g., multiple similar examples). Every sentence adds value, but the overall length is substantial for a tool description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (web search with AI extraction, cost structure, multiple parameters) and the presence of output schema, the description is exceptionally complete. It covers purpose, usage, parameters, returns, errors, cost implications, performance considerations, and sibling tool differentiation - providing everything an agent needs to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description fully compensates by providing comprehensive parameter documentation: detailed explanations, examples, tips, ranges, defaults, and cost implications for all three parameters. The description adds significant meaning beyond the bare schema, including practical guidance for better results and trade-offs between parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'AI-powered web searches with structured data extraction' - a specific verb (search/extract) and resource (web). It distinguishes from siblings like 'smartscraper' (for specific URLs) and 'scrape' (generic scraping) by emphasizing AI-powered search and extraction from search results.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided: 'Ideal for research, competitive analysis, and gathering information from multiple sources' and 'Consider using smartscraper on specific URLs if you know the target sites.' The description clearly distinguishes when to use this tool (broad web searches) vs. alternatives (targeted scraping with smartscraper).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
