Glama
blazickjp

web-browser-mcp-server

browse_webpage

Extract webpage content by providing a URL, with optional CSS selectors to target specific elements for precise data retrieval.

Instructions

Extract content from a webpage with optional CSS selectors for specific elements

Input Schema

Name       Required  Description                                         Default
url        Yes       The URL of the webpage to browse                    —
selectors  No        Optional CSS selectors to extract specific content  —
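For concreteness, a hypothetical arguments object matching this schema might look like the following (the URL and selector names are illustrative, not part of the server):

```python
# Hypothetical arguments for a browse_webpage call.
# Selector names ("headlines", "nav_links") become keys in the result;
# their values are ordinary CSS selectors.
arguments = {
    "url": "https://example.com/news",
    "selectors": {
        "headlines": "h2.article-title",
        "nav_links": "nav a",
    },
}
```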

Implementation Reference

  • The @server.call_tool() decorated function that handles calls to 'browse_webpage'. It validates the tool name, fetches the webpage using aiohttp, parses HTML with BeautifulSoup, extracts title, text, links, applies CSS selectors if provided, and returns the result or error messages.
    # Module context assumed from the excerpt: `server` is the MCP Server
    # instance and `settings` provides USER_AGENT and REQUEST_TIMEOUT.
    import asyncio
    from typing import Any, Dict, List

    import aiohttp
    import mcp.types as types
    from bs4 import BeautifulSoup

    @server.call_tool()
    async def call_tool(name: str, arguments: Dict[str, Any]) -> List[types.TextContent]:
        """
        Handle tool calls for web browsing functionality.
    
        Args:
            name (str): The name of the tool to call (must be 'browse_webpage')
            arguments (Dict[str, Any]): Tool arguments including 'url' and optional 'selectors'
    
        Returns:
            List[types.TextContent]: The extracted webpage content or error message
    
        The function performs the following steps:
        1. Validates the tool name
        2. Fetches the webpage content with configured timeout and user agent
        3. Parses the HTML using BeautifulSoup
        4. Extracts basic page information (title, text, links)
        5. Applies any provided CSS selectors for specific content
        6. Handles various error conditions (timeout, HTTP errors, etc.)
        """
        if name != "browse_webpage":
            return [types.TextContent(type="text", text=f"Error: Unknown tool {name}")]
    
        url = arguments["url"]
        selectors = arguments.get("selectors", {})
    
        async with aiohttp.ClientSession() as session:
            try:
                headers = {"User-Agent": settings.USER_AGENT}
                timeout = aiohttp.ClientTimeout(total=settings.REQUEST_TIMEOUT)
    
                async with session.get(url, headers=headers, timeout=timeout) as response:
                    if response.status >= 400:
                        return [
                            types.TextContent(
                                type="text",
                                text=f"Error: HTTP {response.status} - Failed to fetch webpage",
                            )
                        ]
    
                    html = await response.text()
                    soup = BeautifulSoup(html, "html.parser")
    
                    # Extract basic page information
                    result = {
                        "title": soup.title.string if soup.title else None,
                        "text": soup.get_text(strip=True),
                        "links": [
                            {"text": link.text.strip(), "href": link.get("href")}
                            for link in soup.find_all("a", href=True)
                        ],
                    }
    
                    # Extract content using provided selectors
                    if selectors:
                        for key, selector in selectors.items():
                            elements = soup.select(selector)
                            result[key] = [elem.get_text(strip=True) for elem in elements]
    
                    return [types.TextContent(type="text", text=str(result))]
    
            except asyncio.TimeoutError:
                return [
                    types.TextContent(
                        type="text", text="Error: Request timed out while fetching webpage"
                    )
                ]
            except aiohttp.ClientError as e:
                return [types.TextContent(type="text", text=f"Error: {str(e)}")]
            except Exception as e:
                return [types.TextContent(type="text", text=f"Error: {str(e)}")]
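The parsing and selector steps above can be exercised without any network access by feeding BeautifulSoup a static HTML string. This sketch mirrors how call_tool builds its result dict; the HTML and the `summary` selector are illustrative:

```python
from bs4 import BeautifulSoup

html = """
<html>
  <head><title>Demo Page</title></head>
  <body>
    <a href="/about">About</a>
    <p class="summary">Short summary.</p>
  </body>
</html>
"""
soup = BeautifulSoup(html, "html.parser")

# Same result shape that call_tool assembles before serialization.
result = {
    "title": soup.title.string if soup.title else None,
    "text": soup.get_text(strip=True),
    "links": [
        {"text": link.text.strip(), "href": link.get("href")}
        for link in soup.find_all("a", href=True)
    ],
}

# Apply a selector the way the optional `selectors` mapping is handled.
result["summary"] = [el.get_text(strip=True) for el in soup.select("p.summary")]
```

Note that the real handler returns `str(result)`, i.e. the Python repr of this dict, rather than JSON.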
  • The @server.list_tools() function that registers the 'browse_webpage' tool by returning a list containing its Tool definition.
    @server.list_tools()
    async def list_tools() -> List[types.Tool]:
        """
        List available web browsing tools.
    
        Returns:
            List[types.Tool]: A list containing the browse_webpage tool definition
        """
        return [
            types.Tool(
                name="browse_webpage",
                description="Extract content from a webpage with optional CSS selectors for specific elements",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "url": {
                            "type": "string",
                            "description": "The URL of the webpage to browse",
                        },
                        "selectors": {
                            "type": "object",
                            "additionalProperties": {"type": "string"},
                            "description": "Optional CSS selectors to extract specific content",
                        },
                    },
                    "required": ["url"],
                },
            )
        ]
  • The inputSchema defining the parameters for 'browse_webpage': required 'url' (string) and optional 'selectors' (object mapping selector names to CSS selectors).
    inputSchema={
        "type": "object",
        "properties": {
            "url": {
                "type": "string",
                "description": "The URL of the webpage to browse",
            },
            "selectors": {
                "type": "object",
                "additionalProperties": {"type": "string"},
                "description": "Optional CSS selectors to extract specific content",
            },
        },
        "required": ["url"],
    },
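As a sanity check, the schema's `required` list can be enforced client-side with a few lines of stdlib Python before issuing a call. This is a minimal sketch, not a full JSON Schema validator:

```python
# Minimal pre-flight check against the inputSchema's "required" list.
# A real client would run a complete JSON Schema validator instead.
input_schema = {
    "type": "object",
    "properties": {
        "url": {"type": "string"},
        "selectors": {"type": "object", "additionalProperties": {"type": "string"}},
    },
    "required": ["url"],
}

def missing_required(schema: dict, arguments: dict) -> list:
    """Return the required property names absent from `arguments`."""
    return [key for key in schema.get("required", []) if key not in arguments]

print(missing_required(input_schema, {"selectors": {}}))   # ['url']
print(missing_required(input_schema, {"url": "https://example.com"}))  # []
```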
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool extracts content but fails to describe critical behaviors such as error handling (e.g., invalid URLs, network issues), performance traits (e.g., timeouts, rate limits), or output format. This leaves significant gaps in understanding how the tool operates beyond its basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. It avoids redundancy and wastes no words, though it could be slightly more informative without sacrificing brevity. The structure is clear and direct, earning a high score for conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (web scraping with optional selectors), lack of annotations, and no output schema, the description is incomplete. It doesn't explain what 'content' includes (e.g., text, links, structure), how selectors are applied, or potential limitations (e.g., JavaScript-rendered content). This leaves the agent with insufficient context for reliable use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('url' and 'selectors') adequately. The description adds minimal value by mentioning 'optional CSS selectors for specific elements,' which aligns with the schema but doesn't provide additional syntax, examples, or constraints. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Extract') and resource ('content from a webpage'), and mentions optional CSS selectors for refinement. It distinguishes the tool's core function effectively, though without sibling tools, differentiation isn't applicable. However, it lacks specificity about what 'content' entails (e.g., text, HTML, metadata), which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or limitations. It mentions optional CSS selectors but doesn't explain when they are beneficial or necessary. With no sibling tools the need for comparative guidance is reduced, but the complete absence of any usage context still results in a low score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
