
webpage_scrape

Extracts and returns the content of a webpage by URL, optionally converting it to Markdown, enabling LLMs to access and process web-based information.

Instructions

Scrape webpage by url

Input Schema

Name             Required  Description                        Default
includeMarkdown  No        Include markdown in the response   false
url              Yes       The url to scrape

Input Schema (JSON Schema)

{ "properties": { "includeMarkdown": { "anyOf": [ { "type": "boolean" }, { "type": "null" } ], "default": false, "title": "Includemarkdown" }, "url": { "title": "Url", "type": "string" } }, "required": [ "url" ], "title": "WebpageRequest", "type": "object" }

Implementation Reference

  • Executes the webpage scrape by POSTing the request to the scrape.serper.dev API endpoint via the fetch_json utility (a standalone usage sketch follows this list).
    async def scape(request: WebpageRequest) -> Dict[str, Any]:
        url = "https://scrape.serper.dev"
        return await fetch_json(url, request)
  • Pydantic model defining the input schema for the webpage_scrape tool, including the required url and the optional includeMarkdown flag.
    class WebpageRequest(BaseModel):
        url: str = Field(..., description="The url to scrape")
        includeMarkdown: Optional[str] = Field(
            "false",
            pattern=r"^(true|false)$",
            description="Include markdown in the response (boolean value as string: 'true' or 'false')",
        )
  • Registers the webpage_scrape tool with the MCP server in the list_tools handler, providing its name, description, and input schema.
    tools.append(Tool(
        name=SerperTools.WEBPAGE_SCRAPE,
        description="Scrape webpage by url",
        inputSchema=WebpageRequest.model_json_schema(),
    ))
  • Dispatch logic in the call_tool MCP handler that validates arguments with WebpageRequest, calls the scape function, and returns the JSON result as TextContent (a client-side invocation sketch also follows this list).
    if name == SerperTools.WEBPAGE_SCRAPE.value:
        request = WebpageRequest(**arguments)
        result = await scape(request)
        return [TextContent(text=json.dumps(result, indent=2), type="text")]
  • Shared utility that performs the HTTP POST to Serper APIs with the API key header, an SSL context, and timeout handling.
    async def fetch_json(url: str, request: BaseModel) -> Dict[str, Any]:
        payload = request.model_dump(exclude_none=True)
        headers = {
            'X-API-KEY': SERPER_API_KEY,
            'Content-Type': 'application/json'
        }
        ssl_context = ssl.create_default_context(cafile=certifi.where())
        connector = aiohttp.TCPConnector(ssl=ssl_context)
        timeout = aiohttp.ClientTimeout(total=AIOHTTP_TIMEOUT)
        async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
            async with session.post(url, headers=headers, json=payload) as response:
                response.raise_for_status()
                return await response.json()
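
As a reference point, here is a minimal standalone sketch that wires the pieces above together outside the MCP server. It assumes SERPER_API_KEY is supplied via the environment, uses an illustrative 30-second value for AIOHTTP_TIMEOUT, and scrapes a placeholder URL:

    import asyncio
    import json
    import os
    import ssl
    from typing import Any, Dict, Optional

    import aiohttp
    import certifi
    from pydantic import BaseModel, Field

    SERPER_API_KEY = os.environ.get("SERPER_API_KEY", "")  # assumed to come from the environment
    AIOHTTP_TIMEOUT = 30  # seconds; illustrative value

    class WebpageRequest(BaseModel):
        url: str = Field(..., description="The url to scrape")
        includeMarkdown: Optional[str] = Field(
            "false",
            pattern=r"^(true|false)$",
            description="Include markdown in the response",
        )

    async def fetch_json(url: str, request: BaseModel) -> Dict[str, Any]:
        # POST the validated request to a Serper endpoint and return the parsed JSON.
        payload = request.model_dump(exclude_none=True)
        headers = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
        ssl_context = ssl.create_default_context(cafile=certifi.where())
        connector = aiohttp.TCPConnector(ssl=ssl_context)
        timeout = aiohttp.ClientTimeout(total=AIOHTTP_TIMEOUT)
        async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
            async with session.post(url, headers=headers, json=payload) as response:
                response.raise_for_status()
                return await response.json()

    async def scape(request: WebpageRequest) -> Dict[str, Any]:
        return await fetch_json("https://scrape.serper.dev", request)

    async def main() -> None:
        result = await scape(WebpageRequest(url="https://example.com", includeMarkdown="true"))
        print(json.dumps(result, indent=2))

    if __name__ == "__main__":
        asyncio.run(main())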
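
And a client-side sketch of calling the tool through the MCP Python SDK over stdio. The launch command serper-mcp-server, the tool name string, and the target URL are assumptions; adjust them to match your installation:

    import asyncio
    import os

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Launch the server as a stdio subprocess; forwarding the current
        # environment lets SERPER_API_KEY reach the server process.
        params = StdioServerParameters(command="serper-mcp-server", env=dict(os.environ))
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "webpage_scrape",
                    {"url": "https://example.com", "includeMarkdown": "true"},  # string form per the model above
                )
                # The handler returns the scraped payload as JSON-formatted text.
                for item in result.content:
                    print(item.text)

    if __name__ == "__main__":
        asyncio.run(main())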

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/garylab/serper-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.