by geosp

get_single_web_page_content

Extract full text content from a specific web page URL for analysis or reference, with optional character limit control.

Instructions

Extract and return the full content from a single web page URL.

Use this when you have a specific URL and need the full text content for analysis or reference.

Args:

  • url: The URL of the web page to extract content from

  • max_content_length: Maximum characters for the extracted content (0 = no limit)

Returns: Formatted text containing the extracted page content with a word count

Parameter Usage Guidelines

url (required)

  • Must be a valid HTTP or HTTPS URL

  • Include the full URL with protocol (http:// or https://)

  • Examples:

    • "https://example.com/article"

    • "https://docs.python.org/3/library/asyncio.html"

    • "https://github.com/user/repo/blob/main/README.md"

max_content_length (optional, default unlimited)

  • Limits the extracted content to the specified character count

  • Common values: 10000 (summaries), 50000 (full pages), null (no limit)
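These guidelines can be checked client-side before the tool is called. The helper below is a hypothetical stdlib-only sketch (validate_request is not part of the tool); it simply mirrors the url and max_content_length rules above and returns the argument object used in the examples that follow:

```python
from urllib.parse import urlparse

def validate_request(url, max_content_length=None):
    """Mirror the tool's parameter rules: url must be a full http(s) URL,
    and max_content_length, when given, must be a non-negative integer."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"url must be a full http:// or https:// URL, got {url!r}")
    if max_content_length is not None and max_content_length < 0:
        raise ValueError("max_content_length must be >= 0 (0 means no limit)")
    return {"url": url, "max_content_length": max_content_length}
```

A rejected value raises ValueError before any network request is made.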

Usage Examples

Basic content extraction:

{
  "url": "https://example.com/blog/ai-trends-2024"
}

Extract with content limit:

{
  "url": "https://docs.example.com/api-reference",
  "max_content_length": 20000
}

Extract documentation:

{
  "url": "https://github.com/project/docs/installation.md",
  "max_content_length": 10000
}

Extract complete article:

{
  "url": "https://techblog.com/comprehensive-guide"
}

Complete parameter example:

{
  "url": "https://docs.python.org/3/library/asyncio.html",
  "max_content_length": 50000
}

When to Choose This Tool

  • Choose this when you have a specific URL from search results or references

  • Choose this for extracting content from documentation, articles, or blog posts

  • Choose this when you need to analyze or reference specific webpage content

  • Choose this for following up on URLs found in search results

  • Choose this when extracting content from GitHub README files or documentation

Error Handling

  • If a URL is inaccessible, an error message is returned

  • Some sites block automated access; try an alternative URL

  • Dynamic content may require multiple attempts

  • Large pages may time out; use a content length limit
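One way a caller can act on this guidance is to retry with progressively smaller content limits. The sketch below is hypothetical (extract_with_fallback and its default limits are assumptions, not part of the tool); any coroutine with the tool's (url, max_content_length) signature can be passed in:

```python
import asyncio

async def extract_with_fallback(extract, url, limits=(0, 50000, 10000)):
    """Retry extraction with progressively smaller content limits.

    `extract` is any coroutine taking (url, max_content_length). 0 means
    "no limit", so the first attempt asks for the full page and later
    attempts cap the size to reduce the chance of a timeout.
    """
    last_error = None
    for limit in limits:
        try:
            return await extract(url, max_content_length=limit)
        except Exception as exc:  # e.g. a timeout or blocked access
            last_error = exc
    raise last_error
```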

Alternative Tools

  • Use full_web_search when you need to find relevant pages first

  • Use get_web_search_summaries for discovering URLs to extract

Input Schema

  • url (required)

  • max_content_length (optional)

Output Schema

No arguments

Implementation Reference

  • MCP tool handler function that executes the tool logic: logs input, calls WebSearchService.extract_single_page, formats response with content and word count, returns MCP-formatted content block.
    @mcp.tool()
    @inject_docstring(lambda: load_instruction("instructions_single_page.md", __file__))
    async def get_single_web_page_content(url: str, max_content_length: Optional[int] = None) -> Dict[str, Any]:
        """Extract content from a single webpage"""
        try:
            logger.info(f"MCP tool get_single_web_page_content: url='{url}'")
    
            content = await web_search_service.extract_single_page(
                url=url,
                max_content_length=max_content_length
            )
    
            word_count = len(content.split())
    
            response_text = f"**Page Content from: {url}**\n\n{content}\n\n"
            response_text += f"**Word count:** {word_count}\n"
    
            logger.info(f"MCP tool get_single_web_page_content completed: {word_count} words")
    
            return {
                "content": [{"type": "text", "text": response_text}]
            }
    
        except Exception as e:
            logger.error(f"MCP tool get_single_web_page_content error: {e}")
            raise
  • Function to register all web search MCP tools, including get_single_web_page_content, by defining them with @mcp.tool() decorators inside.
    def register_tool(mcp: FastMCP, web_search_service: WebSearchService) -> None:
        """
        Register web search tools with the MCP server
    
        Args:
            mcp: FastMCP server instance
            web_search_service: WebSearchService instance
        """
  • Pydantic schema defining input parameters for single page content extraction (url, max_content_length), matching the tool signature.
    class SinglePageRequest(BaseModel):
        """Request model for single page content extraction"""
        url: str = Field(..., description="URL to extract content from")
        max_content_length: Optional[int] = Field(default=None, ge=0,
                                                description="Maximum content length")
  • Core service method implementing single page content extraction by delegating to _extract_page_content.
    async def extract_single_page(self, url: str, max_content_length: Optional[int] = None) -> str:
        """Extract content from a single webpage"""
        logger.info(f"Extracting content from: {url}")
    
        return await self._extract_page_content(url, max_content_length)
  • Main content extraction logic: attempts fast HTTP extraction with BeautifulSoup, falls back to Playwright browser rendering for dynamic content.
    async def _extract_page_content(self, url: str, max_content_length: Optional[int]) -> str:
        """Extract readable content from a webpage"""
        try:
            # Try fast HTTP extraction first
            content = await self._extract_with_httpx(url, max_content_length)
            if self._is_meaningful_content(content):
                return content
        except Exception as e:
            logger.debug(f"HTTP extraction failed for {url}: {e}")
    
        # Fallback to browser extraction
        return await self._extract_with_browser(url, max_content_length)
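The helpers this method calls (_extract_with_httpx, _is_meaningful_content, _extract_with_browser) are not shown. As a rough illustration of what the fast HTTP path does after fetching the page, here is a stdlib-only sketch that strips script/style markup and applies the optional character limit; it is an assumption for illustration only — the actual service reportedly uses httpx with BeautifulSoup:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript blocks."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())

def extract_text(html, max_content_length=None):
    """Strip markup and apply the optional character limit
    (None and 0 both mean "no limit")."""
    parser = _TextExtractor()
    parser.feed(html)
    text = "\n".join(parser.chunks)
    if max_content_length:
        text = text[:max_content_length]
    return text
```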
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes what the tool does (extracts content), what it returns (formatted text with word count), and includes important behavioral context like error handling (inaccessible URLs, blocked sites, timeouts) and practical constraints (dynamic content may require multiple attempts). The only minor gap is lack of explicit rate limit or authentication information.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (description, args, returns, parameter guidelines, examples, when to choose, error handling, alternatives). While comprehensive, some sections like the multiple similar examples could be slightly condensed. However, every section adds value and the core purpose is front-loaded effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (web content extraction with potential errors), no annotations, and an output schema present, the description provides complete context. It covers purpose, usage guidelines, parameter details, behavioral traits, error scenarios, and sibling tool differentiation. The presence of an output schema means the description doesn't need to detail return format, allowing it to focus on operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description provides comprehensive parameter documentation that fully compensates. It explains both parameters in detail: url requirements (valid HTTP/HTTPS, full URL with protocol, examples) and max_content_length behavior (optional, default unlimited, common values, practical use cases). The usage examples further clarify parameter semantics beyond what the bare schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verb ('extract and return') and resource ('full content from a single web page URL'). It distinguishes from sibling tools by focusing on content extraction from a specific URL rather than search functionality, with explicit differentiation in the 'Alternative Tools' section.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('when you have a specific URL and need the full text content for analysis or reference') and when not to use it (via the 'Alternative Tools' section that names specific sibling tools for different use cases). The 'When to Choose This Tool' section further elaborates with concrete scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/geosp/mcp-mixsearch'

If you have feedback or need assistance with the MCP directory API, please join our Discord server