
get_single_web_page_content

Extract full text content from a specific web page URL for analysis or reference, with optional character limit control.

Instructions

Extract and return the full content from a single web page URL.

Use this when you have a specific URL and need the full text content for analysis or reference.

Args:

  • url: The URL of the web page to extract content from

  • max_content_length: Maximum characters for the extracted content (0 = no limit)

Returns: Formatted text containing the extracted page content with word count

Parameter Usage Guidelines

url (required)

  • Must be a valid HTTP or HTTPS URL

  • Include the full URL with protocol (http:// or https://)

  • Examples:

    • "https://example.com/article"

    • "https://docs.python.org/3/library/asyncio.html"

    • "https://github.com/user/repo/blob/main/README.md"

max_content_length (optional, default unlimited)

  • Limits the extracted content to specified character count

  • Common values: 10000 (summaries), 50000 (full pages), null (no limit)

Usage Examples

Basic content extraction:

{
  "url": "https://example.com/blog/ai-trends-2024"
}

Extract with content limit:

{
  "url": "https://docs.example.com/api-reference",
  "max_content_length": 20000
}

Extract documentation:

{
  "url": "https://github.com/project/docs/installation.md",
  "max_content_length": 10000
}

Extract complete article:

{
  "url": "https://techblog.com/comprehensive-guide"
}

Complete parameter example:

{
  "url": "https://docs.python.org/3/library/asyncio.html",
  "max_content_length": 50000
}
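
Calling the tool from your own code (a minimal sketch using the official Python MCP SDK; the server launch command and module name are assumptions, so adjust them to your installation):

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch command is hypothetical; point it at your mcp-mixsearch install.
    params = StdioServerParameters(command="python", args=["-m", "mcp_mixsearch"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "get_single_web_page_content",
                arguments={
                    "url": "https://docs.python.org/3/library/asyncio.html",
                    "max_content_length": 50000,
                },
            )
            # The tool returns a single text content block (see Implementation Reference).
            print(result.content[0].text[:500])

asyncio.run(main())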

When to Choose This Tool

  • Choose this when you have a specific URL from search results or references

  • Choose this for extracting content from documentation, articles, or blog posts

  • Choose this when you need to analyze or reference specific webpage content

  • Choose this for following up on URLs found in search results

  • Choose this when extracting content from GitHub README files or documentation

Error Handling

  • If the URL is inaccessible, an error message is returned

  • Some sites block automated access; try an alternative URL

  • Dynamic content may require multiple attempts

  • Large pages may time out; set a content length limit (see the retry sketch below)
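
The guidance above suggests retrying with a limit when a large or slow page fails. A minimal client-side wrapper might look like this sketch, which assumes an initialized MCP ClientSession (see the invocation example under Usage Examples):

async def extract_with_fallback(session, url: str) -> str:
    # First try unbounded extraction, then retry once with a tight limit,
    # per the timeout guidance above. Both attempts call the same tool.
    for args in ({"url": url}, {"url": url, "max_content_length": 10000}):
        try:
            result = await session.call_tool("get_single_web_page_content", arguments=args)
            return result.content[0].text
        except Exception:
            continue
    raise RuntimeError(f"Could not extract content from {url}")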

Alternative Tools

  • Use full_web_search when you need to find relevant pages first

  • Use get_web_search_summaries for discovering URLs to extract; a chained discovery-then-extraction sketch follows below
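
A common flow chains the two tools: discover URLs with get_web_search_summaries, then pull full content with this tool. The sketch below assumes get_web_search_summaries accepts a query argument and returns text containing URLs; check that tool's schema before relying on it:

import re

async def search_then_extract(session, query: str) -> str:
    # Step 1: discover candidate URLs (argument name and output format are assumptions).
    summaries = await session.call_tool("get_web_search_summaries", arguments={"query": query})
    urls = re.findall(r"https?://\S+", summaries.content[0].text)
    if not urls:
        raise ValueError(f"No URLs found for query: {query!r}")
    # Step 2: extract the first hit's full content with this tool.
    page = await session.call_tool(
        "get_single_web_page_content",
        arguments={"url": urls[0], "max_content_length": 20000},
    )
    return page.content[0].text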

Input Schema

  • url (required): URL to extract content from

  • max_content_length (optional, default: null): Maximum content length

Output Schema

No structured output schema is defined; the handler returns MCP text content blocks (see Implementation Reference).

Implementation Reference

  • MCP tool handler function that executes the tool logic: logs the input, calls WebSearchService.extract_single_page, formats the response with content and word count, and returns an MCP-formatted content block.
    @mcp.tool()
    @inject_docstring(lambda: load_instruction("instructions_single_page.md", __file__))
    async def get_single_web_page_content(url: str, max_content_length: Optional[int] = None) -> Dict[str, Any]:
        """Extract content from a single webpage"""
        try:
            logger.info(f"MCP tool get_single_web_page_content: url='{url}'")
    
            content = await web_search_service.extract_single_page(
                url=url,
                max_content_length=max_content_length
            )
    
            word_count = len(content.split())
    
            response_text = f"**Page Content from: {url}**\n\n{content}\n\n"
            response_text += f"**Word count:** {word_count}\n"
    
            logger.info(f"MCP tool get_single_web_page_content completed: {word_count} words")
    
            return {
                "content": [{"type": "text", "text": response_text}]
            }
    
        except Exception as e:
            logger.error(f"MCP tool get_single_web_page_content error: {e}")
            raise
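  • For reference, the dictionary returned by the handler above has the following shape (the word count is illustrative):
    {
        "content": [
            {
                "type": "text",
                "text": "**Page Content from: https://example.com/article**\n\n...extracted text...\n\n**Word count:** 1542\n"
            }
        ]
    }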
  • Function that registers all web search MCP tools, including get_single_web_page_content, by defining them inside with @mcp.tool() decorators.
    def register_tool(mcp: FastMCP, web_search_service: WebSearchService) -> None:
        """
        Register web search tools with the MCP server
    
        Args:
            mcp: FastMCP server instance
            web_search_service: WebSearchService instance
        """
  • Pydantic schema defining input parameters for single page content extraction (url, max_content_length), matching the tool signature.
    class SinglePageRequest(BaseModel):
        """Request model for single page content extraction"""
        url: str = Field(..., description="URL to extract content from")
        max_content_length: Optional[int] = Field(
            default=None, ge=0, description="Maximum content length"
        )
  • Core service method implementing single page content extraction by delegating to _extract_page_content.
    async def extract_single_page(self, url: str, max_content_length: Optional[int] = None) -> str:
        """Extract content from a single webpage"""
        logger.info(f"Extracting content from: {url}")
    
        return await self._extract_page_content(url, max_content_length)
  • Main content extraction logic: attempts fast HTTP extraction with BeautifulSoup first, then falls back to Playwright browser rendering for dynamic content.
    async def _extract_page_content(self, url: str, max_content_length: Optional[int]) -> str:
        """Extract readable content from a webpage"""
        try:
            # Try fast HTTP extraction first
            content = await self._extract_with_httpx(url, max_content_length)
            if self._is_meaningful_content(content):
                return content
        except Exception as e:
            logger.debug(f"HTTP extraction failed for {url}: {e}")
    
        # Fallback to browser extraction
        return await self._extract_with_browser(url, max_content_length)
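  • The _extract_with_httpx helper referenced above is not shown here; a minimal sketch of that fast path using httpx and BeautifulSoup (timeout and parser choice are assumptions):
    from typing import Optional

    import httpx
    from bs4 import BeautifulSoup

    async def extract_with_httpx_sketch(url: str, max_content_length: Optional[int]) -> str:
        # Plain HTTP fetch; pages that need JavaScript rendering fall through
        # to the Playwright path in _extract_page_content above.
        async with httpx.AsyncClient(follow_redirects=True, timeout=15.0) as client:
            resp = await client.get(url)
            resp.raise_for_status()

        soup = BeautifulSoup(resp.text, "html.parser")
        for tag in soup(["script", "style", "noscript"]):
            tag.decompose()  # drop non-content elements before extracting text

        text = " ".join(soup.get_text(separator=" ").split())
        if max_content_length:  # 0 or None means no limit, per the tool docstring
            text = text[:max_content_length]
        return text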
