
get_single_web_page_content

Extract full text content from a specific web page URL for analysis or reference, with optional character limit control.

Instructions

Extract and return the full content from a single web page URL.

Use this when you have a specific URL and need the full text content for analysis or reference.

Args:
  url: The URL of the web page to extract content from.
  max_content_length: Maximum characters for the extracted content (0 = no limit).

Returns:
  Formatted text containing the extracted page content, followed by a word count.

Parameter Usage Guidelines

url (required)

  • Must be a valid HTTP or HTTPS URL

  • Include the full URL with protocol (http:// or https://)

  • Examples:

    • "https://example.com/article"

    • "https://docs.python.org/3/library/asyncio.html"

    • "https://github.com/user/repo/blob/main/README.md"

max_content_length (optional, default unlimited)

  • Limits the extracted content to the specified character count

  • Common values: 10000 (summaries), 50000 (full pages), null (no limit)

Usage Examples

Basic content extraction:

{ "url": "https://example.com/blog/ai-trends-2024" }

Extract with content limit:

{ "url": "https://docs.example.com/api-reference", "max_content_length": 20000 }

Extract documentation:

{ "url": "https://github.com/project/docs/installation.md", "max_content_length": 10000 }

Extract complete article:

{ "url": "https://techblog.com/comprehensive-guide" }

Complete parameter example:

{ "url": "https://docs.python.org/3/library/asyncio.html", "max_content_length": 50000 }

When to Choose This Tool

  • Choose this when you have a specific URL from search results or references

  • Choose this for extracting content from documentation, articles, or blog posts

  • Choose this when you need to analyze or reference specific webpage content

  • Choose this for following up on URLs found in search results

  • Choose this when extracting content from GitHub README files or documentation

Error Handling

  • If the URL is inaccessible, an error message is returned

  • Some sites may block automated access; try alternative URLs

  • Dynamic content may require multiple attempts (see the retry sketch after this list)

  • Large pages may time out; use content length limits
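
For flaky or slow pages, a small client-side retry helper can cover the transient cases above. This is a minimal sketch, not part of the server API; fetch stands in for whatever coroutine performs the tool call:

    import asyncio
    from typing import Optional

    async def extract_with_retries(fetch, attempts: int = 3, delay: float = 2.0) -> str:
        """Retry a flaky extraction coroutine with linear backoff."""
        last_error: Optional[Exception] = None
        for attempt in range(attempts):
            try:
                return await fetch()
            except Exception as e:  # e.g. a timeout or a blocked request
                last_error = e
                await asyncio.sleep(delay * (attempt + 1))
        raise last_error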

Alternative Tools

  • Use full_web_search when you need to find relevant pages first

  • Use get_web_search_summaries for discovering URLs to extract (see the workflow sketch below)
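
A common pattern chains the two: discover candidate URLs with a search tool, then extract one page with this tool. A hedged sketch, assuming an initialized MCP ClientSession and assuming full_web_search takes a query parameter and returns its results as one text block:

    import re
    from mcp import ClientSession

    async def search_then_extract(session: ClientSession, query: str) -> str:
        """Find candidate pages via search, then pull the first result's content."""
        search = await session.call_tool("full_web_search", {"query": query})
        text = search.content[0].text  # assumption: results arrive as one text block
        urls = re.findall(r"https?://\S+", text)
        if not urls:
            return ""
        page = await session.call_tool(
            "get_single_web_page_content",
            {"url": urls[0], "max_content_length": 20000},
        )
        return page.content[0].text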

Input Schema

Name                 Required   Description                            Default
url                  Yes        URL to extract content from            (none)
max_content_length   No         Maximum content length in characters   null

Implementation Reference

  • MCP tool handler function that executes the tool logic: logs input, calls WebSearchService.extract_single_page, formats response with content and word count, returns MCP-formatted content block.
    @mcp.tool()
    @inject_docstring(lambda: load_instruction("instructions_single_page.md", __file__))
    async def get_single_web_page_content(url: str, max_content_length: Optional[int] = None) -> Dict[str, Any]:
        """Extract content from a single webpage"""
        try:
            logger.info(f"MCP tool get_single_web_page_content: url='{url}'")
            content = await web_search_service.extract_single_page(
                url=url, max_content_length=max_content_length
            )
            word_count = len(content.split())
            response_text = f"**Page Content from: {url}**\n\n{content}\n\n"
            response_text += f"**Word count:** {word_count}\n"
            logger.info(f"MCP tool get_single_web_page_content completed: {word_count} words")
            return {
                "content": [{"type": "text", "text": response_text}]
            }
        except Exception as e:
            logger.error(f"MCP tool get_single_web_page_content error: {e}")
            raise
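
    The inject_docstring and load_instruction helpers used above are not reproduced on this page; a plausible minimal sketch of them (an assumption, not the repository's actual code):

    from pathlib import Path

    def load_instruction(filename: str, module_file: str) -> str:
        """Read an instruction file that sits next to the given module."""
        return (Path(module_file).parent / filename).read_text(encoding="utf-8")

    def inject_docstring(loader):
        """Decorator factory: replace a function's docstring with loaded text."""
        def decorator(func):
            func.__doc__ = loader()
            return func
        return decorator
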
  • Function to register all web search MCP tools, including get_single_web_page_content, by defining them with @mcp.tool() decorators inside.
    def register_tool(mcp: FastMCP, web_search_service: WebSearchService) -> None:
        """
        Register web search tools with the MCP server

        Args:
            mcp: FastMCP server instance
            web_search_service: WebSearchService instance
        """
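
    How this registration function gets wired into a running server is not shown here; a hedged bootstrap sketch, assuming a no-argument WebSearchService constructor:

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("mcp-mixsearch")
    web_search_service = WebSearchService()  # constructor arguments are an assumption
    register_tool(mcp, web_search_service)

    if __name__ == "__main__":
        mcp.run()
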
  • Pydantic schema defining input parameters for single page content extraction (url, max_content_length), matching the tool signature.
    class SinglePageRequest(BaseModel):
        """Request model for single page content extraction"""
        url: str = Field(..., description="URL to extract content from")
        max_content_length: Optional[int] = Field(default=None, ge=0, description="Maximum content length")
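
    A quick usage sketch of the schema's validation behavior, assuming the class above is importable (the ge=0 constraint rejects negative limits):

    from pydantic import ValidationError

    req = SinglePageRequest(url="https://example.com/article", max_content_length=10000)

    try:
        SinglePageRequest(url="https://example.com", max_content_length=-1)
    except ValidationError as e:
        print(e)  # rejected: max_content_length must be >= 0
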
  • Core service method implementing single page content extraction by delegating to _extract_page_content.
    async def extract_single_page(self, url: str, max_content_length: Optional[int] = None) -> str:
        """Extract content from a single webpage"""
        logger.info(f"Extracting content from: {url}")
        return await self._extract_page_content(url, max_content_length)
  • Main content extraction logic: attempts fast HTTP extraction with BeautifulSoup, falls back to Playwright browser rendering for dynamic content.
    async def _extract_page_content(self, url: str, max_content_length: Optional[int]) -> str:
        """Extract readable content from a webpage"""
        try:
            # Try fast HTTP extraction first
            content = await self._extract_with_httpx(url, max_content_length)
            if self._is_meaningful_content(content):
                return content
        except Exception as e:
            logger.debug(f"HTTP extraction failed for {url}: {e}")

        # Fallback to browser extraction
        return await self._extract_with_browser(url, max_content_length)
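
    The _extract_with_httpx fast path is not reproduced on this page. A minimal standalone sketch of the httpx-plus-BeautifulSoup approach the description refers to (the function name, timeout, and tag filtering are assumptions):

    from typing import Optional

    import httpx
    from bs4 import BeautifulSoup

    async def extract_with_httpx(url: str, max_content_length: Optional[int]) -> str:
        """Fetch a page over plain HTTP and reduce it to readable text."""
        async with httpx.AsyncClient(follow_redirects=True, timeout=15.0) as client:
            response = await client.get(url)
            response.raise_for_status()

        soup = BeautifulSoup(response.text, "html.parser")
        # Drop elements that rarely carry readable content
        for tag in soup(["script", "style", "nav", "footer"]):
            tag.decompose()

        text = " ".join(soup.get_text(separator=" ").split())
        if max_content_length:
            text = text[:max_content_length]
        return text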
