get_single_web_page_content
Extract full text content from a specific web page URL for analysis or reference, with optional character limit control.
Instructions
Extract and return the full content from a single web page URL.
Use this when you have a specific URL and need the full text content for analysis or reference.
Args:
url: The URL of the web page to extract content from
max_content_length: Maximum characters for the extracted content (0 = no limit)
Returns: Formatted text containing the extracted page content with word count
Parameter Usage Guidelines
url (required)
Must be a valid HTTP or HTTPS URL
Include the full URL with protocol (http:// or https://)
Examples:
"https://example.com/article"
"https://docs.python.org/3/library/asyncio.html"
"https://github.com/user/repo/blob/main/README.md"
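The protocol requirement above can be checked before calling the tool. A minimal sketch with the standard library (the helper name is hypothetical, not part of the tool):

```python
from urllib.parse import urlparse

def is_valid_tool_url(url: str) -> bool:
    """Return True if the URL includes an http/https scheme and a host,
    matching the tool's requirement for a full URL with protocol."""
    parsed = urlparse(url)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

# A bare domain without a protocol fails the check:
# is_valid_tool_url("example.com/article") -> False
```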
max_content_length (optional, default unlimited)
Limits the extracted content to specified character count
Common values:
10000 (summaries), 50000 (full pages), null (no limit)
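Per the argument docs, both an unset value and 0 mean "no limit". The truncation could be applied with logic like this sketch (a hypothetical helper; in the actual service the limit is applied inside content extraction):

```python
from typing import Optional

def apply_content_limit(content: str, max_content_length: Optional[int]) -> str:
    """Truncate extracted content to max_content_length characters.
    Per the tool docs, 0 (or an unset/None value) means no limit."""
    if not max_content_length:  # covers both None and 0
        return content
    return content[:max_content_length]
```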
Usage Examples
Basic content extraction:
Extract with content limit:
Extract documentation:
Extract complete article:
Complete parameter example:
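The labelled examples above might look like the following argument payloads (illustrative URLs and values only; the exact call syntax depends on your MCP client):

```python
import json

# Basic content extraction: just a URL, no limit
basic = {"url": "https://example.com/article"}

# Extract with content limit: cap the result at ~10k characters
limited = {"url": "https://example.com/article", "max_content_length": 10000}

# Extract documentation
docs = {"url": "https://docs.python.org/3/library/asyncio.html"}

# Extract complete article: 0 means no limit
full_article = {"url": "https://example.com/article", "max_content_length": 0}

# Complete parameter example
complete = {
    "url": "https://github.com/user/repo/blob/main/README.md",
    "max_content_length": 50000,
}

print(json.dumps(complete))
```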
When to Choose This Tool
Choose this when you have a specific URL from search results or references
Choose this for extracting content from documentation, articles, or blog posts
Choose this when you need to analyze or reference specific webpage content
Choose this for following up on URLs found in search results
Choose this when extracting content from GitHub README files or documentation
Error Handling
If URL is inaccessible, an error message will be provided
Some sites may block automated access - try alternative URLs
Dynamic content may require multiple attempts
Large pages may timeout - use content length limits
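Taken together, those failure modes suggest a caller-side pattern: attempt an unlimited extraction first, and on timeout retry with a content length limit. A sketch under that assumption (the wrapper and the stub fetcher are hypothetical, not part of the tool):

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def extract_with_fallback_limit(
    fetch: Callable[[str, Optional[int]], Awaitable[str]],
    url: str,
    fallback_limit: int = 10000,
) -> str:
    """Try an unlimited extraction first; if the page is large enough
    to time out, retry once with a character limit as the docs advise."""
    try:
        return await fetch(url, None)
    except asyncio.TimeoutError:
        return await fetch(url, fallback_limit)
```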
Alternative Tools
Use full_web_search when you need to find relevant pages first
Use get_web_search_summaries for discovering URLs to extract
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | URL to extract content from | |
| max_content_length | No | Maximum content length in characters (0 = no limit) | unlimited |
Implementation Reference
- features/web_search/tool.py:156-181 (handler)

  MCP tool handler function that executes the tool logic: logs input, calls WebSearchService.extract_single_page, formats the response with content and word count, and returns an MCP-formatted content block.

  ```python
  @mcp.tool()
  @inject_docstring(lambda: load_instruction("instructions_single_page.md", __file__))
  async def get_single_web_page_content(url: str, max_content_length: int = None) -> Dict[str, Any]:
      """Extract content from a single webpage"""
      try:
          logger.info(f"MCP tool get_single_web_page_content: url='{url}'")
          content = await web_search_service.extract_single_page(
              url=url, max_content_length=max_content_length
          )
          word_count = len(content.split())
          response_text = f"**Page Content from: {url}**\n\n{content}\n\n"
          response_text += f"**Word count:** {word_count}\n"
          logger.info(f"MCP tool get_single_web_page_content completed: {word_count} words")
          return {
              "content": [{"type": "text", "text": response_text}]
          }
      except Exception as e:
          logger.error(f"MCP tool get_single_web_page_content error: {e}")
          raise
  ```
- features/web_search/tool.py:17-24 (registration)

  Function that registers all web search MCP tools, including get_single_web_page_content, by defining them with @mcp.tool() decorators inside.

  ```python
  def register_tool(mcp: FastMCP, web_search_service: WebSearchService) -> None:
      """
      Register web search tools with the MCP server

      Args:
          mcp: FastMCP server instance
          web_search_service: WebSearchService instance
      """
  ```
- features/web_search/models.py:66-71 (schema)

  Pydantic schema defining the input parameters for single page content extraction (url, max_content_length), matching the tool signature.

  ```python
  class SinglePageRequest(BaseModel):
      """Request model for single page content extraction"""

      url: str = Field(..., description="URL to extract content from")
      max_content_length: Optional[int] = Field(
          default=None, ge=0, description="Maximum content length"
      )
  ```
- Core service method implementing single page content extraction by delegating to _extract_page_content.

  ```python
  async def extract_single_page(self, url: str, max_content_length: Optional[int] = None) -> str:
      """Extract content from a single webpage"""
      logger.info(f"Extracting content from: {url}")
      return await self._extract_page_content(url, max_content_length)
  ```
- Main content extraction logic: attempts fast HTTP extraction with BeautifulSoup, then falls back to Playwright browser rendering for dynamic content.

  ```python
  async def _extract_page_content(self, url: str, max_content_length: Optional[int]) -> str:
      """Extract readable content from a webpage"""
      try:
          # Try fast HTTP extraction first
          content = await self._extract_with_httpx(url, max_content_length)
          if self._is_meaningful_content(content):
              return content
      except Exception as e:
          logger.debug(f"HTTP extraction failed for {url}: {e}")

      # Fallback to browser extraction
      return await self._extract_with_browser(url, max_content_length)
  ```