URL Text Fetcher MCP Server

by billallison

fetch_url_text

Extracts all visible text from a specified URL. Useful for retrieving web content for analysis or further processing.

Instructions

Download all visible text from a URL.

Args:
    url: The URL to fetch text from

Input Schema

Name | Required | Description                | Default
---- | -------- | -------------------------- | -------
url  | Yes      | The URL to fetch text from | —
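
The JSON Schema view of the same input is roughly the following (reconstructed from the tool's url parameter description; illustrative rather than verbatim server output):

    {
      "type": "object",
      "properties": {
        "url": {
          "type": "string",
          "description": "The URL to fetch text from"
        }
      },
      "required": ["url"]
    }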

Implementation Reference

  • Primary MCP tool handler for fetch_url_text: sanitizes input URL and calls core fetch_url_content helper to extract visible text.
    @mcp.tool()
    async def fetch_url_text(url: str) -> str:
        """Download all visible text from a URL.

        Args:
            url: The URL to fetch text from
        """
        # Sanitize URL input
        url = sanitize_url(url)
        if not url:
            return "Error: Invalid URL format"

        logger.info(f"Fetching URL text: {url}")
        content = fetch_url_content(url)
        return f"Text content from {url}:\n\n{content}"
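  • Not shown on this page: the sanitize_url helper both handlers call before fetching. A minimal sketch of the kind of normalization it likely performs (hypothetical; the real implementation may differ):
    from urllib.parse import urlparse

    def sanitize_url(url: str) -> str:
        """Hypothetical sketch: trim, default the scheme, and reject malformed URLs."""
        url = url.strip()
        if not urlparse(url).scheme:
            url = "https://" + url  # assume https when no scheme is given
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return ""  # falsy result maps to "Error: Invalid URL format" in the handler
        return url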
  • Core helper function implementing the URL fetching logic: safety checks (SSRF protection), HTTP request with size limits, BeautifulSoup parsing to extract visible text content.
    def fetch_url_content(url: str) -> str:
        """Helper function to fetch text content from a URL with safety checks."""
        # Validate URL safety first
        if not is_safe_url(url):
            logger.warning(f"SECURITY: Blocked unsafe URL: {url}")
            return "Error: URL not allowed for security reasons"

        try:
            # Log request for monitoring
            logger.info(f"REQUEST: Fetching content from {url}")

            # Make request with streaming to check size
            resp = requests.get(url, headers=HEADERS, timeout=REQUEST_TIMEOUT, stream=True)
            resp.raise_for_status()

            # Log response details
            logger.info(f"RESPONSE: {resp.status_code} from {url}, Content-Type: {resp.headers.get('Content-Type', 'unknown')}")

            # Check content length header
            content_length = resp.headers.get('Content-Length')
            if content_length and int(content_length) > MAX_RESPONSE_SIZE:
                logger.warning(f"SECURITY: Content too large: {content_length} bytes for {url}")
                return f"Error: Content too large ({content_length} bytes, max {MAX_RESPONSE_SIZE})"

            # Read content with size limit
            content_chunks = []
            total_size = 0
            try:
                for chunk in resp.iter_content(chunk_size=8192, decode_unicode=True):
                    if chunk:  # filter out keep-alive new chunks
                        total_size += len(chunk)
                        if total_size > MAX_RESPONSE_SIZE:
                            logger.warning(f"SECURITY: Content exceeded size limit for {url}")
                            return f"Error: Content exceeded size limit ({MAX_RESPONSE_SIZE} bytes)"
                        content_chunks.append(chunk)
            except UnicodeDecodeError:
                # If we can't decode as text, it's probably binary content
                logger.warning(f"CONTENT: Unable to decode content as text from {url}")
                return "Error: Unable to decode content as text"

            html_content = ''.join(content_chunks)

            # Parse with BeautifulSoup
            soup = BeautifulSoup(html_content, "html.parser")

            # Remove script and style elements
            for script in soup(["script", "style"]):
                script.decompose()

            text_content = soup.get_text(separator="\n", strip=True)

            # Limit final content length
            if len(text_content) > CONTENT_LENGTH_LIMIT:
                logger.info(f"CONTENT: Truncating content from {url} ({len(text_content)} -> {CONTENT_LENGTH_LIMIT} chars)")
                text_content = text_content[:CONTENT_LENGTH_LIMIT] + "... [Content truncated]"

            logger.info(f"SUCCESS: Fetched {len(text_content)} characters from {url}")
            return text_content

        except requests.RequestException as e:
            logger.error(f"REQUEST_ERROR: Failed to fetch {url}: {e}")
            return "Error: Unable to fetch URL content"
        except Exception as e:
            logger.error(f"UNEXPECTED_ERROR: Processing {url}: {e}", exc_info=True)
            return "Error: An unexpected error occurred while processing the URL"
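  • Also not reproduced here: the is_safe_url guard and the module-level constants (HEADERS, REQUEST_TIMEOUT, MAX_RESPONSE_SIZE, CONTENT_LENGTH_LIMIT) the helper relies on. A minimal sketch of what the SSRF check plausibly does; the constant values below are placeholders, not the server's real configuration. Note that resolving the hostname here and again inside requests.get leaves a DNS-rebinding window, so a hardened implementation would pin the resolved address for the actual connection:
    import ipaddress
    import socket
    from urllib.parse import urlparse

    # Placeholder values; the real ones are defined elsewhere in the server
    HEADERS = {"User-Agent": "url-text-fetcher/1.0"}
    REQUEST_TIMEOUT = 10               # seconds
    MAX_RESPONSE_SIZE = 5_000_000      # bytes
    CONTENT_LENGTH_LIMIT = 50_000      # characters

    def is_safe_url(url: str) -> bool:
        """Hypothetical SSRF guard: reject URLs resolving to non-public addresses."""
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.hostname:
            return False
        try:
            # Check every address the hostname resolves to
            for *_, sockaddr in socket.getaddrinfo(parsed.hostname, None):
                ip = ipaddress.ip_address(sockaddr[0])
                if not ip.is_global:
                    return False  # loopback, private, link-local, reserved, etc.
        except (socket.gaierror, ValueError):
            return False  # treat unresolvable hostnames as unsafe
        return True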
  • Alternative synchronous handler implementation in server_fastmcp.py using Pydantic Field for input schema validation.
    @mcp.tool()
    def fetch_url_text(url: str = Field(description="The URL to fetch text from")) -> str:
        """Download all visible text from a URL"""
        # Sanitize URL input
        url = sanitize_url(url)
        if not url:
            return "Error: Invalid URL format"

        logger.info(f"Fetching URL text: {url}")
        content = fetch_url_content(url)
        return f"Text content from {url}:\n\n{content}"
  • Pydantic schema definition for the tool input parameter using Field with description.
    def fetch_url_text(url: str = Field(description="The URL to fetch text from")) -> str:
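  • For context, a minimal self-contained sketch of how such a signature is registered in server_fastmcp.py, assuming the FastMCP class from the official MCP Python SDK (the server name is illustrative). FastMCP derives the tool's input schema from the annotated signature, so the Field description surfaces in the Input Schema shown above:
    from pydantic import Field
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("url-text-fetcher")  # illustrative server name

    @mcp.tool()
    def fetch_url_text(url: str = Field(description="The URL to fetch text from")) -> str:
        """Download all visible text from a URL"""
        ...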
  • Tool listing in get_server_info handler, indicating registration of fetch_url_text among available tools.
    @mcp.tool()
    async def get_server_info() -> str:
        """Get information about this MCP server including version, implementation, and capabilities.

        Returns:
            Server information including version, implementation type, and available features
        """
        info = [
            f"URL Text Fetcher MCP Server",
            f"Version: {__version__}",
            f"Implementation: {__implementation__}",
            f"Brave Search Rate Limit: {BRAVE_RATE_LIMIT_RPS} requests/second",
            f"Request Timeout: {REQUEST_TIMEOUT} seconds",
            f"Content Limit: {CONTENT_LENGTH_LIMIT:,} characters",
            f"Max Response Size: {MAX_RESPONSE_SIZE:,} bytes",
            "",
            "Available Tools:",
            "• fetch_url_text - Download visible text from any URL",
            "• fetch_page_links - Extract all links from a webpage",
            "• brave_search_and_fetch - Search web and fetch content from top results",
            "• test_brave_search - Test Brave Search API connectivity",
            "• get_server_info - Display this server information",
            "",
            "Security Features:",
            "• SSRF protection against internal network access",
            "• Input sanitization for URLs and search queries",
            "• Content size limiting and memory protection",
            "• Thread-safe rate limiting for API requests",
            "",
            f"Brave API Key: {'✓ Configured' if BRAVE_API_KEY else '✗ Missing'}"
        ]
        return "\n".join(info)
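  • End-to-end usage sketch: calling fetch_url_text from a Python client over stdio, assuming the official mcp SDK is installed and the server is launched with the command shown (both the command and the file name are assumptions):
    import asyncio

    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    async def main() -> None:
        # Assumed launch command; adjust to however the server is actually run
        params = StdioServerParameters(command="python", args=["server_fastmcp.py"])
        async with stdio_client(params) as (read, write):
            async with ClientSession(read, write) as session:
                await session.initialize()
                result = await session.call_tool(
                    "fetch_url_text", {"url": "https://example.com"}
                )
                print(result.content[0].text)

    asyncio.run(main())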

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/billallison/brsearch-mcp-server'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.