
fetch-text

Extract text content from URLs using HTTP methods, converting HTML to Markdown format for readable output.

Instructions

Fetch text content from a URL using various HTTP methods. Defaults to converting HTML to Markdown format.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| url | Yes | The URL to get text content from | |
| method | No | HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.) | GET |
| data | No | Request body data for POST/PUT/PATCH requests. Can be a JSON object or string. | |
| headers | No | Additional HTTP headers to include in the request | |
| output_format | No | Output format: 'markdown' (default), 'clean_text', or 'raw_html'. | markdown |
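As a sketch of what a call to this tool might look like, the arguments below follow the schema in the table; the URL and header values are placeholders, not taken from the original source:

```python
# Hypothetical fetch-text arguments matching the input schema above; the URL
# and header values are illustrative placeholders.
args = {
    "url": "https://example.com/article",        # required
    "method": "GET",                             # optional, default GET
    "headers": {"User-Agent": "my-client/1.0"},  # optional
    "output_format": "markdown",                 # optional: markdown | clean_text | raw_html
}

# Mirror the schema's constraints: "url" is required, output_format is an enum.
assert isinstance(args["url"], str) and args["url"]
assert args["output_format"] in {"markdown", "clean_text", "raw_html"}
```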

Implementation Reference

  • Handler logic for the 'fetch-text' tool within the @server.call_tool() dispatcher. Validates the url argument and invokes fetch_url_content with as_json=False for text processing.

    ```python
    elif tool_name == "fetch-text":
        url = args.get("url")
        if not url or not isinstance(url, str):
            result = "Failed to call tool, error: Missing required property: url"
        else:
            method = args.get("method", "GET")
            data = args.get("data")
            headers = args.get("headers")
            output_format = args.get("output_format", "markdown")
            result = await fetch_url_content(
                url,
                as_json=False,
                method=method,
                data=data,
                headers=headers,
                output_format=output_format,
            )
    ```
  • Tool registration in @server.list_tools(), defining the name, description, and input schema for 'fetch-text'.

    ```python
    types.Tool(
        name="fetch-text",
        description="Fetch text content from a URL using various HTTP methods. "
                    "Defaults to converting HTML to Markdown format.",
        inputSchema={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The URL to get text content from",
                },
                "method": {
                    "type": "string",
                    "description": "HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.). Default is GET.",
                    "default": "GET",
                },
                "data": {
                    "type": ["object", "string", "null"],
                    "description": "Request body data for POST/PUT/PATCH requests. Can be a JSON object or string.",
                },
                "headers": {
                    "type": "object",
                    "description": "Additional HTTP headers to include in the request",
                    "additionalProperties": {"type": "string"},
                },
                "output_format": {
                    "type": "string",
                    "description": "Output format: 'markdown' (default), 'clean_text', or 'raw_html'.",
                    "enum": ["markdown", "clean_text", "raw_html"],
                    "default": "markdown",
                },
            },
            "required": ["url"],
        },
    ),
    ```
  • Primary helper function implementing the URL-fetching logic for the 'fetch-text' tool (called with as_json=False). Performs HTTP requests using httpx, validates responses, handles the various methods, and processes text content via extract_text_content.

    ```python
    async def fetch_url_content(
        url: str,
        as_json: bool = True,
        method: str = "GET",
        data: dict | str | None = None,
        headers: dict[str, str] | None = None,
        output_format: str = "markdown",
    ) -> str:
        """
        Fetch content from a URL using different HTTP methods.

        Args:
            url: URL to fetch content from
            as_json: If True, validates content as JSON; if False, returns text content
            method: HTTP method (GET, POST, PUT, DELETE, etc.)
            data: Request body data (for POST/PUT requests)
            headers: Additional headers to include in the request
            output_format: If as_json=False, output format - "markdown", "clean_text", or "raw_html"

        Returns:
            String content from the URL (JSON, Markdown, clean text, or raw HTML)

        Raises:
            httpx.RequestError: For network-related errors
            json.JSONDecodeError: If as_json=True and content is not valid JSON
            ValueError: If URL is invalid or unsafe
        """
        # Validate URL first
        validate_url(url)

        config = await get_http_client_config()
        max_size = config.pop("max_size", 10 * 1024 * 1024)  # Remove from client config

        # Merge additional headers with config headers (user headers override defaults)
        if headers:
            if config.get("headers"):
                config["headers"].update(headers)
            else:
                config["headers"] = headers

        async with httpx.AsyncClient(**config) as client:
            # Handle different HTTP methods
            method = method.upper()
            if method == "GET":
                response = await client.get(url)
            elif method == "POST":
                if isinstance(data, dict):
                    response = await client.post(url, json=data)
                else:
                    response = await client.post(url, content=data)
            elif method == "PUT":
                if isinstance(data, dict):
                    response = await client.put(url, json=data)
                else:
                    response = await client.put(url, content=data)
            elif method == "DELETE":
                response = await client.delete(url)
            elif method == "PATCH":
                if isinstance(data, dict):
                    response = await client.patch(url, json=data)
                else:
                    response = await client.patch(url, content=data)
            elif method == "HEAD":
                response = await client.head(url)
            elif method == "OPTIONS":
                response = await client.options(url)
            else:
                # For any other method, use the generic request method
                if isinstance(data, dict):
                    response = await client.request(method, url, json=data)
                else:
                    response = await client.request(method, url, content=data)

            response.raise_for_status()

            # Check response size
            content_length = len(response.content)
            if content_length > max_size:
                raise ValueError(
                    f"Response size ({content_length} bytes) exceeds maximum allowed ({max_size} bytes)"
                )

            if as_json:
                # For JSON responses, use response.text directly (no compression expected)
                content_to_parse = response.text
                if not content_to_parse:
                    # If response.text is empty, try decoding content directly
                    try:
                        content_to_parse = response.content.decode('utf-8')
                    except UnicodeDecodeError:
                        content_to_parse = ""

                if content_to_parse:
                    try:
                        json.loads(content_to_parse)
                        return content_to_parse
                    except json.JSONDecodeError:
                        # If text parsing fails, try content decoding as fallback
                        if content_to_parse == response.text:
                            try:
                                fallback_content = response.content.decode('utf-8')
                                json.loads(fallback_content)
                                return fallback_content
                            except (json.JSONDecodeError, UnicodeDecodeError):
                                pass
                        raise json.JSONDecodeError("Response is not valid JSON", content_to_parse, 0)
                else:
                    # Empty response
                    return ""
            else:
                # For text content, apply format conversion
                return extract_text_content(response.text, output_format)
    ```
  • Supporting utility for formatting fetched HTML content into Markdown, clean text, or raw HTML. Invoked by fetch_url_content for the 'fetch-text' tool.

    ```python
    def extract_text_content(html_content: str, output_format: str = "markdown") -> str:
        """
        Extract text content from HTML in different formats.

        Args:
            html_content: Raw HTML content
            output_format: Output format - "markdown" (default), "clean_text", or "raw_html"

        Returns:
            Extracted content in the specified format
        """
        if output_format == "raw_html":
            return html_content

        try:
            from markdownify import markdownify as md

            if output_format == "markdown":
                # Convert HTML to Markdown
                markdown_text = md(
                    html_content,
                    heading_style="ATX",  # Use # for headings
                    bullets="*",          # Use * for bullets
                    strip=["script", "style", "noscript"],
                )
                # Clean up extra whitespace
                lines = (line.rstrip() for line in markdown_text.splitlines())
                markdown_text = '\n'.join(line for line in lines if line.strip() or not line)
                return markdown_text.strip()
            elif output_format == "clean_text":
                # Parse HTML with BeautifulSoup
                soup = BeautifulSoup(html_content, 'html.parser')
                # Remove script and style elements
                for script in soup(["script", "style", "noscript"]):
                    script.decompose()
                # Get text content
                text = soup.get_text()
                # Break into lines and remove leading and trailing space on each
                lines = (line.strip() for line in text.splitlines())
                # Break multi-headlines into a line each
                chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
                # Drop blank lines
                text = ' '.join(chunk for chunk in chunks if chunk)
                return text
            else:
                # Unknown format, return raw HTML
                return html_content
        except Exception:
            # If processing fails, return original content
            return html_content
    ```
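The as_json branch of fetch_url_content validates the decoded text first and only then falls back to decoding the raw bytes. Isolated from httpx, that validate-then-fall-back step can be sketched with the standard library alone (a simplified approximation; the real function operates on an httpx response object):

```python
import json

def validate_json_text(text: str, raw: bytes) -> str:
    """Simplified sketch of the as_json path: prefer the already-decoded text,
    fall back to a UTF-8 decode of the raw bytes, and fail if neither parses."""
    content = text
    if not content:
        try:
            content = raw.decode("utf-8")
        except UnicodeDecodeError:
            return ""  # empty or undecodable response
    if not content:
        return ""
    try:
        json.loads(content)
        return content
    except json.JSONDecodeError:
        # Fallback: try parsing a fresh decode of the raw bytes
        try:
            fallback = raw.decode("utf-8")
            json.loads(fallback)
            return fallback
        except (json.JSONDecodeError, UnicodeDecodeError):
            raise json.JSONDecodeError("Response is not valid JSON", content, 0)
```

For example, `validate_json_text('{"a": 1}', b'{"a": 1}')` returns the text unchanged, while an empty `text` with valid JSON bytes returns the decoded bytes.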

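The 'clean_text' path above depends on BeautifulSoup. Where that dependency is unavailable, a rough stdlib-only approximation of the same idea (skip script/style/noscript content, collect and join the remaining text) can be built on html.parser; this is a sketch, not the tool's actual implementation:

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Stdlib-only approximation of the 'clean_text' path: skip content
    inside script/style/noscript tags and collect the remaining text."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text outside skipped elements
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

def clean_text(html_content: str) -> str:
    parser = _TextExtractor()
    parser.feed(html_content)
    return " ".join(parser._chunks)
```

Usage: `clean_text("<p>Hello <b>world</b></p><script>var x = 1;</script>")` yields `"Hello world"`, with the script body dropped.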
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ackness/fetch-jsonpath-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.