
Fetch JSONPath MCP

by ackness

fetch-text

Extract text content from URLs using any HTTP method, converting HTML to Markdown by default for readable output.

Instructions

Fetch text content from a URL using various HTTP methods. Defaults to converting HTML to Markdown format.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to get text content from | |
| method | No | HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.). Default is GET. | GET |
| data | No | Request body data for POST/PUT/PATCH requests. Can be a JSON object or string. | |
| headers | No | Additional HTTP headers to include in the request | |
| output_format | No | Output format: 'markdown' (default), 'clean_text', or 'raw_html'. | markdown |
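A client normally invokes this tool through an MCP `tools/call` request. The sketch below builds that JSON-RPC 2.0 message in Python and applies the schema's type checks and defaults by hand. `build_fetch_text_call` and `ALLOWED_FORMATS` are hypothetical names used for illustration; a real client would usually let its MCP SDK build the request.

```python
from __future__ import annotations

import json
from typing import Any

# Enum values copied from the output_format row of the schema.
ALLOWED_FORMATS = ("markdown", "clean_text", "raw_html")


def build_fetch_text_call(request_id: int, url: str, **options: Any) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for fetch-text.

    Hypothetical helper: it checks arguments against the input schema by hand
    and fills in the documented defaults (method=GET, output_format=markdown).
    """
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")

    method = options.get("method", "GET")
    if not isinstance(method, str):
        raise ValueError("method must be a string")

    output_format = options.get("output_format", "markdown")
    if output_format not in ALLOWED_FORMATS:
        raise ValueError(f"output_format must be one of {ALLOWED_FORMATS}")

    headers = options.get("headers")
    if headers is not None and not (
        isinstance(headers, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in headers.items())
    ):
        raise ValueError("headers must be an object with string values")

    data = options.get("data")
    if data is not None and not isinstance(data, (dict, str)):
        raise ValueError("data must be an object, a string, or null")

    arguments: dict[str, Any] = {
        "url": url,
        "method": method,
        "output_format": output_format,
    }
    # data and headers are optional, so send them only when supplied.
    for key in ("data", "headers"):
        if options.get(key) is not None:
            arguments[key] = options[key]

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "fetch-text", "arguments": arguments},
    }


if __name__ == "__main__":
    request = build_fetch_text_call(
        1, "https://example.com/api", method="POST", data={"q": "mcp"}, output_format="clean_text"
    )
    print(json.dumps(request, indent=2))
```

Passing `data` as a dict corresponds to the schema's "object" type; the server's handler forwards it unchanged, and `fetch_url_content` sends it as a JSON body.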

Implementation Reference

  • Handler logic for the 'fetch-text' tool within the @server.call_tool() dispatcher. Validates the URL argument and invokes fetch_url_content with as_json=False for text processing.
    elif tool_name == "fetch-text":
        url = args.get("url")
        if not url or not isinstance(url, str):
            result = "Failed to call tool, error: Missing required property: url"
        else:
            method = args.get("method", "GET")
            data = args.get("data")
            headers = args.get("headers")
            output_format = args.get("output_format", "markdown")
            result = await fetch_url_content(url, as_json=False, method=method, data=data, headers=headers, output_format=output_format)
  • Tool registration in @server.list_tools(), defining the name, description, and input schema for 'fetch-text'.
    types.Tool(
        name="fetch-text",
        description="Fetch text content from a URL using various HTTP methods. Defaults to converting HTML to Markdown format.",
        inputSchema={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The URL to get text content from",
                },
                "method": {
                    "type": "string",
                    "description": "HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.). Default is GET.",
                    "default": "GET"
                },
                "data": {
                    "type": ["object", "string", "null"],
                    "description": "Request body data for POST/PUT/PATCH requests. Can be a JSON object or string.",
                },
                "headers": {
                    "type": "object",
                    "description": "Additional HTTP headers to include in the request",
                    "additionalProperties": {"type": "string"}
                },
                "output_format": {
                    "type": "string",
                    "description": "Output format: 'markdown' (default), 'clean_text', or 'raw_html'.",
                    "enum": ["markdown", "clean_text", "raw_html"],
                    "default": "markdown"
                }
            },
            "required": ["url"],
        },
    ),
  • Primary helper function implementing the URL-fetching logic for the 'fetch-text' tool (called with as_json=False). It validates the URL, sends the request with httpx using the chosen method, raises on HTTP error statuses and on responses larger than max_size, and passes the body to extract_text_content.
    async def fetch_url_content(
        url: str, 
        as_json: bool = True, 
        method: str = "GET", 
        data: dict | str | None = None,
        headers: dict[str, str] | None = None,
        output_format: str = "markdown"
    ) -> str:
        """
        Fetch content from a URL using different HTTP methods.
        
        Args:
            url: URL to fetch content from
            as_json: If True, validates content as JSON; if False, returns text content
            method: HTTP method (GET, POST, PUT, DELETE, etc.)
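            data: Request body for POST, PUT, PATCH and other methods; a dict is sent as JSON, a string as raw content (ignored for GET, DELETE, HEAD and OPTIONS)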
            headers: Additional headers to include in the request
            output_format: If as_json=False, output format - "markdown", "clean_text", or "raw_html"
            
        Returns:
            String content from the URL (JSON, Markdown, clean text, or raw HTML)
            
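        Raises:
            httpx.RequestError: For network-related errors
            httpx.HTTPStatusError: If the server responds with a 4xx or 5xx status
            json.JSONDecodeError: If as_json=True and content is not valid JSON
            ValueError: If the URL is invalid or unsafe, or the response exceeds max_size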
        """
        # Validate URL first
        validate_url(url)
        
        config = await get_http_client_config()
        max_size = config.pop("max_size", 10 * 1024 * 1024)  # Remove from client config
        
        # Merge additional headers with config headers (user headers override defaults)
        if headers:
            if config.get("headers"):
                config["headers"].update(headers)
            else:
                config["headers"] = headers
        
        async with httpx.AsyncClient(**config) as client:
            # Handle different HTTP methods
            method = method.upper()
            
            if method == "GET":
                response = await client.get(url)
            elif method == "POST":
                if isinstance(data, dict):
                    response = await client.post(url, json=data)
                else:
                    response = await client.post(url, content=data)
            elif method == "PUT":
                if isinstance(data, dict):
                    response = await client.put(url, json=data)
                else:
                    response = await client.put(url, content=data)
            elif method == "DELETE":
                response = await client.delete(url)
            elif method == "PATCH":
                if isinstance(data, dict):
                    response = await client.patch(url, json=data)
                else:
                    response = await client.patch(url, content=data)
            elif method == "HEAD":
                response = await client.head(url)
            elif method == "OPTIONS":
                response = await client.options(url)
            else:
                # For any other method, use the generic request method
                if isinstance(data, dict):
                    response = await client.request(method, url, json=data)
                else:
                    response = await client.request(method, url, content=data)
            
            response.raise_for_status()
            
            # Check response size
            content_length = len(response.content)
            if content_length > max_size:
                raise ValueError(f"Response size ({content_length} bytes) exceeds maximum allowed ({max_size} bytes)")
            
            if as_json:
                # For JSON responses, start from response.text (the body decoded with the response charset)
                content_to_parse = response.text
                if not content_to_parse:
                    # If response.text is empty, try decoding content directly
                    try:
                        content_to_parse = response.content.decode('utf-8')
                    except UnicodeDecodeError:
                        content_to_parse = ""
                
                if content_to_parse:
                    try:
                        json.loads(content_to_parse)
                        return content_to_parse
                    except json.JSONDecodeError:
                        # If text parsing fails, try content decoding as fallback
                        if content_to_parse == response.text:
                            try:
                                fallback_content = response.content.decode('utf-8')
                                json.loads(fallback_content)
                                return fallback_content
                            except (json.JSONDecodeError, UnicodeDecodeError):
                                pass
                        raise json.JSONDecodeError("Response is not valid JSON", content_to_parse, 0)
                else:
                    # Empty response
                    return ""
            else:
                # For text content, apply format conversion
                return extract_text_content(response.text, output_format)
  • Supporting utility for formatting fetched HTML into Markdown, clean text, or raw HTML; invoked by fetch_url_content for the 'fetch-text' tool. For an unknown format, or if processing fails, it returns the original HTML unchanged.
    def extract_text_content(html_content: str, output_format: str = "markdown") -> str:
        """
        Extract text content from HTML in different formats.
        
        Args:
            html_content: Raw HTML content
            output_format: Output format - "markdown" (default), "clean_text", or "raw_html"
        
        Returns:
            Extracted content in the specified format
        """
        if output_format == "raw_html":
            return html_content
        
        try:
            from markdownify import markdownify as md
            
            if output_format == "markdown":
                # Convert HTML to Markdown
                markdown_text = md(html_content, 
                                   heading_style="ATX",  # Use # for headings
                                   bullets="*",          # Use * for bullets
                                   strip=["script", "style", "noscript"])
                
                # Strip trailing whitespace from each line (blank lines are kept)
                markdown_text = '\n'.join(line.rstrip() for line in markdown_text.splitlines())
                
                return markdown_text.strip()
                
            elif output_format == "clean_text":
                # Parse HTML with BeautifulSoup
                soup = BeautifulSoup(html_content, 'html.parser')
                
                # Remove script and style elements
                for script in soup(["script", "style", "noscript"]):
                    script.decompose()
                
                # Get text content
                text = soup.get_text()
                
                # Break into lines and remove leading and trailing space on each
                lines = (line.strip() for line in text.splitlines())
                
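                # Split each line on double spaces so side-by-side phrases become separate chunks
                chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
                
                # Drop empty chunks and join the rest with single spaces (the result is one line)
                text = ' '.join(chunk for chunk in chunks if chunk)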
                
                return text
                
            else:
                # Unknown format, return raw HTML
                return html_content
                
        except Exception:
            # If processing fails, return original content
            return html_content
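The method dispatch and header handling in `fetch_url_content` can be condensed into two small pure functions, which makes the rules easy to check without a network. This is a sketch, not the project's code: `body_kwargs`, `merge_headers`, and `BODYLESS_METHODS` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from typing import Any

# In fetch_url_content, these methods are sent without a body even if `data` is set.
BODYLESS_METHODS = frozenset({"GET", "DELETE", "HEAD", "OPTIONS"})


def body_kwargs(method: str, data: dict | str | None) -> dict[str, Any]:
    """Return the httpx keyword argument that carries `data` for `method`.

    Mirrors the if/elif chain in fetch_url_content: a dict becomes `json=`
    (serialized as a JSON body), anything else becomes `content=` (sent as-is),
    and GET/DELETE/HEAD/OPTIONS drop `data` entirely.
    """
    if method.upper() in BODYLESS_METHODS:
        return {}
    if isinstance(data, dict):
        return {"json": data}
    return {"content": data}


def merge_headers(
    defaults: dict[str, str] | None, extra: dict[str, str] | None
) -> dict[str, str]:
    """Overlay user headers on the defaults; user values win on conflicts.

    Returns a new dict instead of updating the defaults in place. Keys are
    compared case-sensitively here, as with a plain dict.update().
    """
    merged = dict(defaults or {})
    merged.update(extra or {})
    return merged
```

With httpx, these helpers could stand in for the if/elif chain as `await client.request(method, url, **body_kwargs(method, data))` on a client created with `httpx.AsyncClient(headers=merge_headers(defaults, headers))`, since httpx's `request()` accepts the same `json=` and `content=` arguments as `post()` and `put()`.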


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ackness/fetch-jsonpath-mcp'
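For scripts, the same lookup can be made from Python with only the standard library. This is a minimal sketch: `build_request` and `fetch_server_info` are illustrative names, and `fetch_server_info` assumes the endpoint responds with JSON, which the `curl` example does not show.

```python
from __future__ import annotations

import json
import urllib.request

# Endpoint from the curl example above.
SERVER_URL = "https://glama.ai/api/mcp/v1/servers/ackness/fetch-jsonpath-mcp"


def build_request(url: str = SERVER_URL) -> urllib.request.Request:
    """Build a GET request for the same endpoint as the curl command."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(timeout: float = 10.0) -> object:
    """Send the request and parse the body (assumption: the response is JSON)."""
    with urllib.request.urlopen(build_request(), timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_server_info()` performs the network request and returns the decoded data; `build_request()` alone touches no network, which keeps it easy to inspect.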
