Fetch JSONPath MCP

by ackness

fetch-text

Extract text content from URLs using HTTP methods, converting HTML to Markdown format for readable output.

Instructions

Fetch text content from a URL using various HTTP methods. Defaults to converting HTML to Markdown format.

Input Schema

| Name | Required | Description | Default |
|---|---|---|---|
| url | Yes | The URL to get text content from | |
| method | No | HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.). Default is GET. | GET |
| data | No | Request body data for POST/PUT/PATCH requests. Can be a JSON object or string. | |
| headers | No | Additional HTTP headers to include in the request | |
| output_format | No | Output format: 'markdown' (default), 'clean_text', or 'raw_html'. | markdown |
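
To make the parameter rules above concrete, here is a minimal sketch of how a caller might normalize arguments before invoking the tool. The helper name `prepare_fetch_text_call` is hypothetical and not part of the server. The `json` versus `content` keys mirror how the implementation shown further down sends dict and string bodies. The enum check is an assumption that is stricter than the handler, which passes `output_format` through unchecked.

```python
from typing import Any

ALLOWED_FORMATS = {"markdown", "clean_text", "raw_html"}


def prepare_fetch_text_call(args: dict[str, Any]) -> dict[str, Any]:
    """Apply the schema defaults and decide how the body would be sent.

    Hypothetical helper for illustration only; it is not part of the server.
    """
    url = args.get("url")
    if not url or not isinstance(url, str):
        raise ValueError("Missing required property: url")

    output_format = args.get("output_format", "markdown")
    if output_format not in ALLOWED_FORMATS:
        # Assumption: reject unknown formats up front instead of passing them on.
        raise ValueError(f"Unsupported output_format: {output_format!r}")

    prepared: dict[str, Any] = {
        "url": url,
        "method": str(args.get("method", "GET")).upper(),
        "headers": dict(args.get("headers") or {}),
        "output_format": output_format,
    }

    data = args.get("data")
    if isinstance(data, dict):
        prepared["json"] = data      # JSON object: sent as a JSON body
    elif data is not None:
        prepared["content"] = data   # string: sent as raw body content
    return prepared
```

Calling it with only `{"url": "https://example.com"}` fills in `GET` and `markdown`, which matches the defaults listed in the table.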

Implementation Reference

  • Handler logic for the 'fetch-text' tool within the @server.call_tool() dispatcher. Validates the URL argument and invokes fetch_url_content with as_json=False for text processing.
    elif tool_name == "fetch-text":
        url = args.get("url")
        if not url or not isinstance(url, str):
            result = "Failed to call tool, error: Missing required property: url"
        else:
            method = args.get("method", "GET")
            data = args.get("data")
            headers = args.get("headers")
            output_format = args.get("output_format", "markdown")
            result = await fetch_url_content(url, as_json=False, method=method, data=data, headers=headers, output_format=output_format)
  • Tool registration in @server.list_tools(), defining the name, description, and input schema for 'fetch-text'.
    types.Tool(
        name="fetch-text",
        description="Fetch text content from a URL using various HTTP methods. Defaults to converting HTML to Markdown format.",
        inputSchema={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "The URL to get text content from",
                },
                "method": {
                    "type": "string",
                    "description": "HTTP method to use (GET, POST, PUT, DELETE, PATCH, etc.). Default is GET.",
                    "default": "GET"
                },
                "data": {
                    "type": ["object", "string", "null"],
                    "description": "Request body data for POST/PUT/PATCH requests. Can be a JSON object or string.",
                },
                "headers": {
                    "type": "object",
                    "description": "Additional HTTP headers to include in the request",
                    "additionalProperties": {"type": "string"}
                },
                "output_format": {
                    "type": "string",
                    "description": "Output format: 'markdown' (default), 'clean_text', or 'raw_html'.",
                    "enum": ["markdown", "clean_text", "raw_html"],
                    "default": "markdown"
                }
            },
            "required": ["url"],
        },
    ),
  • Primary helper function implementing URL fetching logic for 'fetch-text' tool (called with as_json=False). Performs HTTP requests using httpx, validates responses, handles various methods, and processes text content via extract_text_content.
    async def fetch_url_content(
        url: str, 
        as_json: bool = True, 
        method: str = "GET", 
        data: dict | str | None = None,
        headers: dict[str, str] | None = None,
        output_format: str = "markdown"
    ) -> str:
        """
        Fetch content from a URL using different HTTP methods.
        
        Args:
            url: URL to fetch content from
            as_json: If True, validates content as JSON; if False, returns text content
            method: HTTP method (GET, POST, PUT, DELETE, etc.)
            data: Request body data (for POST/PUT/PATCH requests)
            headers: Additional headers to include in the request
            output_format: If as_json=False, output format - "markdown", "clean_text", or "raw_html"
            
        Returns:
            String content from the URL (JSON, Markdown, clean text, or raw HTML)
            
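        Raises:
            httpx.RequestError: For network-related errors
            httpx.HTTPStatusError: If the response status is 4xx or 5xx (from raise_for_status)
            json.JSONDecodeError: If as_json=True and content is not valid JSON
            ValueError: If URL is invalid or unsafe, or the response exceeds the size limit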
        """
        # Validate URL first
        validate_url(url)
        
        config = await get_http_client_config()
        max_size = config.pop("max_size", 10 * 1024 * 1024)  # Remove from client config
        
        # Merge additional headers with config headers (user headers override defaults)
        if headers:
            if config.get("headers"):
                config["headers"].update(headers)
            else:
                config["headers"] = headers
        
        async with httpx.AsyncClient(**config) as client:
            # Handle different HTTP methods
            method = method.upper()
            
            if method == "GET":
                response = await client.get(url)
            elif method == "POST":
                if isinstance(data, dict):
                    response = await client.post(url, json=data)
                else:
                    response = await client.post(url, content=data)
            elif method == "PUT":
                if isinstance(data, dict):
                    response = await client.put(url, json=data)
                else:
                    response = await client.put(url, content=data)
            elif method == "DELETE":
                response = await client.delete(url)
            elif method == "PATCH":
                if isinstance(data, dict):
                    response = await client.patch(url, json=data)
                else:
                    response = await client.patch(url, content=data)
            elif method == "HEAD":
                response = await client.head(url)
            elif method == "OPTIONS":
                response = await client.options(url)
            else:
                # For any other method, use the generic request method
                if isinstance(data, dict):
                    response = await client.request(method, url, json=data)
                else:
                    response = await client.request(method, url, content=data)
            
            response.raise_for_status()
            
            # Check response size
            content_length = len(response.content)
            if content_length > max_size:
                raise ValueError(f"Response size ({content_length} bytes) exceeds maximum allowed ({max_size} bytes)")
            
            if as_json:
                # For JSON responses, use response.text directly (no compression expected)
                content_to_parse = response.text
                if not content_to_parse:
                    # If response.text is empty, try decoding content directly
                    try:
                        content_to_parse = response.content.decode('utf-8')
                    except UnicodeDecodeError:
                        content_to_parse = ""
                
                if content_to_parse:
                    try:
                        json.loads(content_to_parse)
                        return content_to_parse
                    except json.JSONDecodeError:
                        # If text parsing fails, try content decoding as fallback
                        if content_to_parse == response.text:
                            try:
                                fallback_content = response.content.decode('utf-8')
                                json.loads(fallback_content)
                                return fallback_content
                            except (json.JSONDecodeError, UnicodeDecodeError):
                                pass
                        raise json.JSONDecodeError("Response is not valid JSON", content_to_parse, 0)
                else:
                    # Empty response
                    return ""
            else:
                # For text content, apply format conversion
                return extract_text_content(response.text, output_format)
  • Supporting utility for formatting fetched HTML content into markdown, clean text, or raw HTML. Invoked by fetch_url_content for 'fetch-text' tool.
    def extract_text_content(html_content: str, output_format: str = "markdown") -> str:
        """
        Extract text content from HTML in different formats.
        
        Args:
            html_content: Raw HTML content
            output_format: Output format - "markdown" (default), "clean_text", or "raw_html"
        
        Returns:
            Extracted content in the specified format
        """
        if output_format == "raw_html":
            return html_content
        
        try:
            from markdownify import markdownify as md
            
            if output_format == "markdown":
                # Convert HTML to Markdown
                markdown_text = md(html_content, 
                                   heading_style="ATX",  # Use # for headings
                                   bullets="*",          # Use * for bullets
                                   strip=["script", "style", "noscript"])
                
                # Clean up extra whitespace
                lines = (line.rstrip() for line in markdown_text.splitlines())
                markdown_text = '\n'.join(line for line in lines if line.strip() or not line)
                
                return markdown_text.strip()
                
            elif output_format == "clean_text":
                # Parse HTML with BeautifulSoup
                soup = BeautifulSoup(html_content, 'html.parser')
                
                # Remove script and style elements
                for script in soup(["script", "style", "noscript"]):
                    script.decompose()
                
                # Get text content
                text = soup.get_text()
                
                # Break into lines and remove leading and trailing space on each
                lines = (line.strip() for line in text.splitlines())
                
                # Split each line on double spaces into separate phrases
                chunks = (phrase.strip() for line in lines for phrase in line.split("  "))

                # Drop empty chunks and join the rest with single spaces
                text = ' '.join(chunk for chunk in chunks if chunk)
                
                return text
                
            else:
                # Unknown format, return raw HTML
                return html_content
                
        except Exception:
            # If processing fails, return original content
            return html_content
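
The whitespace handling in the 'clean_text' branch is easier to follow in isolation. The sketch below repeats only that normalization step on a plain string, without BeautifulSoup; the function name `collapse_whitespace` is an assumption introduced for illustration.

```python
def collapse_whitespace(text: str) -> str:
    """Mirror the clean_text normalization: strip lines, split on double spaces, rejoin."""
    # Strip leading and trailing whitespace from every line
    lines = (line.strip() for line in text.splitlines())
    # Split each line on runs of two spaces and strip the resulting phrases
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # Drop empty chunks and join everything with single spaces
    return " ".join(chunk for chunk in chunks if chunk)


sample = "  Title  \n\n  First   paragraph  here \n\tSecond line\n"
print(collapse_whitespace(sample))  # Title First paragraph here Second line
```

Line breaks do not survive this step: the output is a single line, so 'clean_text' suits cases where paragraph structure is not needed.
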
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'various HTTP methods' and 'defaults to converting HTML to Markdown format,' which adds some context about functionality. However, it doesn't cover critical aspects like error handling, rate limits, authentication needs, or what happens with non-HTML content, leaving significant gaps for a tool that interacts with external URLs.
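
One failure mode that the description omits but the implementation above does show is the response size guard. The following is a minimal sketch of that check on its own; the function name `ensure_within_limit` is hypothetical, and the 10 MiB value is the fallback that appears in `fetch_url_content` when the client config sets no `max_size`.

```python
DEFAULT_MAX_SIZE = 10 * 1024 * 1024  # fallback limit seen in fetch_url_content


def ensure_within_limit(body: bytes, max_size: int = DEFAULT_MAX_SIZE) -> bytes:
    """Return the body unchanged, or raise ValueError if it exceeds max_size bytes."""
    content_length = len(body)
    if content_length > max_size:
        raise ValueError(
            f"Response size ({content_length} bytes) exceeds maximum allowed ({max_size} bytes)"
        )
    return body
```

Note that in the quoted implementation the check runs after the full body has been read, so it bounds what is returned rather than what is downloaded.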

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. It avoids unnecessary details, though it could be slightly more structured by explicitly separating key points. Overall, it's concise with minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description is incomplete. It lacks details on behavioral traits, error handling, and output specifics, which are crucial for a tool fetching content from URLs. The schema covers parameters well, but the description doesn't compensate for missing annotations or output schema, leaving the agent with insufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal value beyond the schema, mentioning 'various HTTP methods' and 'defaults to converting HTML to Markdown format,' which loosely relates to 'method' and 'output_format' parameters but doesn't provide additional semantics. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Fetch text content from a URL using various HTTP methods.' It specifies the resource (URL) and action (fetch text content), but doesn't explicitly differentiate from sibling tools like 'fetch-json' or 'batch-fetch-text' beyond mentioning 'text content' and 'Markdown format.' This makes it clear but not fully sibling-distinctive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'fetch-json' or 'batch-fetch-text.' It mentions 'various HTTP methods' and 'defaults to converting HTML to Markdown format,' which implies some context, but lacks explicit when-to-use or when-not-to-use statements, leaving the agent to infer usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ackness/fetch-jsonpath-mcp'
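
For readers who prefer Python to curl, the same request could be sketched with the standard library as follows. The function names are hypothetical, and decoding the body as UTF-8 is an assumption about the response encoding.

```python
import urllib.request

API_URL = "https://glama.ai/api/mcp/v1/servers/ackness/fetch-jsonpath-mcp"


def build_server_request(url: str = API_URL) -> urllib.request.Request:
    """Build the same GET request that the curl command sends."""
    return urllib.request.Request(url, method="GET")


def fetch_server_info(url: str = API_URL, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text (performs network I/O)."""
    with urllib.request.urlopen(build_server_request(url), timeout=timeout) as response:
        # Assumption: the body is UTF-8 encoded text
        return response.read().decode("utf-8")
```

Calling `fetch_server_info()` performs the network request; `build_server_request()` only constructs it.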

If you have feedback or need assistance with the MCP directory API, please join our Discord server.