Glama

interdeep

by mistakeknot

extract_content

Extract clean text and markdown content from web pages using hybrid extraction strategies, with optional JavaScript rendering support for dynamic sites.

Instructions

Extract clean text/markdown content from a URL using trafilatura (fast) with optional Playwright fallback (JS-rendered pages).
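The listing does not show how the tool decides when to fall back from the fast path to JavaScript rendering. The sketch below shows one plausible shape for a fast-path-then-fallback extractor. It rests on three assumptions: the two strategies are injected as async callables, so it runs without trafilatura or Playwright installed; "too little text" is treated as the signal to escalate; and the names `extract_hybrid`, `HybridResult` and `min_chars` are hypothetical. It is not the interdeep implementation.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Hypothetical strategy type: takes (url, timeout_seconds), returns extracted text or None.
# In a real server the fast strategy might wrap trafilatura and the fallback a Playwright render.
Extractor = Callable[[str, float], Awaitable[Optional[str]]]


@dataclass
class HybridResult:
    url: str
    content: str
    method: str  # "fast" or "fallback": which strategy produced the content


async def _try(extractor: Extractor, url: str, timeout: float) -> Optional[str]:
    """Run one strategy under a hard timeout; treat any failure as 'no content'."""
    try:
        return await asyncio.wait_for(extractor(url, timeout), timeout)
    except Exception:  # includes asyncio.TimeoutError
        return None


async def extract_hybrid(
    url: str,
    fast: Extractor,
    fallback: Optional[Extractor] = None,
    timeout: float = 10.0,
    min_chars: int = 200,  # hypothetical threshold for "enough" static text
) -> HybridResult:
    # 1. Cheap static extraction first.
    text = await _try(fast, url, timeout)
    if text and len(text.strip()) >= min_chars:
        return HybridResult(url, text, "fast")
    # 2. Very little text often means the page is built client-side:
    #    escalate to the slower rendering strategy if one is configured.
    if fallback is not None:
        rendered = await _try(fallback, url, timeout)
        if rendered and rendered.strip():
            return HybridResult(url, rendered, "fallback")
    # 3. Otherwise return whatever the fast path produced, even if short.
    if text and text.strip():
        return HybridResult(url, text, "fast")
    raise RuntimeError(f"no content extracted from {url}")


# Usage with stub strategies standing in for real fetchers.
async def static_stub(url: str, timeout: float) -> Optional[str]:
    return "Loading..."  # the near-empty shell a JS-rendered page often serves


async def rendered_stub(url: str, timeout: float) -> Optional[str]:
    return "Rendered article body. " * 20


result = asyncio.run(extract_hybrid("https://example.com/app", static_stub, rendered_stub))
print(result.method)  # fallback
```

Injecting the strategies keeps the escalation rule testable in isolation. A production version would also need to decide whether a fallback failure should surface as an error or silently return the short static text, as this sketch does.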

Input Schema

Name      Required  Description                        Default
url       Yes       The URL to extract content from.
timeout   No        Fetch timeout in seconds.          10
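A caller can check its arguments against the input schema before invoking the tool. The following is a minimal sketch of such a check. The function name `validate_extract_args` is hypothetical, and because the schema does not state the type of `timeout`, the sketch assumes any positive number of seconds is acceptable.

```python
from typing import Any, Mapping, Tuple

DEFAULT_TIMEOUT = 10  # seconds, per the input schema


def validate_extract_args(arguments: Mapping[str, Any]) -> Tuple[str, float]:
    """Return (url, timeout) or raise ValueError if the arguments break the schema."""
    url = arguments.get("url")
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    timeout = arguments.get("timeout", DEFAULT_TIMEOUT)
    # Assumption: timeout is a positive number. bool is rejected explicitly
    # because isinstance(True, int) is True in Python.
    if isinstance(timeout, bool) or not isinstance(timeout, (int, float)) or timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return url, float(timeout)


args = validate_extract_args({"url": "https://example.com/post"})
print(args)  # ('https://example.com/post', 10.0)
```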

Implementation Reference

  • The _handle_extract_content function executes the extraction logic using extract_hybrid_async.
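    import logging
    from mcp.types import TextContent  # assumed: the MCP Python SDK's text content type
    logger = logging.getLogger(__name__)

    # _ok, _err, _result_to_dict and extract_hybrid_async are defined elsewhere
    # in the interdeep server; they are not shown in this excerpt.
    async def _handle_extract_content(arguments: dict) -> list[TextContent]:
        # "url" is required by the input schema; reject a missing or empty value early.
        url = arguments.get("url", "")
        if not url:
            return _err("url is required")
        # Optional fetch timeout in seconds, defaulting to 10.
        timeout = arguments.get("timeout", 10)
        try:
            result = await extract_hybrid_async(url=url, timeout=timeout)
            return _ok(_result_to_dict(result))
        except Exception as e:
            # Log the traceback server-side; report the failure to the client as content.
            logger.exception("extract_content failed for %s", url)
            return _err(f"Extraction failed: {e}")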
  • The Tool definition for 'extract_content' is registered in the list_tools handler.
    Tool(
        name="extract_content",
        description="Extract clean text/markdown content from a URL using trafilatura (fast) with optional Playwright fallback (JS-rendered pages).",
        inputSchema={
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    # ... (rest of the schema is truncated in this excerpt)
  • The 'extract_content' tool is mapped to its handler function in the _HANDLERS dictionary.
    "extract_content": _handle_extract_content,

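Taken together, the three snippets form a name-to-handler lookup followed by a handler that turns exceptions into error content instead of raising. The sketch below reproduces that flow in runnable form. `ok`, `err`, `fake_extract` and `call_tool` are hypothetical stand-ins for the server's `_ok`, `_err`, `extract_hybrid_async` and its call-tool entry point. Content is modelled as plain dicts shaped like MCP text content, so the sketch runs without the `mcp` package, and the JSON payload layout is assumed.

```python
import asyncio
import json
import logging
from typing import Any, Awaitable, Callable, Dict, List

logger = logging.getLogger(__name__)

# Hypothetical content shape: a list of {"type": "text", "text": ...} items.
Content = List[Dict[str, str]]


def ok(payload: Dict[str, Any]) -> Content:
    return [{"type": "text", "text": json.dumps(payload)}]


def err(message: str) -> Content:
    return [{"type": "text", "text": json.dumps({"error": message})}]


async def fake_extract(url: str, timeout: float) -> Dict[str, Any]:
    """Hypothetical extractor used in place of extract_hybrid_async."""
    if "fail" in url:
        raise ConnectionError("unreachable")
    return {"url": url, "content": "# Title\n\nBody", "timeout": timeout}


async def handle_extract_content(arguments: Dict[str, Any]) -> Content:
    url = arguments.get("url", "")
    if not url:
        return err("url is required")
    timeout = arguments.get("timeout", 10)
    try:
        return ok(await fake_extract(url, timeout))
    except Exception as e:
        # Failures become error content for the client rather than transport errors.
        logger.exception("extract_content failed for %s", url)
        return err(f"Extraction failed: {e}")


# Dispatch table mirroring the _HANDLERS mapping: tool name -> async handler.
HANDLERS: Dict[str, Callable[[Dict[str, Any]], Awaitable[Content]]] = {
    "extract_content": handle_extract_content,
}


async def call_tool(name: str, arguments: Dict[str, Any]) -> Content:
    handler = HANDLERS.get(name)
    if handler is None:
        return err(f"Unknown tool: {name}")
    return await handler(arguments)


reply = asyncio.run(call_tool("extract_content", {"url": "https://example.com"}))
print(json.loads(reply[0]["text"])["url"])  # https://example.com
```

Keeping the name-to-handler mapping in a dictionary means registering a new tool needs only a `Tool` entry in `list_tools` and one line in the table, with no change to the dispatch code.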

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/mistakeknot/interdeep'
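The same request can be made from Python with only the standard library. This is a sketch under two assumptions: the endpoint returns a JSON body, and the response fields are not described here, so the sketch only decodes the body. The helper names `server_url` and `fetch_server` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory URL for one server, e.g. mistakeknot/interdeep."""
    return f"{API_BASE}/{urllib.parse.quote(owner)}/{urllib.parse.quote(name)}"


def fetch_server(owner: str, name: str, timeout: float = 10.0):
    """GET the server record and decode it as JSON (assumed response format)."""
    req = urllib.request.Request(server_url(owner, name), method="GET")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Requires network access:
# info = fetch_server("mistakeknot", "interdeep")
```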
