
Perplexity MCP Server

extract_url_content

Extracts clean, main text content from URLs using browser automation and readability tools. Handles dynamic rendering, GitHub repos, and pre-checks for non-HTML content. Ideal for articles, blogs, and structured data extraction.

Instructions

Uses browser automation (Puppeteer) and Mozilla's Readability library to extract the main article text from a given URL. Handles dynamic JavaScript rendering and includes fallback logic. For GitHub repository URLs, it attempts to fetch structured content via gitingest.com. It performs a pre-check for non-HTML content types and checks the HTTP status after navigation. Ideal for getting clean text from articles and blog posts. Note: it may struggle to isolate only the core content on complex homepages or dashboards, so the output can include UI elements.
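The routing described above can be sketched roughly as follows. This is an illustration only, not the server's actual code: the function name `choose_route`, the route labels, the set of accepted HTML media types, and the order of the checks (GitHub first, then the content-type pre-check) are all assumptions made for the example.

```python
from typing import Optional
from urllib.parse import urlparse

# Assumption: media types treated as "HTML" by the pre-check.
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def choose_route(url: str, content_type: Optional[str] = None) -> str:
    """Return the extraction path a URL would take in this sketch.

    "gitingest"     -> GitHub repository URL, fetch structured content
    "skip_non_html" -> pre-check found a non-HTML Content-Type
    "browser"       -> render with a browser, then run Readability
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"not an http(s) URL: {url!r}")

    host = parsed.hostname  # already lowercased by urlparse
    if host.startswith("www."):
        host = host[4:]

    # Assumption: a "repository URL" is github.com/<owner>/<repo>[/...].
    segments = [s for s in parsed.path.split("/") if s]
    if host == "github.com" and len(segments) >= 2:
        return "gitingest"

    # Pre-check: ignore parameters such as "; charset=utf-8".
    if content_type is not None:
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type not in HTML_TYPES:
            return "skip_non_html"

    return "browser"
```

In this sketch a missing Content-Type falls through to the browser path, on the assumption that the HTTP status check after navigation would catch failures later.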

Input Schema

Name | Required | Description | Default
depth | No | Optional: Maximum depth for recursive link exploration (1-5). | 1 (no recursion)
url | Yes | The URL of the website to extract content from. | (none)

Input Schema (JSON Schema)

{
  "properties": {
    "depth": {
      "default": 1,
      "description": "Optional: Maximum depth for recursive link exploration (1-5). Default is 1 (no recursion).",
      "examples": [1, 3],
      "maximum": 5,
      "minimum": 1,
      "type": "number"
    },
    "url": {
      "description": "The URL of the website to extract content from.",
      "examples": ["https://www.example.com/article"],
      "type": "string"
    }
  },
  "required": ["url"],
  "type": "object"
}
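As a minimal sketch of how a client might use this schema, the helper below checks the arguments against the documented constraints and wraps them in an MCP `tools/call` JSON-RPC request. The helper name `build_call` and the `request_id` parameter are hypothetical; only the tool name, the two fields, and the 1-5 range come from the schema above.

```python
import json
from numbers import Real


def build_call(url: str, depth: float = 1, request_id: int = 1) -> dict:
    """Validate arguments against the schema and build a tools/call request."""
    # "url" is required and must be a string.
    if not isinstance(url, str):
        raise TypeError("url must be a string")
    # "depth" is a JSON number; reject booleans, which Python treats as ints.
    if isinstance(depth, bool) or not isinstance(depth, Real):
        raise TypeError("depth must be a number")
    # The schema sets minimum 1 and maximum 5.
    if not 1 <= depth <= 5:
        raise ValueError("depth must be between 1 and 5")

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_url_content",
            "arguments": {"url": url, "depth": depth},
        },
    }


if __name__ == "__main__":
    # Print the request body a client would send for the schema's example URL.
    print(json.dumps(build_call("https://www.example.com/article", depth=3), indent=2))
```

Omitting `depth` sends the default of 1, which the schema describes as no recursion.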

    MCP directory API

    We provide all the information about MCP servers via our MCP API.

    curl -X GET 'https://glama.ai/api/mcp/v1/servers/wysh3/perplexity-mcp-zerver'
