PedroDnT

MCP Deep Web Research Server

visit_page

Extract content from webpages to enable advanced web research with intelligent search queuing and enhanced content analysis capabilities.

Instructions

Visit a webpage and extract its content

Input Schema

Name  Required  Description   Default
url   Yes       URL to visit  (none)
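
As an illustration of the schema above, a client would invoke the tool through an MCP `tools/call` request whose `arguments` object carries the `url` field. The sketch below only builds the JSON-RPC payload; the `buildVisitPageRequest` helper, the `VisitPageRequest` type, and the request `id` are hypothetical and not part of this server.

```typescript
// Hypothetical helper: builds a JSON-RPC 2.0 payload for an MCP tools/call request.
interface VisitPageRequest {
    jsonrpc: '2.0';
    id: number;
    method: 'tools/call';
    params: { name: 'visit_page'; arguments: { url: string } };
}

function buildVisitPageRequest(id: number, url: string): VisitPageRequest {
    return {
        jsonrpc: '2.0',
        id,
        method: 'tools/call',
        // "url" is the only property the input schema declares, and it is required
        params: { name: 'visit_page', arguments: { url } }
    };
}

console.log(JSON.stringify(buildVisitPageRequest(1, 'https://example.com'), null, 2));
```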

Implementation Reference

  • Main handler logic for the 'visit_page' tool. It validates the input URL, launches the browser if needed, navigates with bot-protection checks, extracts the page title and Markdown content, and returns them as a JSON string inside a single text content item.
    case 'visit_page': {
        const args = request.params.arguments as unknown as VisitPageArgs;
        if (!args?.url) {
            throw new McpError(ErrorCode.InvalidParams, 'URL is required');
        }
    
        if (!isValidUrl(args.url)) {
            throw new McpError(
                ErrorCode.InvalidParams,
                `Invalid URL: ${args.url}. Only http and https protocols are supported.`
            );
        }
    
        const page = await ensureBrowser();
        try {
            await safePageNavigation(page, args.url);
            const title = await page.title();
            const content = await extractContentAsMarkdown(page);
    
            return {
                content: [
                    {
                        type: 'text',
                        text: JSON.stringify({
                            url: args.url,
                            title,
                            content
                        }, null, 2)
                    }
                ]
            };
        } catch (error) {
            throw new McpError(
                ErrorCode.InternalError,
                `Failed to visit page: ${(error as Error).message}`
            );
        }
    }
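
The handler relies on `isValidUrl`, whose source is not shown on this page. A minimal sketch of such a check, assuming it parses the input with the standard `URL` constructor and accepts only `http:` and `https:` (as the error message suggests), could look like the following. The name `isHttpUrl` is hypothetical, and the server's actual implementation may differ.

```typescript
// Assumed behavior, inferred from the handler's error message; not the server's actual code.
function isHttpUrl(input: string): boolean {
    try {
        const parsed = new URL(input); // throws on input that cannot be parsed as a URL
        return parsed.protocol === 'http:' || parsed.protocol === 'https:';
    } catch {
        return false;
    }
}

console.log(isHttpUrl('https://example.com')); // true
console.log(isHttpUrl('ftp://example.com'));   // false
```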
  • src/index.ts:148-161 (registration)
    Registration of the 'visit_page' tool in the listTools handler, including name, description, and input schema.
    {
        name: 'visit_page',
        description: 'Visit a webpage and extract its content',
        inputSchema: {
            type: 'object',
            properties: {
                url: {
                    type: 'string',
                    description: 'URL to visit'
                }
            },
            required: ['url']
        }
    }
  • TypeScript interface defining the input arguments for the visit_page tool.
    interface VisitPageArgs {
        url: string;
    }
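
Because the handler casts `request.params.arguments` through `unknown`, this interface is not enforced at runtime beyond the `args?.url` check. Where stricter checking is wanted, a type guard is one option. The `isVisitPageArgs` helper below is a hypothetical sketch, not part of the server.

```typescript
interface VisitPageArgs {
    url: string;
}

// Hypothetical runtime guard: true only for objects with a non-empty string "url".
function isVisitPageArgs(value: unknown): value is VisitPageArgs {
    if (typeof value !== 'object' || value === null) {
        return false;
    }
    const url = (value as { url?: unknown }).url;
    return typeof url === 'string' && url.length > 0;
}

console.log(isVisitPageArgs({ url: 'https://example.com' })); // true
console.log(isVisitPageArgs({ url: 42 }));                    // false
```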
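  • Helper function that extracts the main content from the page and converts it to Markdown using TurndownService. It returns the first matching content container (such as main or article); if none matches, it falls back to the body with navigation, sidebar, advertisement, and cookie-notice elements removed.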
    async function extractContentAsMarkdown(page: Page): Promise<string> {
        const html = await page.evaluate(() => {
            // Try standard content containers first
            const contentSelectors = [
                'main',
                'article',
                '[role="main"]',
                '#content',
                '.content',
                '.main',
                '.post',
                '.article'
            ];
    
            for (const selector of contentSelectors) {
                const element = document.querySelector(selector);
                if (element) {
                    return element.outerHTML;
                }
            }
    
            // Fallback to cleaning full body content
            const body = document.body;
            const elementsToRemove = [
                'header', 'footer', 'nav',
                '[role="navigation"]', 'aside',
                '.sidebar', '[role="complementary"]',
                '.nav', '.menu', '.header',
                '.footer', '.advertisement',
                '.ads', '.cookie-notice'
            ];
    
            elementsToRemove.forEach(sel => {
                body.querySelectorAll(sel).forEach(el => el.remove());
            });
    
            return body.outerHTML;
        });
    
        if (!html) {
            return '';
        }
    
        try {
            const markdown = turndownService.turndown(html);
            return markdown
                .replace(/\n{3,}/g, '\n\n')
                .replace(/^- $/gm, '')
                .replace(/^\s+$/gm, '')
                .trim();
        } catch (error) {
            console.error('Error converting HTML to Markdown:', error);
            return html;
        }
    }
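
The post-processing step at the end of the helper can be tried in isolation, without a browser or Turndown. The sketch below copies the same regular-expression chain into a standalone function; the name `tidyMarkdown` and the sample input are hypothetical.

```typescript
// Hypothetical wrapper around the cleanup chain used in extractContentAsMarkdown.
function tidyMarkdown(markdown: string): string {
    return markdown
        .replace(/\n{3,}/g, '\n\n') // collapse runs of three or more newlines
        .replace(/^- $/gm, '')      // empty out list items that contain only "- "
        .replace(/^\s+$/gm, '')     // empty out whitespace-only lines
        .trim();
}

const raw = '# Title\n\n\n\nBody\n- \nMore\n   \nEnd\n\n';
console.log(JSON.stringify(tidyMarkdown(raw)));
```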
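  • Helper for safe navigation. It loads the URL (waiting for DOMContentLoaded, with a 10-second timeout), then throws if it finds a known bot-protection element or a page title that looks like a security challenge.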
    async function safePageNavigation(page: Page, url: string): Promise<void> {
        await page.goto(url, {
            waitUntil: 'domcontentloaded',
            timeout: 10000 // 10 second timeout
        });
    
        // Quick check for bot protection or security challenges
        const validation = await page.evaluate(() => {
            const botProtectionExists = [
                '#challenge-running',
                '#cf-challenge-running',
                '#px-captcha',
                '#ddos-protection',
                '#waf-challenge-html'
            ].some(selector => document.querySelector(selector));
    
            const suspiciousTitle = [
                'security check',
                'ddos protection',
                'please wait',
                'just a moment',
                'attention required'
            ].some(phrase => document.title.toLowerCase().includes(phrase));
    
            return {
                botProtection: botProtectionExists,
                suspiciousTitle,
                title: document.title
            };
        });
    
        if (validation.botProtection) {
            throw new Error('Bot protection detected');
        }
    
        if (validation.suspiciousTitle) {
            throw new Error(`Suspicious page title detected: "${validation.title}"`);
        }
    }
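
The title heuristic above is a case-insensitive substring match, which makes it easy to exercise outside the browser. The sketch below reuses the phrase list from the helper in a standalone function; the name `hasSuspiciousTitle` and the sample titles are hypothetical.

```typescript
// Phrase list copied from safePageNavigation; the function itself is a hypothetical extraction.
const SUSPICIOUS_PHRASES = [
    'security check',
    'ddos protection',
    'please wait',
    'just a moment',
    'attention required'
];

function hasSuspiciousTitle(title: string): boolean {
    const lowered = title.toLowerCase();
    return SUSPICIOUS_PHRASES.some(phrase => lowered.includes(phrase));
}

console.log(hasSuspiciousTitle('Just a moment...')); // true
console.log(hasSuspiciousTitle('Example Domain'));   // false
```

Since this is a plain substring match, a legitimate page whose title happens to contain one of these phrases (for example, an article titled "Please wait before upgrading") would also be rejected.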
MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/PedroDnT/mcp-DEEPwebresearch'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.