PedroDnT

MCP Deep Web Research Server

deep_research

Extract and analyze web content to perform in-depth research on topics with configurable exploration depth and relevance filtering.

Instructions

Perform deep research on a topic with content extraction and analysis

Input Schema

Name | Required | Description | Default
topic | Yes | Research topic or question | (none; required)
maxDepth | No | Maximum depth of related content exploration | 2
maxBranching | No | Maximum number of related paths to explore | 3
timeout | No | Research timeout in milliseconds | 55000
minRelevanceScore | No | Minimum relevance score for including content | 0.7

(Defaults are those applied by the tool dispatcher when an argument is omitted.)
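An agent invoking the tool supplies arguments matching this schema. A hypothetical call payload, with the topic and all values invented purely for illustration, might look like:

```typescript
// Hypothetical arguments for a deep_research call. Values respect the
// schema bounds documented below (maxDepth 1-2, maxBranching 1-3,
// timeout 30000-55000 ms, minRelevanceScore 0-1).
const args = {
    topic: "event sourcing in TypeScript",
    maxDepth: 2,
    maxBranching: 3,
    timeout: 45000,
    minRelevanceScore: 0.7,
};

// Only `topic` is required; the rest are optional refinements.
console.log(JSON.stringify(args));
```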

Implementation Reference

  • The primary handler function for the deep_research tool. It orchestrates the entire research process: generates queries, performs parallel searches, deduplicates results, processes URLs through ResearchSession, formats findings, and includes performance timing.
    public async startResearch(topic: string, options: DeepResearchOptions = {}): Promise<ResearchResult> {
        const startTime = Date.now();
        const timings: { [key: string]: number } = {};
    
        console.log('[Performance] Starting research for topic:', topic);
        console.log('[Performance] Options:', options);
    
        // Create new research session
        const session = new ResearchSession(topic, {
            maxDepth: options.maxDepth,
            maxBranching: options.maxBranching,
            timeout: options.timeout,
            minRelevanceScore: options.minRelevanceScore,
            maxParallelOperations: options.maxParallelOperations
        });
    
        console.log('[Performance] Created research session:', session.id);
        this.activeSessions.set(session.id, session);
    
        try {
            console.log('[Performance] Starting parallel search...');
            const parallelSearchStart = Date.now();
            
            const queries = [
                topic,
                `${topic} tutorial`,
                `${topic} guide`,
                `${topic} example`,
                `${topic} implementation`,
                `${topic} code`,
                `${topic} design pattern`,
                `${topic} best practice`
            ];
            console.log('[Performance] Search queries:', queries);
    
            const searchResults = await this.parallelSearch.parallelSearch(queries);
            timings.parallelSearch = Date.now() - parallelSearchStart;
            console.log('[Performance] Parallel search complete. Duration:', timings.parallelSearch, 'ms');
    
            const deduplicationStart = Date.now();
            const allResults = searchResults.results.flatMap(result => result.results);
            console.log('[Performance] Total results:', allResults.length);
    
            const uniqueResults = this.deduplicateResults(allResults);
            console.log('[Performance] Unique results:', uniqueResults.length);
    
            const sortedResults = uniqueResults.sort((a, b) => b.relevanceScore - a.relevanceScore);
            timings.deduplication = Date.now() - deduplicationStart;
            console.log('[Performance] Deduplication complete. Duration:', timings.deduplication, 'ms');
    
            // Process top results first
            console.log('[Performance] Processing top 5 results...');
            const topProcessingStart = Date.now();
            const topResults = sortedResults.slice(0, 5);
            await Promise.all(topResults.map(r => {
                console.log('[Performance] Processing URL:', r.url);
                return session.processUrl(r.url);
            }));
            timings.topResultsProcessing = Date.now() - topProcessingStart;
            console.log('[Performance] Top results processing complete. Duration:', timings.topResultsProcessing, 'ms');
    
            // Process remaining results
            console.log('[Performance] Processing remaining results...');
            const remainingProcessingStart = Date.now();
            const remainingResults = sortedResults.slice(5);
            await Promise.all(remainingResults.map(r => {
                console.log('[Performance] Processing URL:', r.url);
                return session.processUrl(r.url);
            }));
            timings.remainingResultsProcessing = Date.now() - remainingProcessingStart;
            console.log('[Performance] Remaining results processing complete. Duration:', timings.remainingResultsProcessing, 'ms');
    
            // Complete the session
            console.log('[Performance] Completing session...');
            await session.complete();
    
            // Format and return results
            console.log('[Performance] Formatting results...');
            const results = this.formatResults(session);
            
            // Add timing information
            timings.total = Date.now() - startTime;
            results.timing.operations = {
                parallelSearch: timings.parallelSearch,
                deduplication: timings.deduplication,
                topResultsProcessing: timings.topResultsProcessing,
                remainingResultsProcessing: timings.remainingResultsProcessing,
                total: timings.total
            };
    
            console.log('[Performance] Research complete. Total duration:', timings.total, 'ms');
            console.log('[Performance] Operation timings:', timings);
    
            return results;
        } catch (error) {
            console.error(`[Performance] Error in research session ${session.id}:`, error);
            throw error;
        } finally {
            // Cleanup
            this.activeSessions.delete(session.id);
            await this.parallelSearch.cleanup();
        }
    }
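The handler above calls a `deduplicateResults` helper that is not shown in the excerpt. A hedged sketch of what URL-keyed deduplication could look like (this implementation is an assumption, not the server's actual code):

```typescript
// Assumed shape of a search result; the real type may carry more fields.
interface SearchResult {
    url: string;
    relevanceScore: number;
}

// Illustrative deduplication: keep the highest-scoring entry per URL.
function deduplicateResults(results: SearchResult[]): SearchResult[] {
    const seen = new Map<string, SearchResult>();
    for (const r of results) {
        const existing = seen.get(r.url);
        if (!existing || r.relevanceScore > existing.relevanceScore) {
            seen.set(r.url, r);
        }
    }
    return [...seen.values()];
}

const deduped = deduplicateResults([
    { url: "https://a.example", relevanceScore: 0.5 },
    { url: "https://a.example", relevanceScore: 0.9 },
    { url: "https://b.example", relevanceScore: 0.7 },
]);
console.log(deduped.length); // 2
```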
  • src/index.ts:87-124 (registration)
    Registration of the 'deep_research' tool in the MCP server's listTools handler, including name, description, and detailed input schema.
    {
        name: 'deep_research',
        description: 'Perform deep research on a topic with content extraction and analysis',
        inputSchema: {
            type: 'object',
            properties: {
                topic: {
                    type: 'string',
                    description: 'Research topic or question'
                },
                maxDepth: {
                    type: 'number',
                    description: 'Maximum depth of related content exploration',
                    minimum: 1,
                    maximum: 2
                },
                maxBranching: {
                    type: 'number',
                    description: 'Maximum number of related paths to explore',
                    minimum: 1,
                    maximum: 3
                },
                timeout: {
                    type: 'number',
                    description: 'Research timeout in milliseconds',
                    minimum: 30000,
                    maximum: 55000
                },
                minRelevanceScore: {
                    type: 'number',
                    description: 'Minimum relevance score for including content',
                    minimum: 0,
                    maximum: 1
                }
            },
            required: ['topic']
        }
    },
  • TypeScript interface defining the input arguments for the deep_research tool, used in the tool handler.
    interface DeepResearchArgs {
        topic: string;
        maxDepth?: number;
        maxBranching?: number;
        timeout?: number;
        minRelevanceScore?: number;
    }
  • MCP tool dispatcher case for 'deep_research': validates input, calls the DeepResearch instance's startResearch method, and returns formatted results.
    case 'deep_research': {
        const args = request.params.arguments as unknown as DeepResearchArgs;
        if (!args?.topic) {
            throw new McpError(ErrorCode.InvalidParams, 'Topic is required');
        }
    
        console.log(`Starting deep research on topic: ${args.topic}`);
        const result = await deepResearch.startResearch(args.topic, {
            // Use ?? instead of || so an explicit 0 (a valid minRelevanceScore)
            // is not silently replaced by the default; clamp to the schema maxima.
            maxDepth: Math.min(args.maxDepth ?? 2, 2),
            maxBranching: Math.min(args.maxBranching ?? 3, 3),
            timeout: Math.min(args.timeout ?? 55000, 55000),
            minRelevanceScore: args.minRelevanceScore ?? 0.7
        });
    
        return {
            content: [
                {
                    type: 'text',
                    text: JSON.stringify(result, null, 2)
                }
            ]
        };
    }
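Note that the dispatcher clamps out-of-range values to the schema maxima rather than rejecting them. A small standalone sketch of that behavior (the `clampOptions` helper is illustrative, not part of the source; it uses `??` so an explicitly passed 0 survives):

```typescript
// Illustrative helper mirroring the dispatcher's clamping of options.
function clampOptions(args: {
    maxDepth?: number;
    maxBranching?: number;
    timeout?: number;
    minRelevanceScore?: number;
}) {
    return {
        maxDepth: Math.min(args.maxDepth ?? 2, 2),
        maxBranching: Math.min(args.maxBranching ?? 3, 3),
        timeout: Math.min(args.timeout ?? 55000, 55000),
        minRelevanceScore: args.minRelevanceScore ?? 0.7,
    };
}

// Out-of-range values are silently reduced, not rejected.
const clamped = clampOptions({ maxDepth: 5, timeout: 120000 });
console.log(clamped);
```

This design means an agent never sees a validation error for an oversized value; it simply gets a shallower or shorter run than requested.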
  • Output type definition for the research results returned by the deep_research handler.
    export interface ResearchResult {
        sessionId: string;
        topic: string;
        findings: {
            mainTopics: Array<{
                name: string;
                importance: number;
                relatedTopics: string[];
            }>;
            keyInsights: Array<{
                text: string;
                confidence: number;
                relatedTopics: string[];
            }>;
            sources: Array<{
                url: string;
                title: string;
                credibilityScore: number;
            }>;
        };
        progress: {
            completedSteps: number;
            totalSteps: number;
            processedUrls: number;
        };
        timing: {
            started: string;
            completed?: string;
            duration?: number;
            operations?: {
                parallelSearch?: number;
                deduplication?: number;
                topResultsProcessing?: number;
                remainingResultsProcessing?: number;
                total?: number;
            };
        };
    }
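To make the shape concrete, here is a hypothetical object conforming to `ResearchResult`; every value below is invented purely to illustrate what an agent should expect to parse:

```typescript
// Hypothetical ResearchResult instance (all values invented for illustration).
const example = {
    sessionId: "session-123",
    topic: "example topic",
    findings: {
        mainTopics: [
            { name: "Topic A", importance: 0.9, relatedTopics: ["Topic B"] },
        ],
        keyInsights: [
            { text: "An example insight.", confidence: 0.8, relatedTopics: ["Topic A"] },
        ],
        sources: [
            { url: "https://example.com", title: "Example Source", credibilityScore: 0.7 },
        ],
    },
    progress: { completedSteps: 3, totalSteps: 3, processedUrls: 5 },
    timing: { started: new Date(0).toISOString(), duration: 42000 },
};

// The dispatcher returns this serialized with JSON.stringify(result, null, 2).
console.log(JSON.stringify(example, null, 2));
```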
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure but offers minimal information. It mentions 'deep research' with 'content extraction and analysis', hinting at a potentially resource-intensive or iterative process, but fails to detail critical aspects like execution time, rate limits, authentication needs, output format, or error handling. This leaves significant gaps for a tool with 5 parameters and no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('perform deep research') and key features ('content extraction and analysis') without any wasted words. It is appropriately sized for the tool's complexity, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no annotations, no output schema), the description is incomplete. It lacks details on the research methodology, output format, error conditions, or performance characteristics, which are crucial for an agent to use it effectively. The high parameter count and absence of output schema demand more contextual information than provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, meaning all parameters are documented in the input schema with descriptions and constraints (e.g., 'maxDepth' with min/max 1-2). The description adds no additional parameter semantics beyond implying a research process, so it meets the baseline of 3 without compensating or detracting from the schema's coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('perform deep research') and key activities ('content extraction and analysis'), which distinguishes it from sibling tools like 'parallel_search' and 'visit_page' that likely have different scopes. However, it doesn't explicitly differentiate itself from those siblings in terms of depth or methodology, keeping it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'parallel_search' or 'visit_page', nor does it mention prerequisites, constraints, or typical use cases. It lacks explicit when/when-not instructions or comparisons, leaving the agent to infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
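One way to address the gaps the review identifies would be a registration fragment like the following. This is an illustrative revision only, not the server's actual text; it sketches how the description could disclose timing, output shape, and when-to-use guidance relative to the sibling tools the review names:

```typescript
// Illustrative (hypothetical) revision of the tool registration entry.
const revisedTool = {
    name: "deep_research",
    description:
        "Perform multi-step web research on a topic: runs parallel searches, " +
        "extracts and scores page content, and returns structured JSON findings " +
        "(topics, insights, sources) plus timing data. May take up to 55 seconds. " +
        "Use for broad research questions; prefer visit_page for a single known URL.",
};

console.log(revisedTool.description);
```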
