
Deep Code Reasoning MCP Server

by evalops

run_hypothesis_tournament

Conduct competitive hypothesis tournaments to identify root causes by testing multiple theories in parallel. Uses evidence-based scoring and elimination rounds for efficient issue resolution.

Instructions

Run a competitive hypothesis tournament to find root causes. Multiple AI conversations test different theories in parallel, with evidence-based scoring and elimination rounds.

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| claude_context | Yes | Context from the calling agent: attempted approaches, partial findings, stuck description, and code scope | |
| issue | Yes | Description of the issue to investigate | |
| tournament_config | No | Optional limits: max_hypotheses (default 6), max_rounds (default 3), parallel_sessions (default 4) | |
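For illustration, a minimal arguments object might look like the following sketch. The file path, approaches, and issue text are hypothetical, not taken from the source:

```typescript
// Hypothetical example arguments for run_hypothesis_tournament.
// claude_context and issue are required; tournament_config is optional.
const exampleArgs = {
  claude_context: {
    attempted_approaches: ["added debug logging", "bisected recent commits"],
    partial_findings: [],
    stuck_description: "cannot reproduce the failure deterministically",
    code_scope: { files: ["src/cache/store.ts"] },
  },
  issue: "Intermittent stale reads after cache invalidation",
  tournament_config: { max_hypotheses: 4, max_rounds: 2, parallel_sessions: 2 },
};
console.log(JSON.stringify(Object.keys(exampleArgs)));
```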

Implementation Reference

  • MCP tool dispatch handler for 'run_hypothesis_tournament': parses input using schema, validates context, constructs config, calls DeepCodeReasonerV2.runHypothesisTournament, and formats result as MCP response.
    case 'run_hypothesis_tournament': {
      const parsed = RunHypothesisTournamentSchema.parse(args);
    
      // Validate and sanitize the Claude context
      const validatedContext = InputValidator.validateClaudeContext(parsed.claude_context);
    
      // Override with specific values from the parsed input
      const context: ClaudeCodeContext = {
        ...validatedContext,
        analysisBudgetRemaining: 300, // 5 minutes for tournament
      };
    
      const tournamentConfig = {
        maxHypotheses: parsed.tournament_config?.max_hypotheses ?? 6,
        maxRounds: parsed.tournament_config?.max_rounds ?? 3,
        parallelSessions: parsed.tournament_config?.parallel_sessions ?? 4,
      };
    
      const result = await deepReasoner.runHypothesisTournament(
        context,
        InputValidator.validateString(parsed.issue, 1000),
        tournamentConfig,
      );
    
      return {
        content: [
          {
            type: 'text',
            text: JSON.stringify(result, null, 2),
          },
        ],
      };
    }
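The `tournament_config` defaulting in the handler above can be isolated as a small pure function. Names here are illustrative, mirroring the handler's `??` fallbacks:

```typescript
// Illustrative sketch of the tournament_config defaulting seen in the
// dispatch handler: snake_case wire fields map to camelCase config keys,
// with defaults of 6 hypotheses, 3 rounds, and 4 parallel sessions.
interface TournamentConfigInput {
  max_hypotheses?: number;
  max_rounds?: number;
  parallel_sessions?: number;
}

function buildTournamentConfig(input?: TournamentConfigInput) {
  return {
    maxHypotheses: input?.max_hypotheses ?? 6,
    maxRounds: input?.max_rounds ?? 3,
    parallelSessions: input?.parallel_sessions ?? 4,
  };
}
```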
  • DeepCodeReasonerV2 handler method: creates or uses HypothesisTournamentService instance and delegates to its runTournament method, with error handling.
    async runHypothesisTournament(
      context: ClaudeCodeContext,
      issue: string,
      tournamentConfig?: {
        maxHypotheses?: number;
        maxRounds?: number;
        parallelSessions?: number;
      },
    ): Promise<TournamentResult> {
      try {
        // Override tournament config if provided
        const tournament = tournamentConfig
          ? new HypothesisTournamentService(
              this.geminiApiKey,
              tournamentConfig,
            )
          : this.tournamentService;
    
        // Run the tournament
        const result = await tournament.runTournament(context, issue);
    
        return result;
      } catch (error) {
        console.error('Hypothesis tournament failed:', {
          error,
          issue,
          tournamentConfig,
          contextFiles: context.focusArea.files,
          entryPoints: context.focusArea.entryPoints,
        });
        throw error;
      }
    }
  • Core tournament execution logic: generates hypotheses, runs elimination rounds in parallel, computes winner, extracts findings and recommendations.
    async runTournament(
      context: ClaudeCodeContext,
      issue: string,
    ): Promise<TournamentResult> {
      const startTime = Date.now();
      
      // Generate initial hypotheses
      const hypotheses = await this.generateHypotheses(context, issue);
      
      const rounds: TournamentRound[] = [];
      let remainingHypotheses = [...hypotheses];
      const allFindings: Finding[] = [];
    
      // Run tournament rounds
      for (let roundNum = 1; roundNum <= this.config.maxRounds && remainingHypotheses.length > 1; roundNum++) {
        const round = await this.runRound(
          roundNum,
          remainingHypotheses,
          context,
          issue,
          rounds,
        );
        
        rounds.push(round);
        allFindings.push(...this.extractFindingsFromRound(round));
        
        // Eliminate low-confidence hypotheses
        remainingHypotheses = round.results
          .filter(r => r.overallConfidence >= this.config.eliminationThreshold)
          .sort((a, b) => b.overallConfidence - a.overallConfidence)
          .slice(0, Math.ceil(remainingHypotheses.length / 2))
          .map(r => r.hypothesis);
        
        // Share insights across sessions if enabled
        if (this.config.crossPollinationEnabled && remainingHypotheses.length > 1) {
          await this.crossPollinateInsights(round.results);
        }
      }
    
      // Determine winner and runner-up (copy before sorting so the
      // round's stored results are not reordered in place)
      const finalResults = rounds[rounds.length - 1]?.results || [];
      const sortedResults = [...finalResults].sort((a, b) => b.overallConfidence - a.overallConfidence);
      
      const winner = sortedResults[0];
      const runnerUp = sortedResults[1];
    
      // Calculate metrics (guard against division by zero when no rounds ran)
      const duration = Date.now() - startTime;
      const roundCount = Math.max(rounds.length, 1);
      const sequentialTime = hypotheses.length * (duration / roundCount);
      const parallelEfficiency = duration > 0 ? sequentialTime / duration : 1;
    
      return {
        issue,
        totalHypotheses: hypotheses.length,
        rounds,
        winner,
        runnerUp,
        allFindings,
        recommendations: this.generateRecommendations(winner, runnerUp, allFindings),
        duration,
        parallelEfficiency,
      };
    }
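The per-round elimination rule inside `runTournament` can be sketched as a standalone function; the `SessionResult` shape here is a simplified stand-in for the real result type:

```typescript
// Simplified sketch of the elimination step: drop hypotheses below the
// confidence threshold, rank the survivors by confidence, and keep at
// most half the field (rounded up) for the next round.
interface SessionResult {
  hypothesis: string;
  overallConfidence: number;
}

function eliminate(results: SessionResult[], threshold: number): string[] {
  const fieldSize = results.length;
  return results
    .filter((r) => r.overallConfidence >= threshold)
    .sort((a, b) => b.overallConfidence - a.overallConfidence)
    .slice(0, Math.ceil(fieldSize / 2))
    .map((r) => r.hypothesis);
}
```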
  • Zod schema for validating run_hypothesis_tournament tool input parameters.
    const RunHypothesisTournamentSchema = z.object({
      claude_context: z.object({
        attempted_approaches: z.array(z.string()),
        partial_findings: z.array(z.any()),
        stuck_description: z.string(),
        code_scope: z.object({
          files: z.array(z.string()),
          entry_points: z.array(z.any()).optional(),
          service_names: z.array(z.string()).optional(),
        }),
      }),
      issue: z.string(),
      tournament_config: z.object({
        max_hypotheses: z.number().min(2).max(20).optional(),
        max_rounds: z.number().min(1).max(5).optional(),
        parallel_sessions: z.number().min(1).max(10).optional(),
      }).optional(),
    });
  • src/index.ts:418-492 (registration)
    Tool registration in ListTools response: defines name, description, and inputSchema for MCP tool discovery.
      name: 'run_hypothesis_tournament',
      description: 'Run a competitive hypothesis tournament to find root causes. Multiple AI conversations test different theories in parallel, with evidence-based scoring and elimination rounds.',
      inputSchema: {
        type: 'object',
        properties: {
          claude_context: {
            type: 'object',
            properties: {
              attempted_approaches: {
                type: 'array',
                items: { type: 'string' },
                description: 'What Claude Code already tried',
              },
              partial_findings: {
                type: 'array',
                description: 'Any findings Claude Code discovered',
              },
              stuck_description: {
                type: 'string',
                description: 'Description of where Claude Code got stuck',
              },
              code_scope: {
                type: 'object',
                properties: {
                  files: {
                    type: 'array',
                    items: { type: 'string' },
                    description: 'Files to analyze',
                  },
                  entry_points: {
                    type: 'array',
                    description: 'Specific functions/methods to start from',
                  },
                  service_names: {
                    type: 'array',
                    items: { type: 'string' },
                    description: 'Services involved in cross-system analysis',
                  },
                },
                required: ['files'],
              },
            },
            required: ['attempted_approaches', 'partial_findings', 'stuck_description', 'code_scope'],
          },
          issue: {
            type: 'string',
            description: 'Description of the issue to investigate',
          },
          tournament_config: {
            type: 'object',
            properties: {
              max_hypotheses: {
                type: 'number',
                minimum: 2,
                maximum: 20,
                description: 'Number of initial hypotheses to generate (default: 6)',
              },
              max_rounds: {
                type: 'number',
                minimum: 1,
                maximum: 5,
                description: 'Maximum tournament rounds (default: 3)',
              },
              parallel_sessions: {
                type: 'number',
                minimum: 1,
                maximum: 10,
                description: 'Max concurrent conversations (default: 4)',
              },
            },
          },
        },
        required: ['claude_context', 'issue'],
      },
    },
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the process ('competitive tournament', 'parallel testing', 'evidence-based scoring', 'elimination rounds'), which gives some insight into the tool's behavior. However, it lacks critical details such as whether this is a read-only or mutative operation, expected runtime, error handling, or output format. For a complex tool with nested inputs, this is insufficient to inform an agent fully.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured in a single sentence. It front-loads the core action ('Run a competitive hypothesis tournament') and efficiently adds key details ('to find root causes', 'Multiple AI conversations test different theories in parallel, with evidence-based scoring and elimination rounds'). Every phrase contributes meaning without redundancy, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 parameters with nested objects, no annotations, no output schema), the description is incomplete. It explains the high-level process but misses crucial context: what the output looks like (e.g., a winning hypothesis, scores, logs), how errors are handled, performance implications (e.g., resource-intensive due to parallel sessions), or integration with sibling tools. This leaves significant gaps for an agent to understand the tool's full behavior and outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is low (33%), but the description adds minimal value beyond the schema. It doesn't explain the meaning or purpose of parameters like 'claude_context' or 'tournament_config', which are complex nested objects. The schema provides descriptions for sub-properties (e.g., 'attempted_approaches', 'max_hypotheses'), but the overall tool description doesn't clarify how these inputs drive the tournament process. Baseline 3 is appropriate as the schema does some work, but the description doesn't compensate for the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Run a competitive hypothesis tournament to find root causes.' It specifies the verb ('run'), resource ('hypothesis tournament'), and goal ('find root causes'), distinguishing it from siblings like 'hypothesis_test' (singular testing) or 'escalate_analysis' (escalation). However, it doesn't explicitly differentiate from all siblings, such as 'cross_system_impact' or 'trace_execution_path', which might also involve root cause analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions the method ('Multiple AI conversations test different theories in parallel') but doesn't specify scenarios, prerequisites, or exclusions. For example, it doesn't indicate if this is for complex issues where other tools failed or when simpler tools like 'hypothesis_test' might suffice. This lack of context makes it hard for an agent to choose appropriately among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
