Glama
by njlnaet

Validate CoderSwap Search Quality

coderswap_validate_search

Test search quality and coverage by running validation queries to verify knowledge base search performance and identify gaps in content retrieval.

Instructions

Run validation queries to test search quality and coverage (non-DSL quality check)

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| project_id | Yes | ID of the project whose knowledge base to test (non-empty string) | — |
| test_queries | No | Custom validation queries to run (array of strings) | — |
| run_full_suite | No | When true, runs a predefined suite of validation queries instead of `test_queries` | false |
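As a sketch of the two supported invocation modes (custom queries vs. the predefined full suite), the arguments below are hypothetical — the project id is made up:

```typescript
// Hypothetical arguments for coderswap_validate_search.

// Mode 1: validate a custom set of queries.
const customRun = {
  project_id: 'docs-kb', // made-up project id
  test_queries: ['what is hybrid search', 'bm25 algorithm']
}

// Mode 2: run the predefined full suite; test_queries may be omitted.
const fullSuiteRun = {
  project_id: 'docs-kb',
  run_full_suite: true
}

console.log(customRun.test_queries.length, fullSuiteRun.run_full_suite)
```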

Output Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| queries_tested | Yes | Number of unique queries executed | — |
| average_top_score | Yes | Mean of the per-query top result scores (0–1; a query with no results contributes 0) | — |
| zero_result_queries | Yes | Queries that returned no results | — |
Implementation Reference

  • MCP tool handler for 'coderswap_validate_search'. Invokes CoderSwapClient.testSearchQuality and generates a formatted search quality report with aggregate metrics.
    ```typescript
    async ({ project_id, test_queries, run_full_suite = false }) => {
      try {
        log('debug', 'Testing search quality', { project_id, run_full_suite })
        const report = await client.testSearchQuality({ project_id, test_queries, run_full_suite })

        const output = {
          queries_tested: report.aggregate.queries_tested,
          average_top_score: report.aggregate.average_top_score,
          zero_result_queries: report.aggregate.zero_result_queries
        }

        const avgScore = (report.aggregate.average_top_score * 100).toFixed(1)
        const zeroResults = report.aggregate.zero_result_queries.length

        let summary = `Search Quality Report\n${'='.repeat(40)}\n`
        summary += `Queries tested: ${report.aggregate.queries_tested}\n`
        summary += `Average top score: ${avgScore}%\n`
        summary += `Zero-result queries: ${zeroResults}\n\n`

        if (report.results.length > 0) {
          summary += 'Top Results:\n'
          report.results.slice(0, 3).forEach(r => {
            const score = (r.topScore * 100).toFixed(1)
            summary += `  • "${r.query}" → ${score}% (${r.count} results)\n`
          })
        }

        log('info', `Search quality test completed: ${report.aggregate.queries_tested} queries`)

        return {
          content: [{ type: 'text', text: summary }],
          structuredContent: output
        }
      } catch (error) {
        log('error', 'Search quality test failed', { project_id, error: error instanceof Error ? error.message : error })
        return {
          content: [{
            type: 'text',
            text: `✗ Search quality test failed: ${error instanceof Error ? error.message : 'Unknown error'}`
          }],
          isError: true
        }
      }
    }
    ```
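The report formatting in the handler can be exercised in isolation. A minimal sketch of that logic, extracted into a standalone function (the sample numbers below are made up):

```typescript
interface Aggregate {
  queries_tested: number
  average_top_score: number
  zero_result_queries: string[]
}

// Mirrors the summary-building logic from the handler above.
function formatSummary(aggregate: Aggregate): string {
  const avgScore = (aggregate.average_top_score * 100).toFixed(1)
  let summary = `Search Quality Report\n${'='.repeat(40)}\n`
  summary += `Queries tested: ${aggregate.queries_tested}\n`
  summary += `Average top score: ${avgScore}%\n`
  summary += `Zero-result queries: ${aggregate.zero_result_queries.length}\n`
  return summary
}

console.log(formatSummary({
  queries_tested: 5,
  average_top_score: 0.724,
  zero_result_queries: ['bm25 algorithm']
}))
```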
  • Core implementation of search validation logic in CoderSwapClient. Uses predefined test queries for full suite or custom queries, performs hybrid searches, sorts by score, and computes aggregate statistics including average top score and zero-result queries.
    ```typescript
    async testSearchQuality(input: TestSearchQualityInput) {
      // Full suite uses a predefined query set; otherwise use the caller's queries.
      const queries = input.run_full_suite
        ? [
            'what is hybrid search',
            'how to implement rag',
            'error troubleshooting vector search',
            'bm25 algorithm',
            'semantic vs keyword search'
          ]
        : input.test_queries || []

      const uniqueQueries = Array.from(new Set(queries))
      if (uniqueQueries.length === 0) {
        throw new Error('No queries provided for search quality test')
      }

      const results: Array<{ query: string; topScore: number; count: number; items: SearchResult[] }> = []
      for (const query of uniqueQueries) {
        const response = await this.search({ project_id: input.project_id, query })
        // Copy before sorting so the original response array is not mutated.
        const sorted = [...response.results].sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
        results.push({
          query,
          topScore: sorted[0]?.score ?? 0,
          count: sorted.length,
          items: sorted
        })
      }

      const aggregate = {
        queries_tested: results.length,
        average_top_score:
          results.reduce((sum, item) => sum + (item.topScore || 0), 0) / Math.max(results.length, 1),
        zero_result_queries: results.filter((item) => item.count === 0).map((item) => item.query)
      }

      return { aggregate, results }
    }
    ```
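The aggregate statistics can be checked independently of the search backend. A minimal sketch of that computation over hand-made mock results (queries and scores below are illustrative):

```typescript
interface QueryResult { query: string; topScore: number; count: number }

// Mirrors the aggregate computation from testSearchQuality above.
function computeAggregate(results: QueryResult[]) {
  return {
    queries_tested: results.length,
    average_top_score:
      results.reduce((sum, r) => sum + (r.topScore || 0), 0) / Math.max(results.length, 1),
    zero_result_queries: results.filter(r => r.count === 0).map(r => r.query)
  }
}

const agg = computeAggregate([
  { query: 'what is hybrid search', topScore: 0.91, count: 7 },
  { query: 'bm25 algorithm', topScore: 0, count: 0 }
])
console.log(agg)
```

Note how `Math.max(results.length, 1)` guards against division by zero, and a query with no hits contributes a top score of 0 while also landing in `zero_result_queries`.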
  • Input and output schema definitions for the 'coderswap_validate_search' MCP tool using Zod.
    ```typescript
    {
      title: 'Validate CoderSwap Search Quality',
      description: 'Run validation queries to test search quality and coverage (non-DSL quality check)',
      inputSchema: {
        project_id: z.string().min(1, 'project_id is required'),
        test_queries: z.array(z.string()).optional(),
        run_full_suite: z.boolean().default(false)
      },
      outputSchema: {
        queries_tested: z.number(),
        average_top_score: z.number(),
        zero_result_queries: z.array(z.string())
      }
    }
    ```
  • src/index.ts:568-631 (registration)
    Registration of the 'coderswap_validate_search' tool with the MCP server.
    ```typescript
    server.registerTool(
      'coderswap_validate_search',
      {
        title: 'Validate CoderSwap Search Quality',
        description: 'Run validation queries to test search quality and coverage (non-DSL quality check)',
        inputSchema: { /* field definitions as shown above */ },
        outputSchema: { /* field definitions as shown above */ }
      },
      async ({ project_id, test_queries, run_full_suite = false }) => {
        /* handler implementation as shown in the first reference above */
      }
    )
    ```
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'non-DSL quality check', which hints that this may be a read-only diagnostic operation, but it does not clarify whether the tool is destructive, requires specific permissions, is rate-limited, or what the validation entails. For a tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded in a single sentence: 'Run validation queries to test search quality and coverage (non-DSL quality check)'. Every word earns its place by conveying purpose and scope without redundancy, making it efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (which reduces the need to describe return values), 3 parameters with 0% schema coverage, and no annotations, the description is moderately complete. It covers the core purpose and hints at behavior, but lacks details on usage guidelines, parameter semantics, and behavioral traits, leaving gaps that could hinder effective tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate for undocumented parameters. It adds some meaning by implying 'validation queries' relate to 'test_queries' and 'search quality' relates to 'project_id', but it doesn't explain what 'run_full_suite' does or provide details on query formats or project context. With 3 parameters and low coverage, the description offers marginal value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Run validation queries to test search quality and coverage' with the specific verb 'run' and resource 'validation queries', and it distinguishes this from regular search operations by specifying it's a 'non-DSL quality check'. However, it doesn't explicitly differentiate from sibling tools like coderswap_search or coderswap_research_ingest, which keeps it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal guidance: it implies this tool is for testing rather than production use through 'test search quality', but it doesn't specify when to use this versus alternatives like coderswap_search or coderswap_research_ingest, nor does it mention prerequisites or exclusions. This leaves the agent with insufficient context for optimal tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
