query_documents

Search local documents using keyword and semantic matching to find relevant information from PDF, DOCX, TXT, and Markdown files stored on your device.

Instructions

Search ingested documents. Your query words are matched exactly (keyword search). Your query meaning is matched semantically (vector search). Preserve specific terms from the user. Add context if the query is ambiguous. Results include score (0 = most relevant, higher = less relevant).

Input Schema

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| query | Yes | Search query. Include specific terms and add context if needed. | |
| limit | No | Maximum number of results to return (default: 10). Recommended: 5 for precision, 10 for balance, 20 for broad exploration. | |
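
As an illustration of the schema above, here is a minimal sketch of the JSON-RPC 2.0 payload an MCP client might send for a `tools/call` request. The helper name `buildQueryRequest`, the type names, and the sample query are assumptions made for this example; only the tool name and the `query` and `limit` parameters come from this page.

```typescript
// Arguments accepted by query_documents, per the input schema.
interface QueryDocumentsArgs {
  query: string
  limit?: number
}

// JSON-RPC 2.0 envelope for an MCP `tools/call` request.
interface ToolCallRequest {
  jsonrpc: '2.0'
  id: number
  method: 'tools/call'
  params: { name: 'query_documents'; arguments: QueryDocumentsArgs }
}

// Hypothetical helper (not part of the server): builds the request payload.
function buildQueryRequest(id: number, query: string, limit?: number): ToolCallRequest {
  const args: QueryDocumentsArgs = { query }
  // Leave `limit` out when it is not given so the documented default (10) applies.
  if (limit !== undefined) {
    args.limit = limit
  }
  return {
    jsonrpc: '2.0',
    id,
    method: 'tools/call',
    params: { name: 'query_documents', arguments: args },
  }
}

// Sample query text is made up; 5 is the value recommended above for precision.
const queryRequest = buildQueryRequest(1, 'BM25 ranking in hybrid search', 5)
console.log(JSON.stringify(queryRequest, null, 2))
```

Omitting `limit` rather than sending a placeholder keeps the payload aligned with the schema, where only `query` is required.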

Implementation Reference

  • The primary handler function executing the query_documents tool. It embeds the input query, performs hybrid semantic and keyword search using the VectorStore, restores source information for raw data files, and returns formatted JSON results as MCP content.
    async handleQueryDocuments(
      args: QueryDocumentsInput
    ): Promise<{ content: [{ type: 'text'; text: string }] }> {
      try {
        // Generate query embedding
        const queryVector = await this.embedder.embed(args.query)
    
        // Hybrid search (vector + BM25 keyword matching)
        const searchResults = await this.vectorStore.search(queryVector, args.query, args.limit || 10)
    
        // Format results with source restoration for raw-data files
        const results: QueryResult[] = searchResults.map((result) => {
          const queryResult: QueryResult = {
            filePath: result.filePath,
            chunkIndex: result.chunkIndex,
            text: result.text,
            score: result.score,
          }
    
          // Restore source for raw-data files (ingested via ingest_data)
          if (isRawDataPath(result.filePath)) {
            const source = extractSourceFromPath(result.filePath)
            if (source) {
              queryResult.source = source
            }
          }
    
          return queryResult
        })
    
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify(results, null, 2),
            },
          ],
        }
      } catch (error) {
        console.error('Failed to query documents:', error)
        throw error
      }
    }
  • TypeScript interface defining the input parameters for the query_documents tool: query string (required) and optional limit number.
    export interface QueryDocumentsInput {
      /** Natural language query */
      query: string
      /** Number of results to retrieve (default 10) */
      limit?: number
    }
  • TypeScript interface defining the output structure for query_documents results: filePath, chunkIndex, text, score, and optional source.
    export interface QueryResult {
      /** File path */
      filePath: string
      /** Chunk index */
      chunkIndex: number
      /** Text */
      text: string
      /** Relevance score (0 = most relevant, higher = less relevant) */
      score: number
      /** Original source (only for raw-data files, e.g., URLs ingested via ingest_data) */
      source?: string
    }
  • MCP tool registration in ListTools handler: defines name, detailed description, and JSON schema for input validation.
    {
      name: 'query_documents',
      description:
        'Search ingested documents. Your query words are matched exactly (keyword search). Your query meaning is matched semantically (vector search). Preserve specific terms from the user. Add context if the query is ambiguous. Results include score (0 = most relevant, higher = less relevant).',
      inputSchema: {
        type: 'object',
        properties: {
          query: {
            type: 'string',
            description: 'Search query. Include specific terms and add context if needed.',
          },
          limit: {
            type: 'number',
            description:
              'Maximum number of results to return (default: 10). Recommended: 5 for precision, 10 for balance, 20 for broad exploration.',
          },
        },
        required: ['query'],
      },
    },
  • Tool dispatch in CallToolRequestHandler: routes query_documents calls to the handleQueryDocuments method with type-cast arguments.
    case 'query_documents':
      return await this.handleQueryDocuments(
        request.params.arguments as unknown as QueryDocumentsInput
      )
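
The dispatch above casts `request.params.arguments` straight to `QueryDocumentsInput`, so malformed arguments would only surface inside the handler. A minimal sketch of a runtime check that could sit in front of that cast follows. The name `parseQueryDocumentsInput` is hypothetical, and the non-empty `query` and positive-integer `limit` rules are assumptions: the schema itself only requires `type: 'string'` and `type: 'number'`.

```typescript
// Same shape as the QueryDocumentsInput interface shown above.
interface QueryDocumentsInput {
  query: string
  limit?: number
}

// Hypothetical validator: narrows untrusted arguments instead of casting them.
function parseQueryDocumentsInput(raw: unknown): QueryDocumentsInput {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('query_documents: arguments must be an object')
  }
  const { query, limit } = raw as Record<string, unknown>

  // Assumption: an empty or whitespace-only query is treated as invalid.
  if (typeof query !== 'string' || query.trim() === '') {
    throw new Error('query_documents: "query" must be a non-empty string')
  }
  if (limit === undefined) {
    return { query }
  }
  // Assumption: limit must be a positive integer (stricter than the schema).
  if (typeof limit !== 'number' || !Number.isInteger(limit) || limit < 1) {
    throw new Error('query_documents: "limit" must be a positive integer')
  }
  return { query, limit }
}
```

With a check like this, the dispatch could pass `parseQueryDocumentsInput(request.params.arguments)` to the handler, and bad input would fail with a specific message before any embedding work starts.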
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: the dual search mechanism (keyword and semantic), result scoring (0=most relevant), and the need to handle ambiguous queries. However, it misses details like pagination, error conditions, or performance characteristics (e.g., latency, rate limits), leaving gaps for a search tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized at six short sentences, each adding value: search functionality, query handling, and result scoring. It is front-loaded with the core purpose. However, minor redundancy exists (the query guidance overlaps with the schema), and the two sentences on keyword and semantic matching could be combined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (search with dual mechanisms), no annotations, and no output schema, the description is partially complete. It covers the search behavior and scoring but lacks details on output format (e.g., what fields are returned beyond score), error handling, or integration context. This leaves the agent with incomplete operational knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
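
The output format is not spelled out in the tool description, but the `QueryResult` interface and the handler in the implementation reference show it: a single text item containing a JSON array of results. The following is a minimal sketch of how a client might consume that output. The helper `parseResults`, the `ToolResponse` type name, the sample data, and the 0.5 cutoff are all assumptions for illustration; the page does not state the range of `score`.

```typescript
// Mirrors the QueryResult interface from the implementation reference.
interface QueryResult {
  filePath: string
  chunkIndex: number
  text: string
  score: number
  source?: string
}

// Shape of the handler's return value: text content holding a JSON array.
interface ToolResponse {
  content: { type: 'text'; text: string }[]
}

// Hypothetical helper: parses the results and keeps those at or below a cutoff.
// Lower scores are more relevant, so the output is sorted ascending.
function parseResults(response: ToolResponse, maxScore: number): QueryResult[] {
  const first = response.content[0]
  if (!first) return []
  const parsed = JSON.parse(first.text) as QueryResult[]
  return parsed
    .filter((r) => r.score <= maxScore)
    .sort((a, b) => a.score - b.score)
}

// Made-up sample response, for illustration only.
const sampleResponse: ToolResponse = {
  content: [
    {
      type: 'text',
      text: JSON.stringify([
        { filePath: '/docs/a.md', chunkIndex: 0, text: 'alpha', score: 0.8 },
        { filePath: '/docs/b.md', chunkIndex: 2, text: 'beta', score: 0.2 },
      ]),
    },
  ],
}

for (const r of parseResults(sampleResponse, 0.5)) {
  // Prefer the restored source (raw-data files) and fall back to the file path.
  console.log(`${r.source ?? r.filePath}#${r.chunkIndex} (score ${r.score})`)
}
```

Any cutoff should be tuned against real results, since a sensible threshold depends on how the server computes its scores.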

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds minimal value beyond the schema: it echoes the query guidance ('Preserve specific terms... Add context') and implies result relevance scoring, but doesn't provide additional syntax, format, or usage details for parameters. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as searching ingested documents with both keyword and semantic matching. It specifies the resource (documents) and action (search), distinguishing it from sibling tools like list_files (listing) or ingest_file (adding). However, it doesn't explicitly contrast with other search-related tools since none exist among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through phrases like 'Preserve specific terms from the user' and 'Add context if the query is ambiguous', suggesting when to use certain query strategies. However, it lacks explicit guidance on when to choose this tool over alternatives (e.g., list_files for browsing vs. query_documents for searching), and no exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/shinpr/mcp-local-rag'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.