Document Extractor MCP Server

search_documents

Find documents by searching titles and content using full-text queries to retrieve relevant information from stored documentation.

Instructions

Search documents by title or content using full-text search

Input Schema


Name	Required	Description	Default
`query`	Yes	Search query to find documents (searches title and content)	none
`limit`	No	Maximum number of results to return	50
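
The constraints in this table, together with the bounds declared in the handler's schema (a non-empty `query`, and a `limit` between 1 and 100 that defaults to 50), can be sketched in plain JavaScript. This is only an illustration of the rules, not the server's code, which uses zod. `normalizeSearchArgs` is a hypothetical name.

```javascript
// Hypothetical helper (not part of the server): mirrors the documented
// input rules for search_documents without depending on zod.
function normalizeSearchArgs(args) {
  const { query, limit = 50 } = args ?? {};

  // `query` is required and must be a non-empty string.
  if (typeof query !== 'string' || query.length < 1) {
    throw new Error('Query cannot be empty');
  }

  // `limit` is optional; when given it must be a number in [1, 100].
  if (typeof limit !== 'number' || Number.isNaN(limit) || limit < 1 || limit > 100) {
    throw new Error('limit must be a number between 1 and 100');
  }

  return { query, limit };
}

// Usage: the default limit is filled in when it is omitted.
console.log(normalizeSearchArgs({ query: 'pocketbase' }));
```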

Implementation Reference

src/index.js:898-951 (handler)

The tool `search_documents` is registered using `server.tool` and uses `searchDocuments` to fetch the data.

const searchDocumentsTool = server.tool(
  'search_documents',
  'Search documents by title or content using full-text search',
  {
    query: z.string().min(1, 'Query cannot be empty').describe('Search query to find documents (searches title and content)'),
    limit: z.number().min(1).max(100).optional().default(50).describe('Maximum number of results to return (default: 50)')
  },
  async ({ query, limit = 50 }) => {
    try {
      // Only authenticate when tool is actually invoked
      await authenticateWhenNeeded();
      
      const result = await searchDocuments(query, limit);
      
      if (result.items.length === 0) {
        return {
          content: [
            {
              type: 'text',
              text: `🔍 No documents found matching "${query}"`
            }
          ]
        };
      }
      
      const searchResults = result.items.map(doc => 
        `**${doc.title}** (ID: ${doc.id})\n` +
        `Source: ${doc.metadata?.source || 'Unknown'}\n` +
        `Domain: ${doc.metadata?.domain || 'Unknown'}\n` +
        `Created: ${new Date(doc.created).toLocaleString()}\n` +
        `${doc.metadata?.url ? `URL: ${doc.metadata.url}\n` : ''}` +
        `Preview: ${doc.content.substring(0, 150)}...\n`
      ).join('\n---\n');
      
      return {
        content: [
          {
            type: 'text',
            text: `🔍 Found ${result.items.length} documents matching "${query}":\n\n${searchResults}`
          }
        ]
      };
    } catch (error) {
      return {
        content: [
          {
            type: 'text',
            text: `❌ Error: ${error.message}`
          }
        ],
        isError: true
      };
    }
  }
);
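
The handler above appends `...` to every preview, even when the content is shorter than 150 characters, and calls `substring` on `doc.content` without checking that it is a string. A minimal sketch of a more defensive preview helper follows. `previewOf` is a hypothetical name that does not appear in the source.

```javascript
// Hypothetical helper: builds the preview text used in a search result.
// Adds an ellipsis only when the content was actually truncated, and
// treats missing or non-string content as an empty preview.
function previewOf(content, maxLength = 150) {
  const text = typeof content === 'string' ? content : '';
  return text.length > maxLength ? `${text.slice(0, maxLength)}...` : text;
}

// Usage: short content is returned unchanged.
console.log(previewOf('A short document body'));
```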

src/index.js:600-620 (handler)

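The `searchDocuments` function is the core logic that queries the PocketBase collection. It matches the query against `title` and `content` with the `~` (like/contains) filter operator, returns at most `limit` records, and sorts them newest first.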

// Search documents in PocketBase (with lazy initialization)
async function searchDocuments(query, limit = 50) {
  try {
    await authenticateWhenNeeded();
    
    if (!DOCUMENTS_COLLECTION) {
      initializeConfig();
    }
    
    const records = await pb.collection(DOCUMENTS_COLLECTION).getList(1, limit, {
      filter: `title ~ "${query}" || content ~ "${query}"`,
      sort: '-created'
    });
    
    debugLog('Documents searched in PocketBase', { query, count: records.items.length });
    return records;
  } catch (error) {
    debugLog('Error searching documents', { error: error.message });
    throw new Error(`Failed to search documents: ${error.message}`);
  }
}
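
Because `searchDocuments` interpolates `query` directly into the filter string, a query containing a double quote can break the filter or change its meaning. One way to guard against this is sketched below, on the assumption that the filter grammar accepts a backslash-escaped quote inside a double-quoted string. `escapeFilterValue` and `buildSearchFilter` are hypothetical names. Where the installed PocketBase JS SDK provides the `pb.filter(expr, params)` placeholder helper, that is likely the better choice.

```javascript
// Hypothetical helper: escapes double quotes so a user-supplied value
// cannot terminate the quoted string in a PocketBase filter expression.
// Assumption: the filter grammar treats \" as a literal quote.
function escapeFilterValue(value) {
  return String(value).replace(/"/g, '\\"');
}

// Builds the same title/content filter as searchDocuments, using the
// escaped value in both comparisons.
function buildSearchFilter(query) {
  const q = escapeFilterValue(query);
  return `title ~ "${q}" || content ~ "${q}"`;
}

// Usage: print the filter produced for a query that contains quotes.
console.log(buildSearchFilter('say "hello"'));
```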

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Mentions 'full-text search' indicating matching behavior, but lacks crucial behavioral details: result ranking/relevance, case sensitivity, partial vs exact matching, return format structure, or pagination behavior beyond the limit parameter.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single efficient sentence (9 words) with a front-loaded action verb. No redundant phrases or padding; every word conveys the search mechanism, the target fields, or the method.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple 2-parameter search tool with complete schema coverage. Missing output format description (relevant since no output schema exists), but tool name and 'search' verb sufficiently imply list return. Could benefit from mentioning result ranking or snippet inclusion.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, establishing baseline 3. Description reinforces that query searches both 'title or content' (aligning with schema) and adds 'full-text search' context about query interpretation, but does not add syntax examples, query operators, or explain the default limit behavior beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb ('Search') and resource ('documents') with scope ('by title or content'). Mentions 'full-text search' mechanism, implying filtering capability that distinguishes it from list_documents (enumeration) and get_document (ID-based retrieval), though lacks explicit sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides no guidance on when to use this versus list_documents (all documents) or get_document (specific ID lookup). Missing explicit when-to-use criteria or prerequisites like query syntax requirements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
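
As an illustration of how the gaps noted in this review could be closed, the sketch below shows one possible longer description string. Its wording is an assumption rather than the author's text; the behavioral details come from the handler and `searchDocuments` code shown earlier, and the sibling tool names from the review itself. `IMPROVED_DESCRIPTION` is a hypothetical name.

```javascript
// Hypothetical replacement for the tool description string. It states the
// matching behavior, ordering, limits, output fields, and sibling tools.
const IMPROVED_DESCRIPTION = [
  'Search stored documents whose title or content contains the query text.',
  'Returns up to `limit` results (1 to 100, default 50), newest first.',
  'Each result lists the title, ID, source, domain, creation date,',
  'optional URL, and a 150-character content preview.',
  'Use list_documents to enumerate all documents and get_document to',
  'fetch a single document by ID.'
].join(' ');

console.log(IMPROVED_DESCRIPTION);
```

The longer text trades some of the conciseness score for better behavior and usage-guidance coverage.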

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/DynamicEndpoints/documentation-mcp-using-pocketbase'
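
The same request can be made from JavaScript. This is a minimal sketch assuming a runtime with a global `fetch` (for example Node.js 18 or later). `serverUrl` and `fetchServerInfo` are hypothetical names, and the response body is returned as text because its format is not described here.

```javascript
// Base URL taken from the curl example above.
const API_BASE = 'https://glama.ai/api/mcp/v1/servers';

// Hypothetical helper: builds the directory URL for one server.
function serverUrl(owner, name) {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

// Hypothetical helper: fetches the directory entry and returns the raw body.
async function fetchServerInfo(owner, name) {
  const res = await fetch(serverUrl(owner, name));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }
  return res.text();
}

// Usage (performs a network request when called):
// fetchServerInfo('DynamicEndpoints', 'documentation-mcp-using-pocketbase')
//   .then(console.log);
```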

If you have feedback or need assistance with the MCP directory API, please join our Discord server.