Word Document Reader MCP Server

by little2512

search_documents

Search Word documents by keywords to find relevant content across document types using full-text indexing with support for both English and Chinese queries.

Instructions

Full-text index search with support for mixed Chinese and English queries.

Input Schema

Name          Required  Description                               Default
query         Yes       Search keywords
documentType  No        Restrict the search to one document type
limit         No        Maximum number of results to return       10
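
As a rough illustration of how a client might pass these parameters, the sketch below builds an MCP `tools/call` request for this tool. The helper name `buildSearchRequest` and the sample query are assumptions made for illustration; only `query`, `documentType`, `limit`, and the `api-doc` type value come from the tool's registered schema.

```javascript
// Hypothetical helper: builds a JSON-RPC 2.0 "tools/call" request for search_documents.
// Only `query` is required; `documentType` and `limit` are included when provided.
function buildSearchRequest(id, query, options = {}) {
  if (typeof query !== "string" || query.length === 0) {
    throw new TypeError("query must be a non-empty string");
  }
  const args = { query };
  if (options.documentType !== undefined) args.documentType = options.documentType;
  if (options.limit !== undefined) args.limit = options.limit;

  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_documents", arguments: args },
  };
}

// Mixed Chinese/English query restricted to one document type.
const request = buildSearchRequest(1, "登录 API", { documentType: "api-doc", limit: 5 });
console.log(JSON.stringify(request, null, 2));
```

When `limit` is omitted, the handler falls back to its own default of 10, so the client does not need to send it.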

Implementation Reference

  • Executes the search_documents tool: parses args, calls documentIndexer.search(query), filters by documentType, limits results, formats and returns search results.
    case "search_documents": {
      const { query, documentType, limit = 10 } = args;
    
      const searchResults = documentIndexer.search(query);
    
      // Filter by document type
      const filteredResults = documentType
        ? searchResults.filter(result => result.document.documentType === documentType)
        : searchResults;
    
      const limitedResults = filteredResults.slice(0, limit);
    
      // No hits: the message reads 'No documents found containing the keyword "<query>"'
      if (limitedResults.length === 0) {
        return {
          content: [
            {
              type: "text",
              text: `未找到包含关键词 "${query}" 的文档`
            }
          ]
        };
      }
    
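      // Format each hit. Labels, in order: relevance, memory key, file, type, table count, image count, last indexed.
      const resultsText = limitedResults.map((result, index) => {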
        const doc = result.document;
        return `${index + 1}. 相关度: ${result.score}\n   内存键: ${doc.memoryKey}\n   文件: ${doc.filePath}\n   类型: ${doc.documentType}\n   表格数: ${doc.tablesCount}\n   图片数: ${doc.imagesCount}\n   最后索引: ${new Date(doc.lastIndexed).toLocaleString()}`;
      }).join('\n\n');
    
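      // Header reads: 'Search results "<query>" (found N matches, M in total)', where N is the count after applying limit.
      return {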
        content: [
          {
            type: "text",
            text: `搜索结果 "${query}" (找到 ${limitedResults.length} 个匹配,共 ${filteredResults.length} 个):\n\n${resultsText}`
          }
        ]
      };
    }
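
The order of operations in the handler matters: the type filter runs over all hits before truncation, so the response can report both the number shown and the total matched. A minimal sketch of that step in isolation follows. The function name `filterAndLimit` and the sample hits are illustrative assumptions, not part of the server.

```javascript
// Illustrative stand-in for the handler's filter-and-limit step.
function filterAndLimit(searchResults, documentType, limit = 10) {
  // Filter first, so `total` counts every hit of the requested type.
  const filtered = documentType
    ? searchResults.filter(r => r.document.documentType === documentType)
    : searchResults;
  return { shown: filtered.slice(0, limit), total: filtered.length };
}

// Made-up hits, already sorted by score as search() would return them.
const sampleHits = [
  { score: 3, document: { memoryKey: "a", documentType: "api-doc" } },
  { score: 2, document: { memoryKey: "b", documentType: "ui-component" } },
  { score: 2, document: { memoryKey: "c", documentType: "api-doc" } },
  { score: 1, document: { memoryKey: "d", documentType: "api-doc" } },
];

const { shown, total } = filterAndLimit(sampleHits, "api-doc", 2);
console.log(shown.map(r => r.document.memoryKey), total); // [ 'a', 'c' ] 3
```

Because `limit` is passed straight to `slice`, the handler as shown does not reject unusual values: a `limit` of 0 produces an empty list and therefore the "not found" message even when matches exist, and a negative `limit` drops items from the end of the list instead of raising an error.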
  • server.js:518-541 (registration)
    Registers the search_documents tool in the ListTools response, including name, description, and input schema.
    {
      name: "search_documents",
      description: "全文索引搜索,支持中英文混合搜索", // Full-text index search; supports mixed Chinese and English queries
      inputSchema: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "搜索关键词" // Search keywords
          },
          documentType: {
            type: "string",
            description: "限制搜索的文档类型", // Restrict the search to one document type
            enum: ["ui-component", "api-doc", "common-doc", "other"]
          },
          limit: {
            type: "number",
            description: "返回结果数量限制", // Maximum number of results to return
            default: 10
          }
        },
        required: ["query"]
      }
    },
  • Core search logic in DocumentIndexer: extracts words from query, computes relevance scores using inverted index, sorts and returns results with metadata.
    search(query) {
      const queryWords = this.extractWords(query.toLowerCase());
      const documentScores = new Map();
    
      queryWords.forEach(word => {
        const docs = this.index.get(word);
        if (docs) {
          docs.forEach(docId => {
            const score = documentScores.get(docId) || 0;
            documentScores.set(docId, score + 1);
          });
        }
      });
    
      // Sort by relevance (highest score first)
      const results = Array.from(documentScores.entries())
        .sort((a, b) => b[1] - a[1])
        .map(([docId, score]) => ({
          documentId: docId,
          score,
          document: this.documents.get(docId)
        }));
    
      return results;
    }
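
The `search` method depends on `this.index`, whose construction is not shown in this reference. The sketch below assumes the index maps each token to a set of document IDs and reproduces the scoring rule on pre-split tokens. The names `buildIndex` and `scoreDocuments` and the sample documents are hypothetical.

```javascript
// Assumption: the index maps each token to a Set of document IDs.
// The real DocumentIndexer's indexing code is not shown in this reference.
function buildIndex(docs) {
  const index = new Map();
  for (const { id, tokens } of docs) {
    for (const token of tokens) {
      if (!index.has(token)) index.set(token, new Set());
      index.get(token).add(id);
    }
  }
  return index;
}

// Same scoring rule as search(): +1 per query token found in a document.
function scoreDocuments(index, queryTokens) {
  const scores = new Map();
  for (const token of queryTokens) {
    for (const docId of index.get(token) ?? []) {
      scores.set(docId, (scores.get(docId) || 0) + 1);
    }
  }
  // Highest score first.
  return Array.from(scores.entries()).sort((a, b) => b[1] - a[1]);
}

const index = buildIndex([
  { id: "doc1", tokens: ["login", "api", "token"] },
  { id: "doc2", tokens: ["login", "button"] },
]);

console.log(scoreDocuments(index, ["login", "api"]));   // [ [ 'doc1', 2 ], [ 'doc2', 1 ] ]
console.log(scoreDocuments(index, ["login", "login"])); // [ [ 'doc1', 2 ], [ 'doc2', 2 ] ]
```

Under this assumption the score is the number of query tokens that occur in a document, not a term-frequency measure. Since `search` does not deduplicate the query tokens, a token repeated in the query is counted once per repetition, as the second call shows.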
  • Helper method used by search: tokenizes text into Chinese n-grams (2-4 chars), lowercase English words, and numbers.
    extractWords(text) {
      // Tokenize mixed Chinese/English text
      const chinesePattern = /[\u4e00-\u9fff]+/g;
      const englishPattern = /[a-zA-Z]+/g;
      const numberPattern = /\d+/g;
    
      const words = [];
    
      // Extract Chinese terms (naive segmentation; a smarter segmenter could replace this later)
      const chineseMatches = text.match(chinesePattern) || [];
      chineseMatches.forEach(match => {
        // Emit every 2-, 3-, and 4-character substring of each Chinese run
        for (let i = 0; i < match.length; i++) {
          for (let len = 2; len <= 4 && i + len <= match.length; len++) {
            words.push(match.slice(i, i + len));
          }
        }
      });
    
      // Extract English words
      const englishMatches = text.match(englishPattern) || [];
      words.push(...englishMatches.map(word => word.toLowerCase()));
    
      // Extract digit runs
      const numberMatches = text.match(numberPattern) || [];
      words.push(...numberMatches);
    
      return words.filter(word => word.length > 0);
    }
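
To make the tokenization concrete, here is the same logic as a standalone function applied to two made-up inputs. It is a sketch for experimentation and is not taken from the server's source.

```javascript
// Standalone copy of the extractWords logic, for experimentation.
function extractWords(text) {
  const words = [];
  // Every 2- to 4-character substring of each run of CJK ideographs.
  for (const run of text.match(/[\u4e00-\u9fff]+/g) || []) {
    for (let i = 0; i < run.length; i++) {
      for (let len = 2; len <= 4 && i + len <= run.length; len++) {
        words.push(run.slice(i, i + len));
      }
    }
  }
  // Lowercased runs of ASCII letters, then runs of digits.
  words.push(...(text.match(/[a-zA-Z]+/g) || []).map(w => w.toLowerCase()));
  words.push(...(text.match(/\d+/g) || []));
  return words.filter(word => word.length > 0);
}

console.log(JSON.stringify(extractWords("用户登录API v2")));
// ["用户","用户登","用户登录","户登","户登录","登录","api","v","2"]
console.log(JSON.stringify(extractWords("表"))); // []
```

Two consequences are worth noting. A single Chinese character produces no tokens, because the shortest substring emitted has two characters, so a one-character Chinese query cannot match anything. Mixed identifiers such as `v2` are split into a letter token and a digit token.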

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'full-text index search' and language support, but lacks critical behavioral details: it doesn't disclose whether this is a read-only operation, how results are returned (e.g., pagination, sorting), performance characteristics, or error handling. For a search tool with no annotations, this is inadequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with a single sentence, which is efficient. However, it's under-specified rather than optimally concise—it could benefit from slightly more detail without becoming verbose. It's front-loaded but lacks depth.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a search tool with 3 parameters, no annotations, and no output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., search results format), how to interpret outputs, or any limitations. This leaves significant gaps for an AI agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no parameter semantics beyond what the input schema provides. The schema has 100% description coverage, clearly documenting 'query', 'documentType', and 'limit' with enums and defaults. The description doesn't explain parameter interactions or provide additional context, so it meets the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool performs 'full-text index search' and supports 'Chinese-English mixed search', which provides a basic purpose. However, it doesn't specify what resource is being searched (documents, files, etc.) or distinguish it from sibling tools like 'list_stored_documents' or 'get_stored_document'. The purpose is somewhat vague about scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to prefer this over 'list_stored_documents' (which might list without search) or 'get_stored_document' (which retrieves a specific document), nor does it specify any prerequisites or exclusions for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/little2512/word-doc-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.