Word Document Reader MCP Server

by little2512

search_documents

Search Word documents by keywords to find relevant content across document types using full-text indexing with support for both English and Chinese queries.

Instructions

Full-text index search with support for mixed Chinese and English queries.

Input Schema

Name          Required  Description                               Default
query         Yes       Search keywords
documentType  No        Restrict the search to one document type
limit         No        Maximum number of results to return       10
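To illustrate the input schema, the arguments might be packaged into an MCP `tools/call` request roughly as follows. This is a minimal sketch rather than the server's own client code: the helper `buildSearchRequest`, the request `id`, and the sample query are hypothetical, and only `query`, `documentType`, `limit`, and the `documentType` values come from the tool definition.

```javascript
// Allowed documentType values, as listed in the tool's registration.
const DOCUMENT_TYPES = ["ui-component", "api-doc", "common-doc", "other"];

// Hypothetical helper: builds a JSON-RPC 2.0 "tools/call" request for search_documents.
function buildSearchRequest(id, query, options = {}) {
  if (typeof query !== "string") {
    throw new TypeError("query is required and must be a string");
  }
  const { documentType, limit } = options;
  if (documentType !== undefined && !DOCUMENT_TYPES.includes(documentType)) {
    throw new RangeError(`unsupported documentType: ${documentType}`);
  }

  // Only include optional arguments that the caller actually supplied.
  const args = { query };
  if (documentType !== undefined) args.documentType = documentType;
  if (limit !== undefined) args.limit = limit;

  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_documents", arguments: args }
  };
}

// Sample call: a mixed Chinese and English query limited to five UI component documents.
const request = buildSearchRequest(1, "button 按钮", {
  documentType: "ui-component",
  limit: 5
});
console.log(JSON.stringify(request, null, 2));
```

Omitting `limit` leaves the server to apply its default of 10.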

Implementation Reference

  • Executes the search_documents tool: parses args, calls documentIndexer.search(query), filters by documentType, limits results, formats and returns search results.
    case "search_documents": {
      const { query, documentType, limit = 10 } = args;
    
      const searchResults = documentIndexer.search(query);
    
      // Filter by document type
      const filteredResults = documentType
        ? searchResults.filter(result => result.document.documentType === documentType)
        : searchResults;
    
      const limitedResults = filteredResults.slice(0, limit);
    
      if (limitedResults.length === 0) {
        return {
          content: [
            {
              type: "text",
              // English: No documents found containing the keyword "<query>"
              text: `未找到包含关键词 "${query}" 的文档`
            }
          ]
        };
      }
    
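      const resultsText = limitedResults.map((result, index) => {
        const doc = result.document;
        // One output line per field; trailing comments give the English meaning of each label.
        return [
          `${index + 1}. 相关度: ${result.score}`,                              // relevance score
          `   内存键: ${doc.memoryKey}`,                                         // memory key
          `   文件: ${doc.filePath}`,                                            // file path
          `   类型: ${doc.documentType}`,                                        // document type
          `   表格数: ${doc.tablesCount}`,                                       // number of tables
          `   图片数: ${doc.imagesCount}`,                                       // number of images
          `   最后索引: ${new Date(doc.lastIndexed).toLocaleString()}`           // last indexed
        ].join('\n');
      }).join('\n\n');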
    
      return {
        content: [
          {
            type: "text",
            // English: Search results "<query>" (N matches shown, M in total)
            text: `搜索结果 "${query}" (找到 ${limitedResults.length} 个匹配,共 ${filteredResults.length} 个):\n\n${resultsText}`
          }
        ]
      };
    }
  • server.js:518-541 (registration)
    Registers the search_documents tool in the ListTools response, including name, description, and input schema.
    {
      name: "search_documents",
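      // description: "Full-text index search; supports mixed Chinese and English queries"
      description: "全文索引搜索,支持中英文混合搜索",
      inputSchema: {
        type: "object",
        properties: {
          query: {
            type: "string",
            description: "搜索关键词" // "Search keywords"
          },
          documentType: {
            type: "string",
            description: "限制搜索的文档类型", // "Restrict the search to one document type"
            enum: ["ui-component", "api-doc", "common-doc", "other"]
          },
          limit: {
            type: "number",
            description: "返回结果数量限制", // "Maximum number of results to return"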
            default: 10
          }
        },
        required: ["query"]
      }
    },
  • Core search logic in DocumentIndexer: extracts words from query, computes relevance scores using inverted index, sorts and returns results with metadata.
    search(query) {
      const queryWords = this.extractWords(query.toLowerCase());
      const documentScores = new Map();
    
      queryWords.forEach(word => {
        const docs = this.index.get(word);
        if (docs) {
          docs.forEach(docId => {
            const score = documentScores.get(docId) || 0;
            documentScores.set(docId, score + 1);
          });
        }
      });
    
      // Sort by relevance (highest score first)
      const results = Array.from(documentScores.entries())
        .sort((a, b) => b[1] - a[1])
        .map(([docId, score]) => ({
          documentId: docId, // shorthand `documentId` would reference an undefined variable
          score,
          document: this.documents.get(docId)
        }));
    
      return results;
    }
  • Helper method used by search: tokenizes text into Chinese n-grams (2-4 chars), lowercase English words, and numbers.
    extractWords(text) {
      // Tokenize Chinese and English text
      const chinesePattern = /[\u4e00-\u9fff]+/g;
      const englishPattern = /[a-zA-Z]+/g;
      const numberPattern = /\d+/g;
    
      const words = [];
    
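      // Extract Chinese terms (naive segmentation; could later be replaced with a smarter tokenizer)
      const chineseMatches = text.match(chinesePattern) || [];
      chineseMatches.forEach(match => {
        // Naive Chinese segmentation into 2-, 3-, and 4-character n-grams
        for (let i = 0; i < match.length; i++) {
          for (let len = 2; len <= 4 && i + len <= match.length; len++) {
            // slice replaces the deprecated substr; same result for these indices
            words.push(match.slice(i, i + len));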
          }
        }
      });
    
      // Extract English words
      const englishMatches = text.match(englishPattern) || [];
      words.push(...englishMatches.map(word => word.toLowerCase()));
    
      // Extract numbers
      const numberMatches = text.match(numberPattern) || [];
      words.push(...numberMatches);
    
      return words.filter(word => word.length > 0);
    }
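
The tokenization and scoring steps above can be tried end to end with a small in-memory sketch. This is not the server's `DocumentIndexer`: the class `MiniIndexer`, its `addDocument` method, the assumed index layout (token mapped to a set of document ids), and the two sample documents are all hypothetical, since the original indexing code is not shown here.

```javascript
// Simplified stand-in for DocumentIndexer; addDocument and the sample data are assumptions.
class MiniIndexer {
  constructor() {
    this.index = new Map();     // token -> Set of document ids (assumed layout)
    this.documents = new Map(); // document id -> metadata
  }

  // Same idea as extractWords above: Chinese 2-4 character n-grams,
  // lowercase English words, and digit runs.
  extractWords(text) {
    const words = [];
    for (const run of text.match(/[\u4e00-\u9fff]+/g) || []) {
      for (let i = 0; i < run.length; i++) {
        for (let len = 2; len <= 4 && i + len <= run.length; len++) {
          words.push(run.slice(i, i + len));
        }
      }
    }
    words.push(...(text.match(/[a-zA-Z]+/g) || []).map(w => w.toLowerCase()));
    words.push(...(text.match(/\d+/g) || []));
    return words;
  }

  // Hypothetical indexing step: record each token of the text against the document id.
  addDocument(docId, text, metadata) {
    this.documents.set(docId, metadata);
    for (const word of this.extractWords(text)) {
      if (!this.index.has(word)) this.index.set(word, new Set());
      this.index.get(word).add(docId);
    }
  }

  // One point per query token that is indexed for a document; highest score first.
  search(query) {
    const scores = new Map();
    for (const word of this.extractWords(query.toLowerCase())) {
      for (const docId of this.index.get(word) || []) {
        scores.set(docId, (scores.get(docId) || 0) + 1);
      }
    }
    return Array.from(scores.entries())
      .sort((a, b) => b[1] - a[1])
      .map(([docId, score]) => ({ documentId: docId, score, document: this.documents.get(docId) }));
  }
}

const indexer = new MiniIndexer();
indexer.addDocument("a", "按钮组件 Button API", { documentType: "ui-component" });
indexer.addDocument("b", "Table API 2024", { documentType: "api-doc" });

// A four-character Chinese run yields six overlapping n-grams.
const tokens = indexer.extractWords("按钮组件");
console.log(tokens); // [ '按钮', '按钮组', '按钮组件', '钮组', '钮组件', '组件' ]

// "button" matches only document a; "api" matches both.
const results = indexer.search("button api");
console.log(results.map(r => [r.documentId, r.score])); // [ [ 'a', 2 ], [ 'b', 1 ] ]

// Filter by type, then limit, in the same order as the tool handler.
const apiDocs = results.filter(r => r.document.documentType === "api-doc").slice(0, 10);
console.log(apiDocs.map(r => r.documentId)); // [ 'b' ]
```

Because the score is a plain count of matching query tokens, a longer Chinese query contributes many overlapping n-grams, so documents containing the full phrase tend to rank well above those sharing only one two-character fragment.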

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/little2512/word-doc-mcp'
