by little2512

search_documents

Searches indexed Word documents by keyword using a full-text index, with optional filtering by document type and support for mixed English and Chinese queries.

Instructions

Full-text index search with support for mixed Chinese and English queries.

Input Schema

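Name          Required  Description                             Default
query         Yes       Search keywords                         -
documentType  No        Restricts results to one document type  -
limit         No        Maximum number of results returned      10

documentType accepts one of: ui-component, api-doc, common-doc, other.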
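
A call to this tool could look like the sketch below. It assumes the server is reached through the standard MCP tools/call JSON-RPC method; buildSearchRequest and the request id are hypothetical and only illustrate how the three parameters fit together.

```javascript
// Build a JSON-RPC 2.0 "tools/call" request for the search_documents tool.
// buildSearchRequest is a hypothetical helper; the id value is arbitrary.
function buildSearchRequest(id, query, options = {}) {
  const args = { query }; // query is the only required argument
  if (options.documentType !== undefined) args.documentType = options.documentType;
  if (options.limit !== undefined) args.limit = options.limit;
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search_documents", arguments: args },
  };
}

const request = buildSearchRequest(1, "登录 API", { documentType: "api-doc", limit: 5 });
console.log(JSON.stringify(request, null, 2));
```

Omitting limit leaves the choice to the server, which falls back to 10 according to the schema.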

Implementation Reference

  • Executes the search_documents tool: parses args, calls documentIndexer.search(query), filters by documentType, limits results, formats and returns search results.
    case "search_documents": {
      const { query, documentType, limit = 10 } = args;
      const searchResults = documentIndexer.search(query);

      // Filter by document type when one is given
      const filteredResults = documentType
        ? searchResults.filter(result => result.document.documentType === documentType)
        : searchResults;
      const limitedResults = filteredResults.slice(0, limit);

      if (limitedResults.length === 0) {
        // Message text: no documents found containing the keyword
        return {
          content: [
            { type: "text", text: `未找到包含关键词 "${query}" 的文档` }
          ]
        };
      }

      // Each entry lists relevance, memory key, file, type, table count, image count, and last-indexed time
      const resultsText = limitedResults.map((result, index) => {
        const doc = result.document;
        return `${index + 1}. 相关度: ${result.score}\n 内存键: ${doc.memoryKey}\n 文件: ${doc.filePath}\n 类型: ${doc.documentType}\n 表格数: ${doc.tablesCount}\n 图片数: ${doc.imagesCount}\n 最后索引: ${new Date(doc.lastIndexed).toLocaleString()}`;
      }).join('\n\n');

      // Header text: search results for the query (number shown, total after filtering)
      return {
        content: [
          {
            type: "text",
            text: `搜索结果 "${query}" (找到 ${limitedResults.length} 个匹配,共 ${filteredResults.length} 个):\n\n${resultsText}`
          }
        ]
      };
    }
  • server.js:518-541 (registration)
    Registers the search_documents tool in the ListTools response, including name, description, and input schema.
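    {
      name: "search_documents",
      // Description: full-text index search; supports mixed Chinese and English queries
      description: "全文索引搜索,支持中英文混合搜索",
      inputSchema: {
        type: "object",
        properties: {
          // Search keywords
          query: { type: "string", description: "搜索关键词" },
          // Restricts the search to one document type
          documentType: {
            type: "string",
            description: "限制搜索的文档类型",
            enum: ["ui-component", "api-doc", "common-doc", "other"]
          },
          // Maximum number of results to return
          limit: { type: "number", description: "返回结果数量限制", default: 10 }
        },
        required: ["query"]
      }
    },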
  • Core search logic in DocumentIndexer: extracts tokens from the query, scores each document by the number of query tokens it matches in the inverted index, then sorts by score and returns the results with their metadata.
    search(query) {
      const queryWords = this.extractWords(query.toLowerCase());
      const documentScores = new Map();

      // Add one point for every query token found in a document
      queryWords.forEach(word => {
        const docs = this.index.get(word);
        if (docs) {
          docs.forEach(docId => {
            const score = documentScores.get(docId) || 0;
            documentScores.set(docId, score + 1);
          });
        }
      });

      // Sort by relevance, highest score first
      const results = Array.from(documentScores.entries())
        .sort((a, b) => b[1] - a[1])
        .map(([docId, score]) => ({
          documentId: docId,
          score,
          document: this.documents.get(docId)
        }));

      return results;
    }
  • Helper method used by search: tokenizes text into Chinese n-grams (2-4 chars), lowercase English words, and numbers.
    extractWords(text) {
      // Patterns for runs of Chinese characters, English words, and digit sequences
      const chinesePattern = /[\u4e00-\u9fff]+/g;
      const englishPattern = /[a-zA-Z]+/g;
      const numberPattern = /\d+/g;
      const words = [];

      // Extract Chinese terms (simple segmentation; could later be replaced by a smarter segmenter)
      const chineseMatches = text.match(chinesePattern) || [];
      chineseMatches.forEach(match => {
        // Emit every 2-, 3-, and 4-character substring
        for (let i = 0; i < match.length; i++) {
          for (let len = 2; len <= 4 && i + len <= match.length; len++) {
            words.push(match.substring(i, i + len));
          }
        }
      });

      // Extract English words
      const englishMatches = text.match(englishPattern) || [];
      words.push(...englishMatches.map(word => word.toLowerCase()));

      // Extract numbers
      const numberMatches = text.match(numberPattern) || [];
      words.push(...numberMatches);

      return words.filter(word => word.length > 0);
    }
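
The tokenizer and scoring described in this list can be tried out with the self-contained sketch below. It is not the code from server.js: MiniIndexer and its addDocument method are hypothetical, and the indexing step is an assumption, since only the search side is shown here. The tokenization rules and the one-point-per-matching-token score follow the snippets in this list.

```javascript
// Minimal inverted-index sketch. MiniIndexer and addDocument are hypothetical names.
class MiniIndexer {
  constructor() {
    this.index = new Map();     // token -> Set of document ids
    this.documents = new Map(); // document id -> metadata
  }

  // Chinese runs become overlapping 2- to 4-character substrings;
  // English words are lowercased; digit sequences are kept as-is.
  extractWords(text) {
    const words = [];
    for (const run of text.match(/[\u4e00-\u9fff]+/g) || []) {
      for (let i = 0; i < run.length; i++) {
        for (let len = 2; len <= 4 && i + len <= run.length; len++) {
          words.push(run.slice(i, i + len));
        }
      }
    }
    words.push(...(text.match(/[a-zA-Z]+/g) || []).map(w => w.toLowerCase()));
    words.push(...(text.match(/\d+/g) || []));
    return words;
  }

  // Assumed indexing step: register every token of the text under the document id.
  addDocument(docId, text, metadata = {}) {
    this.documents.set(docId, metadata);
    for (const word of this.extractWords(text.toLowerCase())) {
      if (!this.index.has(word)) this.index.set(word, new Set());
      this.index.get(word).add(docId);
    }
  }

  // Score = number of query tokens that appear in the document.
  search(query) {
    const scores = new Map();
    for (const word of this.extractWords(query.toLowerCase())) {
      for (const docId of this.index.get(word) || []) {
        scores.set(docId, (scores.get(docId) || 0) + 1);
      }
    }
    return Array.from(scores.entries())
      .sort((a, b) => b[1] - a[1])
      .map(([docId, score]) => ({ documentId: docId, score, document: this.documents.get(docId) }));
  }
}

const indexer = new MiniIndexer();
indexer.addDocument("a", "Button 组件使用说明", { documentType: "ui-component" });
indexer.addDocument("b", "用户登录 API v2", { documentType: "api-doc" });

const tokens = indexer.extractWords("按钮组件 Button 2");
const hits = indexer.search("Button 组件");
console.log(tokens); // 按钮, 按钮组, 按钮组件, 钮组, 钮组件, 组件, button, 2
console.log(hits.map(h => `${h.documentId}:${h.score}`)); // a:2
```

Because the shortest Chinese token is two characters long, a query consisting of a single Chinese character produces no tokens and therefore matches nothing.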

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/little2512/word-doc-mcp'
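
The same request can be made from JavaScript. This is a minimal sketch that assumes Node.js 18 or later (for the global fetch); serverUrl and fetchServerInfo are hypothetical helper names, and the response is treated as opaque JSON because its fields are not described here.

```javascript
// Build the directory URL for one server (same endpoint as the curl command).
// serverUrl and fetchServerInfo are hypothetical helpers.
function serverUrl(owner, name) {
  return `https://glama.ai/api/mcp/v1/servers/${encodeURIComponent(owner)}/${encodeURIComponent(name)}`;
}

async function fetchServerInfo(owner, name) {
  const response = await fetch(serverUrl(owner, name)); // global fetch, Node.js 18+
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Usage (performs a network request):
// fetchServerInfo("little2512", "word-doc-mcp").then(info => console.log(info));
```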