search-documents
Retrieve relevant document chunks by searching with keyword arrays using the BM25 algorithm. Specify domains and limit results for precise knowledge extraction.
Instructions
Search documents using BM25 algorithm. Takes keyword arrays and returns relevant document chunks.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| domain | No | Domain to search in (optional, e.g., "company", "customer") | |
| keywords | Yes | Array of keywords to search for (e.g., ["payment", "API", "authentication"]) | |
| topN | No | Maximum number of results to return | 10 |
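
As an illustration of the schema above, the following sketch builds an MCP `tools/call` request for this tool. The `SearchDocumentsArgs` and `ToolCallRequest` types and the `buildSearchRequest` helper are hypothetical names introduced for this example; only the tool name and the three argument fields come from the schema.

```typescript
// Hypothetical type mirroring the input schema table above.
interface SearchDocumentsArgs {
  keywords: string[]; // required
  domain?: string;    // optional
  topN?: number;      // optional; the server defaults to 10
}

// Shape of an MCP `tools/call` JSON-RPC request (type name is illustrative).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: SearchDocumentsArgs };
}

// Illustrative helper: wraps the arguments in a `tools/call` request.
function buildSearchRequest(id: number, args: SearchDocumentsArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "search-documents", arguments: args },
  };
}

const request = buildSearchRequest(1, {
  keywords: ["payment", "API", "authentication"],
  domain: "company",
  topN: 5,
});

console.log(JSON.stringify(request, null, 2));
```

Only `keywords` has to be supplied; leaving out `domain` and `topN` is valid under the schema.
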
Implementation Reference
- src/index.ts:216-238 (registration): Tool definition and schema registration in the ListToolsRequest handler

      name: 'search-documents',
      description: 'Search documents using BM25 algorithm. Takes keyword arrays and returns relevant document chunks.',
      inputSchema: {
        type: 'object',
        properties: {
          keywords: {
            type: 'array',
            items: { type: 'string' },
            description: 'Array of keywords to search for (e.g., ["payment", "API", "authentication"])'
          },
          domain: {
            type: 'string',
            description: 'Domain to search in (optional, e.g., "company", "customer")'
          },
          topN: {
            type: 'number',
            description: 'Maximum number of results to return (default: 10)',
            default: 10
          }
        },
        required: ['keywords']
      }
      },
- src/index.ts:316-335 (handler): MCP CallTool dispatch handler for search-documents; delegates to DocumentRepository

      case 'search-documents': {
        const { keywords, domain, topN } = args as {
          keywords: string[];
          domain?: string;
          topN?: number;
        };

        const results = await repository.searchDocuments(keywords, {
          domain,
          topN,
          contextWindow: config.chunk.contextWindowSize
        });

        const content: TextContent[] = [{ type: 'text', text: results }];
        return { content };
      }
- Core implementation of document search using the BM25 algorithm in DocumentRepository

      async searchDocuments(
        keywords: string[],
        options: {
          domain?: string;
          topN?: number;
          contextWindow?: number;
        } = {}
      ): Promise<string> {
        this.ensureInitialized();

        const { domain, topN = 10, contextWindow = 1 } = options;

        // Select the calculator to search with; an unknown or missing
        // domain falls back to the global calculator
        let calculator: BM25Calculator | null;
        if (domain && this.domainCalculators.has(domain)) {
          calculator = this.domainCalculators.get(domain)!;
        } else {
          calculator = this.globalCalculator;
        }

        if (!calculator) {
          return "검색 가능한 문서가 없습니다."; // "No searchable documents are available."
        }

        // Convert the keywords into a regular-expression pattern
        const pattern = keywords
          .map(keyword => escapeRegExp(keyword.trim()))
          .filter(keyword => keyword.length > 0)
          .join("|");

        if (!pattern) {
          return "유효한 검색 키워드가 없습니다."; // "No valid search keywords."
        }

        // Run the BM25 search
        const results = calculator.calculate(pattern);
        const topResults = results.slice(0, topN);

        return this.formatSearchResults(topResults, contextWindow);
      }
- Initialization of the BM25 indexes used for search

      private async buildBM25Indexes(documents: KnowledgeDocument[]): Promise<void> {
        const domainChunks = new Map<string, DocumentChunk[]>();
        const allChunks: DocumentChunk[] = [];

        documents.forEach(doc => {
          const chunks = doc.getChunks();
          allChunks.push(...chunks);

          const domain = doc.domainName || 'general';
          if (!domainChunks.has(domain)) {
            domainChunks.set(domain, []);
          }
          domainChunks.get(domain)!.push(...chunks);
        });

        // Initialize the per-domain calculators (CPU-intensive work)
        domainChunks.forEach((chunks, domain) => {
          this.domainCalculators.set(domain, new BM25Calculator(chunks));
          console.error(`[DEBUG] Created BM25Calculator for domain: ${domain} (${chunks.length} chunks)`);
        });

        // Initialize the global calculator
        if (allChunks.length > 0) {
          this.globalCalculator = new BM25Calculator(allChunks);
          console.error(`[DEBUG] Created global BM25Calculator (${allChunks.length} chunks)`);
        }
      }
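
The listing above does not include `escapeRegExp` or the internals of `BM25Calculator`, so the following is only a rough sketch of how keyword escaping and BM25 ranking could fit together. It uses the textbook Okapi BM25 formula with assumed parameters (`k1 = 1.2`, `b = 0.75`), assumes case-insensitive matching, and operates on plain strings instead of `DocumentChunk` objects. The names `toTerms`, `bm25Rank`, and `ScoredChunk` are hypothetical, and the `escapeRegExp` body is a common implementation, not necessarily the project's.

```typescript
// Assumed helper: the source calls escapeRegExp but does not show its body.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Mirrors the keyword handling in searchDocuments: trim, escape, drop empties.
function toTerms(keywords: string[]): string[] {
  return keywords
    .map(keyword => escapeRegExp(keyword.trim()))
    .filter(keyword => keyword.length > 0);
}

interface ScoredChunk {
  index: number; // position of the chunk in the input array
  score: number; // summed BM25 score over all keywords
}

// Textbook Okapi BM25; k1 and b are assumed defaults, not taken from the source.
function bm25Rank(chunks: string[], keywords: string[], topN = 10, k1 = 1.2, b = 0.75): ScoredChunk[] {
  const terms = toTerms(keywords);
  if (chunks.length === 0 || terms.length === 0) return [];

  // Chunk length is approximated by the whitespace-separated token count.
  const lengths = chunks.map(chunk => chunk.split(/\s+/).filter(Boolean).length);
  const avgLength = lengths.reduce((sum, n) => sum + n, 0) / chunks.length;
  const scores = new Array<number>(chunks.length).fill(0);

  for (const term of terms) {
    // Case-insensitive match counts serve as term frequencies (an assumption).
    const tf = chunks.map(chunk => (chunk.match(new RegExp(term, "gi")) ?? []).length);
    const df = tf.filter(count => count > 0).length;
    if (df === 0) continue;

    const idf = Math.log(1 + (chunks.length - df + 0.5) / (df + 0.5));
    tf.forEach((count, i) => {
      if (count === 0) return;
      const norm = 1 - b + b * ((lengths[i] ?? 0) / avgLength);
      scores[i] = (scores[i] ?? 0) + (idf * count * (k1 + 1)) / (count + k1 * norm);
    });
  }

  // This sketch drops chunks that match no keyword, then keeps the topN best.
  return scores
    .map((score, index) => ({ index, score }))
    .filter(result => result.score > 0)
    .sort((left, right) => right.score - left.score)
    .slice(0, topN);
}

const chunks = [
  "The payment API requires authentication tokens.",
  "Shipping policy and return instructions.",
  "Authentication uses OAuth; payment retries are idempotent.",
];
const ranked = bm25Rank(chunks, ["payment", "authentication", "  "], 2);
console.log(ranked.map(result => result.index));
```

In this example both matching chunks contain each keyword once, so the shorter chunk ranks first because BM25 normalizes by length. The whitespace-only keyword is discarded, in the same way the `filter` step in `searchDocuments` discards it.
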