# search_cortex_docs
Search Cortex Cloud documentation to find answers to technical questions and access relevant information.
## Instructions
Search Cortex Cloud documentation
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search string matched against the titles and text content of indexed Cortex Cloud documentation pages. | (none) |
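
The tool takes a single required `query` string. As a rough sketch of how a client might call it, assuming the server is exposed over stdio and the official `mcp` Python SDK is installed (the launch command here is a guess, not taken from the repository):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Hypothetical launch command; point this at the actual server entry point.
    server = StdioServerParameters(command="python", args=["src/main.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.call_tool(
                "search_cortex_docs",
                {"query": "incident scoring"},
            )
            # The tool returns one text block containing JSON-encoded results.
            print(result.content[0].text)


asyncio.run(main())
```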
## Implementation Reference
- **`src/main.py:191-195` (handler)**: The primary handler for the `search_cortex_docs` MCP tool. It is registered with the `@mcp.tool()` decorator and delegates the search logic to the `DocumentationIndexer` instance for the `cortex_cloud` site, returning JSON-formatted results.

  ```python
  @mcp.tool()
  async def search_cortex_docs(query: str) -> str:
      """Search Cortex Cloud documentation"""
      results = await indexer.search_docs(query, site='cortex_cloud')
      return json.dumps(results, indent=2)
  ```
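
  A quick in-process smoke test of the handler (a sketch; it assumes the module-level `indexer` and tool function from `src/main.py`, plus network access for indexing):

  ```python
  import asyncio

  from main import indexer, search_cortex_docs  # assumes src/ is on sys.path


  async def smoke_test() -> None:
      # search_docs returns [] on an empty cache, so index first.
      await indexer.index_site('cortex_cloud', max_pages=25)
      print(await search_cortex_docs('incident scoring'))


  asyncio.run(smoke_test())
  ```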
- **`src/main.py:102-154` (helper)**: The core helper implementing the documentation search logic: relevance scoring based on title and content matches, snippet extraction, and result ranking.

  ```python
  async def search_docs(self, query: str, site: str = None) -> List[Dict]:
      """Search indexed documentation"""
      if not self.cached_pages:
          return []

      query_lower = query.lower()
      results = []

      for url, page in self.cached_pages.items():
          # Filter by site if specified
          if site and page.site != site:
              continue

          # Calculate relevance score
          score = 0
          title_lower = page.title.lower()
          content_lower = page.content.lower()

          # Higher score for title matches
          if query_lower in title_lower:
              score += 10
              # Even higher for exact title matches
              if query_lower == title_lower:
                  score += 20

          # Score for content matches
          content_matches = content_lower.count(query_lower)
          score += content_matches * 2

          # Score for partial word matches in title
          query_words = query_lower.split()
          for word in query_words:
              if word in title_lower:
                  score += 5
              if word in content_lower:
                  score += 1

          if score > 0:
              # Extract snippet around first match
              snippet = self._extract_snippet(page.content, query, max_length=200)
              results.append({
                  'title': page.title,
                  'url': page.url,
                  'site': page.site,
                  'snippet': snippet,
                  'score': score
              })

      # Sort by relevance score (highest first) and limit results
      results.sort(key=lambda x: x['score'], reverse=True)
      return results[:10]
  ```
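
  `search_docs` calls a private `_extract_snippet` helper that is not shown in these excerpts; a plausible minimal implementation, offered here as an assumption rather than the actual source:

  ```python
  def _extract_snippet(self, content: str, query: str, max_length: int = 200) -> str:
      """Return up to max_length characters centered on the first query match."""
      idx = content.lower().find(query.lower())
      if idx == -1:
          # No exact phrase match (e.g. only word-level hits); fall back to the start.
          return content[:max_length]
      start = max(0, idx - max_length // 2)
      end = min(len(content), start + max_length)
      return content[start:end]
  ```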
- **`src/main.py:34-100` (helper)**: Helper method that crawls and caches documentation pages from a specified site, populating the cache that `search_docs` reads.

  ```python
  async def index_site(self, site_name: str, max_pages: int = 100):
      """Index documentation from a specific site"""
      if site_name not in self.base_urls:
          raise ValueError(f"Unknown site: {site_name}")

      base_url = self.base_urls[site_name]
      visited_urls = set()
      urls_to_visit = [base_url]
      pages_indexed = 0

      async with aiohttp.ClientSession() as session:
          while urls_to_visit and pages_indexed < max_pages:
              url = urls_to_visit.pop(0)
              if url in visited_urls:
                  continue
              visited_urls.add(url)

              try:
                  async with session.get(url, timeout=10) as response:
                      if response.status == 200:
                          content = await response.text()
                          soup = BeautifulSoup(content, 'html.parser')

                          # Extract page content
                          title = soup.find('title')
                          title_text = title.text.strip() if title else url

                          # Remove script and style elements
                          for script in soup(["script", "style"]):
                              script.decompose()

                          # Get text content
                          text_content = soup.get_text()
                          lines = (line.strip() for line in text_content.splitlines())
                          chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
                          text = ' '.join(chunk for chunk in chunks if chunk)

                          # Store in cache
                          self.cached_pages[url] = CachedPage(
                              title=title_text,
                              content=text[:5000],  # Limit content length
                              url=url,
                              site=site_name,
                              timestamp=time.time()
                          )
                          pages_indexed += 1

                          # Find more links to index
                          if pages_indexed < max_pages:
                              links = soup.find_all('a', href=True)
                              for link in links:
                                  href = link['href']
                                  full_url = urljoin(url, href)
                                  # Only index URLs from the same domain
                                  if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                      if full_url not in visited_urls and full_url not in urls_to_visit:
                                          urls_to_visit.append(full_url)
              except Exception as e:
                  print(f"Error indexing {url}: {e}")
                  continue

      return pages_indexed
  ```
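
  The `DocumentationIndexer` class itself is not excerpted. A hypothetical skeleton consistent with the attributes the methods above rely on (the documentation URL is illustrative, not taken from `src/main.py`):

  ```python
  class DocumentationIndexer:
      def __init__(self) -> None:
          # Site keys map to documentation roots; this URL is an assumption.
          self.base_urls: dict[str, str] = {
              'cortex_cloud': 'https://docs-cortex.paloaltonetworks.com/',
          }
          # url -> CachedPage, written by index_site and read by search_docs
          self.cached_pages: dict[str, 'CachedPage'] = {}
  ```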
- **`src/main.py:11-23` (helper)**: Dataclass representing a cached documentation page with TTL-based expiration logic, used by the indexer.

  ```python
  @dataclass
  class CachedPage:
      title: str
      content: str
      url: str
      site: str
      timestamp: float
      ttl: float = 3600  # 1 hour default TTL

      @property
      def is_expired(self) -> bool:
          return time.time() > self.timestamp + self.ttl
  ```
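
  None of the excerpts above consult `is_expired`; a natural use would be evicting stale entries before serving a search. A sketch, not code from the source:

  ```python
  def purge_expired(indexer: 'DocumentationIndexer') -> int:
      """Drop cached pages older than their TTL; returns the eviction count."""
      stale = [url for url, page in indexer.cached_pages.items() if page.is_expired]
      for url in stale:
          del indexer.cached_pages[url]
      return len(stale)
  ```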