# index_cortex_docs
Index Cortex Cloud documentation to enable search functionality. Call this tool first before querying documentation content.
## Instructions
Index Cortex Cloud documentation. Call this first before searching.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| max_pages | No | Maximum number of pages to crawl and index in one call | 50 |
## Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Confirmation string reporting the number of pages indexed, e.g. "Indexed 50 pages from Cortex Cloud documentation" | |
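For orientation, below is a minimal sketch of invoking this tool from a Python MCP client over stdio. The launch command (`python server.py`) is an assumption for illustration; only the tool name, the `max_pages` input, and the result string come from this reference.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Assumption: the server is launched with `python server.py`; adjust as needed.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Index first, then call the search tools; max_pages caps the crawl.
            result = await session.call_tool(
                "index_cortex_docs", arguments={"max_pages": 25}
            )
            print(result.content[0].text)  # "Indexed N pages from Cortex Cloud documentation"


asyncio.run(main())
```

Once the confirmation string comes back, the cached pages are available to the documentation search tools.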
## Implementation Reference
- `server.py:209-213` (handler): The handler function for the `index_cortex_docs` tool. It is registered with the `@mcp.tool()` decorator and delegates the indexing work to `DocumentationIndexer.index_site` for the `cortex_cloud` site.
```python
@mcp.tool()
async def index_cortex_docs(max_pages: int = 50) -> str:
    """Index Cortex Cloud documentation. Call this first before searching."""
    pages_indexed = await indexer.index_site('cortex_cloud', max_pages)
    return f"Indexed {pages_indexed} pages from Cortex Cloud documentation"
```

- `server.py:36-102` (helper): The core helper that performs the actual indexing. It crawls the site starting from `base_url`, parses the HTML, extracts clean text, caps the crawl at `max_pages`, caches each page in `self.cached_pages` as a `CachedPage`, and queues same-domain links for further crawling.
```python
async def index_site(self, site_name: str, max_pages: int = 100):
    """Index documentation from a specific site"""
    if site_name not in self.base_urls:
        raise ValueError(f"Unknown site: {site_name}")

    base_url = self.base_urls[site_name]
    visited_urls = set()
    urls_to_visit = [base_url]
    pages_indexed = 0

    async with aiohttp.ClientSession() as session:
        while urls_to_visit and pages_indexed < max_pages:
            url = urls_to_visit.pop(0)
            if url in visited_urls:
                continue
            visited_urls.add(url)

            try:
                async with session.get(url, timeout=10) as response:
                    if response.status == 200:
                        content = await response.text()
                        soup = BeautifulSoup(content, 'html.parser')

                        # Extract page content
                        title = soup.find('title')
                        title_text = title.text.strip() if title else url

                        # Remove script and style elements
                        for script in soup(["script", "style"]):
                            script.decompose()

                        # Get text content
                        text_content = soup.get_text()
                        lines = (line.strip() for line in text_content.splitlines())
                        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
                        text = ' '.join(chunk for chunk in chunks if chunk)

                        # Store in cache
                        self.cached_pages[url] = CachedPage(
                            title=title_text,
                            content=text[:5000],  # Limit content length
                            url=url,
                            site=site_name,
                            timestamp=time.time()
                        )
                        pages_indexed += 1

                        # Find more links to index
                        if pages_indexed < max_pages:
                            links = soup.find_all('a', href=True)
                            for link in links:
                                href = link['href']
                                full_url = urljoin(url, href)
                                # Only index URLs from the same domain
                                if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                    if full_url not in visited_urls and full_url not in urls_to_visit:
                                        urls_to_visit.append(full_url)
            except Exception as e:
                print(f"Error indexing {url}: {e}")
                continue

    return pages_indexed
```

- `server.py:13-25` (helper): Dataclass used to cache indexed pages, with expiration logic.
```python
@dataclass
class CachedPage:
    title: str
    content: str
    url: str
    site: str
    timestamp: float
    ttl: float = 3600  # 1 hour default TTL

    @property
    def is_expired(self) -> bool:
        return time.time() > self.timestamp + self.ttl
```

- `server.py:209` (registration): Registration of the `index_cortex_docs` tool using the FastMCP decorator.
```python
@mcp.tool()
```
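Note that the excerpts above never consult `CachedPage.is_expired`. A minimal sketch of how a lookup on `DocumentationIndexer` might use it; the `get_page` helper is hypothetical and not part of `server.py`:

```python
def get_page(self, url: str) -> CachedPage | None:
    """Hypothetical lookup helper: return a cached page, evicting it if stale."""
    page = self.cached_pages.get(url)
    if page is None:
        return None
    if page.is_expired:  # more than ttl seconds (1 hour by default) since indexing
        del self.cached_pages[url]  # drop the stale entry so it can be re-indexed
        return None
    return page
```

Evicting on read keeps the cache self-cleaning without a background sweep; a caller that gets `None` back can simply re-run `index_cortex_docs`.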