index_cortex_docs
Index Cortex Cloud documentation to enable efficient searching. Run this tool first: it crawls the documentation site and caches page content so that subsequent search queries return accurate results.
Instructions
Index Cortex Cloud documentation. Call this first before searching.
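A minimal client-side sketch of invoking this tool over stdio with the official MCP Python SDK. The server launch command (`python src/main.py`) and the `max_pages` value are assumptions for illustration, not taken from this documentation.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    # Assumed launch command for the server; adjust to how src/main.py is actually run.
    params = StdioServerParameters(command="python", args=["src/main.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Index up to 25 pages before issuing any search calls.
            result = await session.call_tool("index_cortex_docs", {"max_pages": 25})
            # The tool returns a single text block,
            # e.g. "Indexed 25 pages from Cortex Cloud documentation".
            print(result.content[0].text)


asyncio.run(main())
```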
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| max_pages | No | Maximum number of documentation pages to crawl and index. | 50 |
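Assuming the server registers tools through FastMCP (as the `@mcp.tool()` handler below suggests), the input schema it advertises would look roughly like the following, written here as a Python dict; the exact fields the SDK generates may differ.

```python
# Approximate JSON Schema for the tool's input, derived from the signature
# index_cortex_docs(max_pages: int = 50); shown for illustration only.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {
        "max_pages": {
            "type": "integer",
            "default": 50,
        }
    },
    "required": [],
}
```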
Implementation Reference
- src/main.py:209-213 (handler): The tool handler for `index_cortex_docs`, registered via the `@mcp.tool()` decorator. It invokes the `DocumentationIndexer` to crawl and cache up to `max_pages` pages from the Cortex Cloud docs site.

  ```python
  @mcp.tool()
  async def index_cortex_docs(max_pages: int = 50) -> str:
      """Index Cortex Cloud documentation. Call this first before searching."""
      pages_indexed = await indexer.index_site('cortex_cloud', max_pages)
      return f"Indexed {pages_indexed} pages from Cortex Cloud documentation"
  ```
- src/main.py:34-100 (helper): The core helper method in `DocumentationIndexer` that performs web crawling, parsing with BeautifulSoup, content extraction, and caching of documentation pages from the specified site (a sketch of the surrounding class appears after this list).

  ```python
  async def index_site(self, site_name: str, max_pages: int = 100):
      """Index documentation from a specific site"""
      if site_name not in self.base_urls:
          raise ValueError(f"Unknown site: {site_name}")

      base_url = self.base_urls[site_name]
      visited_urls = set()
      urls_to_visit = [base_url]
      pages_indexed = 0

      async with aiohttp.ClientSession() as session:
          while urls_to_visit and pages_indexed < max_pages:
              url = urls_to_visit.pop(0)
              if url in visited_urls:
                  continue
              visited_urls.add(url)

              try:
                  async with session.get(url, timeout=10) as response:
                      if response.status == 200:
                          content = await response.text()
                          soup = BeautifulSoup(content, 'html.parser')

                          # Extract page content
                          title = soup.find('title')
                          title_text = title.text.strip() if title else url

                          # Remove script and style elements
                          for script in soup(["script", "style"]):
                              script.decompose()

                          # Get text content
                          text_content = soup.get_text()
                          lines = (line.strip() for line in text_content.splitlines())
                          chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
                          text = ' '.join(chunk for chunk in chunks if chunk)

                          # Store in cache
                          self.cached_pages[url] = CachedPage(
                              title=title_text,
                              content=text[:5000],  # Limit content length
                              url=url,
                              site=site_name,
                              timestamp=time.time()
                          )
                          pages_indexed += 1

                          # Find more links to index
                          if pages_indexed < max_pages:
                              links = soup.find_all('a', href=True)
                              for link in links:
                                  href = link['href']
                                  full_url = urljoin(url, href)
                                  # Only index URLs from the same domain
                                  if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                      if full_url not in visited_urls and full_url not in urls_to_visit:
                                          urls_to_visit.append(full_url)
              except Exception as e:
                  print(f"Error indexing {url}: {e}")
                  continue

      return pages_indexed
  ```
- src/main.py:11-23 (helper): Dataclass used to cache indexed page data, with TTL-based expiration logic (see the freshness-check example after this list).

  ```python
  @dataclass
  class CachedPage:
      title: str
      content: str
      url: str
      site: str
      timestamp: float
      ttl: float = 3600  # 1 hour default TTL

      @property
      def is_expired(self) -> bool:
          return time.time() > self.timestamp + self.ttl
  ```
- src/main.py:209-209 (registration): The `@mcp.tool()` decorator registers the `index_cortex_docs` function as an MCP tool.
- src/main.py:210-210 (schema): The function signature defines the input schema (`max_pages: int = 50`) and the output type (`str`): `async def index_cortex_docs(max_pages: int = 50) -> str:`
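The `DocumentationIndexer` class itself is not reproduced in the reference above. A minimal sketch of the pieces that `index_site` relies on, with a placeholder base URL standing in for the real Cortex Cloud docs URL defined in src/main.py:

```python
class DocumentationIndexer:
    """Minimal sketch; only the attributes used by index_site are shown."""

    def __init__(self) -> None:
        # Placeholder URL; the real value lives in src/main.py.
        self.base_urls: dict[str, str] = {
            "cortex_cloud": "https://example.com/cortex-cloud-docs/",
        }
        # Maps page URL -> CachedPage, populated by index_site().
        self.cached_pages: dict[str, "CachedPage"] = {}
```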
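A hypothetical freshness check showing how the `is_expired` property of `CachedPage` could gate search results; the helper name is illustrative and not part of src/main.py.

```python
from typing import Optional


def get_fresh_page(indexer: "DocumentationIndexer", url: str) -> Optional["CachedPage"]:
    """Return a cached page only if it exists and its TTL has not elapsed."""
    page = indexer.cached_pages.get(url)
    if page is None or page.is_expired:
        return None  # Stale or missing: the caller should re-run index_cortex_docs.
    return page
```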