# index_cortex_docs
Index Cortex Cloud documentation to enable search functionality. Call this tool first before querying documentation content.
## Instructions
Index Cortex Cloud documentation. Call this first before searching.
## Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| max_pages | No | Maximum number of pages to crawl and index in one call | 50 |
## Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| result | Yes | Confirmation string reporting the number of pages indexed, e.g. "Indexed 50 pages from Cortex Cloud documentation" | |
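For orientation, below is a minimal sketch of invoking this tool from a Python MCP client over stdio. The launch command (`python server.py`) is an assumption for illustration; only the tool name, the `max_pages` input, and the result string come from this reference.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main():
    # Assumption: the server is launched with `python server.py`; adjust as needed.
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Index first, then call the search tools; max_pages caps the crawl.
            result = await session.call_tool(
                "index_cortex_docs", arguments={"max_pages": 25}
            )
            print(result.content[0].text)  # "Indexed N pages from Cortex Cloud documentation"


asyncio.run(main())
```

Once the confirmation string comes back, the cached pages are available to the documentation search tools.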
## Implementation Reference
- `server.py:209-213` (handler): The handler function for the `index_cortex_docs` tool. It is registered with the `@mcp.tool()` decorator and delegates the indexing work to `DocumentationIndexer.index_site` for the `cortex_cloud` site.
```python
@mcp.tool()
async def index_cortex_docs(max_pages: int = 50) -> str:
    """Index Cortex Cloud documentation. Call this first before searching."""
    pages_indexed = await indexer.index_site('cortex_cloud', max_pages)
    return f"Indexed {pages_indexed} pages from Cortex Cloud documentation"
```

- `server.py:36-102` (helper): The core helper that performs the actual indexing. It crawls the site starting from `base_url`, parses the HTML, extracts clean text, caps the crawl at `max_pages`, caches each page in `self.cached_pages` as a `CachedPage`, and queues same-domain links for further crawling.
```python
async def index_site(self, site_name: str, max_pages: int = 100):
    """Index documentation from a specific site"""
    if site_name not in self.base_urls:
        raise ValueError(f"Unknown site: {site_name}")

    base_url = self.base_urls[site_name]
    visited_urls = set()
    urls_to_visit = [base_url]
    pages_indexed = 0

    async with aiohttp.ClientSession() as session:
        while urls_to_visit and pages_indexed < max_pages:
            url = urls_to_visit.pop(0)
            if url in visited_urls:
                continue
            visited_urls.add(url)

            try:
                async with session.get(url, timeout=10) as response:
                    if response.status == 200:
                        content = await response.text()
                        soup = BeautifulSoup(content, 'html.parser')

                        # Extract page content
                        title = soup.find('title')
                        title_text = title.text.strip() if title else url

                        # Remove script and style elements
                        for script in soup(["script", "style"]):
                            script.decompose()

                        # Get text content
                        text_content = soup.get_text()
                        lines = (line.strip() for line in text_content.splitlines())
                        chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
                        text = ' '.join(chunk for chunk in chunks if chunk)

                        # Store in cache
                        self.cached_pages[url] = CachedPage(
                            title=title_text,
                            content=text[:5000],  # Limit content length
                            url=url,
                            site=site_name,
                            timestamp=time.time()
                        )
                        pages_indexed += 1

                        # Find more links to index
                        if pages_indexed < max_pages:
                            links = soup.find_all('a', href=True)
                            for link in links:
                                href = link['href']
                                full_url = urljoin(url, href)
                                # Only index URLs from the same domain
                                if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                    if full_url not in visited_urls and full_url not in urls_to_visit:
                                        urls_to_visit.append(full_url)
            except Exception as e:
                print(f"Error indexing {url}: {e}")
                continue

    return pages_indexed
```

- `server.py:13-25` (helper): Dataclass used to cache indexed pages, with expiration logic.
```python
@dataclass
class CachedPage:
    title: str
    content: str
    url: str
    site: str
    timestamp: float
    ttl: float = 3600  # 1 hour default TTL

    @property
    def is_expired(self) -> bool:
        return time.time() > self.timestamp + self.ttl
```

- `server.py:209` (registration): Registration of the `index_cortex_docs` tool using the FastMCP decorator.
```python
@mcp.tool()
```
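Note that the excerpts above never consult `CachedPage.is_expired`. A minimal sketch of how a lookup on `DocumentationIndexer` might use it; the `get_page` helper is hypothetical and not part of `server.py`:

```python
def get_page(self, url: str) -> CachedPage | None:
    """Hypothetical lookup helper: return a cached page, evicting it if stale."""
    page = self.cached_pages.get(url)
    if page is None:
        return None
    if page.is_expired:  # more than ttl seconds (1 hour by default) since indexing
        del self.cached_pages[url]  # drop the stale entry so it can be re-indexed
        return None
    return page
```

Evicting on read keeps the cache self-cleaning without a background sweep; a caller that gets `None` back can simply re-run `index_cortex_docs`.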