clarkemn

cortex-cloud-docs-mcp-server

index_cortex_api_docs

Index Cortex Cloud API documentation to enable search functionality. Call this tool first before querying the documentation.

Instructions

Index Cortex Cloud API documentation. Call this first before searching.

Input Schema

Name       Required  Description  Default
max_pages  No        -            -

Output Schema

Name    Required  Description  Default
result  Yes       -            -

Implementation Reference

  • The handler function for the 'index_cortex_api_docs' tool. It calls the DocumentationIndexer.index_site method with site='cortex_api' to crawl and cache API documentation pages.
    async def index_cortex_api_docs(max_pages: int = 50) -> str:
        """Index Cortex Cloud API documentation. Call this first before searching."""
        pages_indexed = await indexer.index_site('cortex_api', max_pages)
        return f"Indexed {pages_indexed} pages from Cortex Cloud API documentation"
  • server.py:215 (registration)
    Registers the 'index_cortex_api_docs' tool using the @mcp.tool() decorator.
    @mcp.tool()
  • The core helper method index_site in DocumentationIndexer class that performs web crawling, parsing, and caching of documentation pages using aiohttp and BeautifulSoup.
    async def index_site(self, site_name: str, max_pages: int = 100):
        """Index documentation from a specific site"""
        if site_name not in self.base_urls:
            raise ValueError(f"Unknown site: {site_name}")
        
        base_url = self.base_urls[site_name]
        visited_urls = set()
        urls_to_visit = [base_url]
        pages_indexed = 0
        
        async with aiohttp.ClientSession() as session:
            while urls_to_visit and pages_indexed < max_pages:
                url = urls_to_visit.pop(0)
                
                if url in visited_urls:
                    continue
                    
                visited_urls.add(url)
                
                try:
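                    # Pass an explicit ClientTimeout; a bare number is deprecated in aiohttp
                    async with session.get(url, timeout=aiohttp.ClientTimeout(total=10)) as response: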
                        if response.status == 200:
                            content = await response.text()
                            soup = BeautifulSoup(content, 'html.parser')
                            
                            # Extract page content
                            title = soup.find('title')
                            title_text = title.text.strip() if title else url
                            
                            # Remove script and style elements
                            for script in soup(["script", "style"]):
                                script.decompose()
                            
                            # Get text content
                            text_content = soup.get_text()
                            lines = (line.strip() for line in text_content.splitlines())
                            chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
                            text = ' '.join(chunk for chunk in chunks if chunk)
                            
                            # Store in cache
                            self.cached_pages[url] = CachedPage(
                                title=title_text,
                                content=text[:5000],  # Limit content length
                                url=url,
                                site=site_name,
                                timestamp=time.time()
                            )
                            
                            pages_indexed += 1
                            
                            # Find more links to index
                            if pages_indexed < max_pages:
                                links = soup.find_all('a', href=True)
                                for link in links:
                                    href = link['href']
                                    full_url = urljoin(url, href)
                                    
                                    # Only index URLs from the same domain
                                    if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                        if full_url not in visited_urls and full_url not in urls_to_visit:
                                            urls_to_visit.append(full_url)
                                
                except Exception as e:
                    print(f"Error indexing {url}: {e}")
                    continue
        
        return pages_indexed
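The crawl loop in index_site can be hard to follow through the nesting, so here is a minimal offline sketch of the same control flow: a breadth-first frontier, a visited set, a same-domain filter, and a cap on indexed pages. FAKE_SITE, the docs.example.com URLs, and the crawl function are hypothetical names made up for illustration, and the sketch swaps the listing's list.pop(0) for collections.deque, which pops from the left in constant time. It does no network I/O and is not the server's actual code.

```python
from collections import deque
from urllib.parse import urljoin, urlparse

# Hypothetical in-memory "site": maps a URL to the hrefs found on that page.
FAKE_SITE = {
    "https://docs.example.com/": ["/a", "/b", "https://other.example.org/x"],
    "https://docs.example.com/a": ["/b", "/c"],
    "https://docs.example.com/b": ["/"],
    "https://docs.example.com/c": [],
}


def crawl(base_url: str, max_pages: int) -> list[str]:
    """Return the URLs indexed, in visit order, capped at max_pages."""
    base_host = urlparse(base_url).netloc
    visited: set[str] = set()
    queue = deque([base_url])
    indexed: list[str] = []

    while queue and len(indexed) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)

        hrefs = FAKE_SITE.get(url)
        if hrefs is None:  # stands in for a non-200 response
            continue
        indexed.append(url)

        for href in hrefs:
            full_url = urljoin(url, href)  # resolve relative links
            # Same-domain filter: skip links that leave the base host.
            if urlparse(full_url).netloc == base_host and full_url not in visited:
                queue.append(full_url)
    return indexed


print(crawl("https://docs.example.com/", max_pages=2))
```

As in the original loop, only pages that load successfully count toward the cap, so the number of requests attempted can exceed max_pages when some URLs fail.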
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions this is an indexing operation but doesn't disclose behavioral traits like whether it's idempotent, how long it takes, what happens if interrupted, or what authentication/rate limits apply. The description adds minimal context beyond the basic action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
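One behavioral trait can be read off the cache write in the Implementation Reference: pages are stored in a dictionary keyed by URL, so indexing the same URL again replaces the earlier entry rather than adding a second one. The sketch below illustrates that under stated assumptions. The CachedPage field names come from the listing, but the field types, the module-level cached_pages dictionary, and the store helper are hypothetical.

```python
import time
from dataclasses import dataclass


@dataclass
class CachedPage:
    # Field names follow the CachedPage(...) call in the listing; types are assumed.
    title: str
    content: str
    url: str
    site: str
    timestamp: float


# Hypothetical stand-in for the indexer's cached_pages attribute.
cached_pages: dict[str, CachedPage] = {}


def store(url: str, title: str, text: str, site: str) -> None:
    """Cache one page, truncating content to 5000 characters as the listing does."""
    cached_pages[url] = CachedPage(
        title=title, content=text[:5000], url=url, site=site, timestamp=time.time()
    )


# Storing the same URL twice leaves a single, refreshed entry.
store("https://docs.example.com/a", "A", "first crawl", "cortex_api")
store("https://docs.example.com/a", "A", "second crawl", "cortex_api")
print(len(cached_pages), cached_pages["https://docs.example.com/a"].content)
```

Because the cache shown is an in-memory dictionary, a repeated call refreshes entries in place; whether anything persists across server restarts is not stated in the listing.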

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (two short sentences) and front-loaded with the essential information. Every word earns its place, with no wasted text or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there's an output schema (which should describe return values) and only one parameter, the description covers the essential 'what and when' adequately. However, for an indexing operation with no annotations, it should ideally mention more about the process (e.g., that it might take time, what 'indexing' entails).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It provides no information about the 'max_pages' parameter - not what it means, why it's needed, or typical values. The description doesn't mention parameters at all, leaving the single parameter completely undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
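One way to close this gap is to document max_pages in the handler's docstring. The following is a sketch of possible wording, not the author's text: the facts in the Args section (default of 50, only successfully fetched pages count, 5000-character truncation) are taken from the Implementation Reference, while StubIndexer is a hypothetical stand-in so the example runs without network access. The @mcp.tool() decorator is left out for the same reason, and whether the docstring reaches the published tool description depends on the MCP framework in use.

```python
import asyncio


class StubIndexer:
    """Hypothetical stand-in for the server's indexer; does no network I/O."""

    async def index_site(self, site_name: str, max_pages: int = 100) -> int:
        return min(max_pages, 3)  # pretend the site has only 3 pages


indexer = StubIndexer()


async def index_cortex_api_docs(max_pages: int = 50) -> str:
    """Index Cortex Cloud API documentation. Call this first before searching.

    Args:
        max_pages: Upper bound on the number of pages cached by this call
            (default 50). Only pages that load successfully count toward
            the limit, and each cached page keeps at most 5000 characters
            of text.

    Returns:
        A one-line summary stating how many pages were indexed.
    """
    pages_indexed = await indexer.index_site("cortex_api", max_pages)
    return f"Indexed {pages_indexed} pages from Cortex Cloud API documentation"


print(asyncio.run(index_cortex_api_docs(max_pages=2)))
```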

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Index') and resource ('Cortex Cloud API documentation'), making the purpose understandable. However, it doesn't differentiate this tool from its sibling 'index_cortex_docs', which appears to be a similar indexing tool for different documentation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'Call this first before searching.' This clearly indicates when to use this tool (as a prerequisite step) and implies alternatives (the search tools listed as siblings).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/clarkemn/cortex-cloud-docs-mcp-server'
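The same request can be issued from Python with the standard library. This is a minimal sketch: the URL pattern is copied from the curl command, while the function names, the owner/name split, and the 10-second timeout are assumptions made for illustration. The response format is not described here, so the body is returned as plain text rather than parsed.

```python
import urllib.request

API_ROOT = "https://glama.ai/api/mcp/v1/servers"


def server_url(owner: str, name: str) -> str:
    """Build the directory API URL for one server."""
    return f"{API_ROOT}/{owner}/{name}"


def fetch_server_info(owner: str, name: str, timeout: float = 10.0) -> str:
    """GET the server record and return the raw response body as text."""
    with urllib.request.urlopen(server_url(owner, name), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_server_info("clarkemn", "cortex-cloud-docs-mcp-server") performs the request made by the curl command; nothing is sent until that function is called.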

If you have feedback or need assistance with the MCP directory API, please join our Discord server.