index_prisma_api_docs
Index Prisma Cloud API documentation to enable accurate search functionality. Call this tool first to prepare the documentation for querying.
Instructions
Index Prisma Cloud API documentation. Call this first before searching.
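The sketch below shows one way a client could drive this workflow, running the indexing tool before any search. It is a minimal example under assumptions not stated in this document: that the server is started as python server.py over stdio and that the official mcp Python SDK client is used.

```python
# Minimal sketch, assuming the official `mcp` Python SDK client and a
# stdio-launched server (the `python server.py` command is an assumption).
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    params = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Index first, as the tool requires; search calls come afterwards.
            result = await session.call_tool("index_prisma_api_docs", {"max_pages": 25})
            print(result.content)


asyncio.run(main())
```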
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| max_pages | No | Maximum number of documentation pages to crawl and index | 50 |
Input Schema (JSON Schema)
```json
{
  "properties": {
    "max_pages": {
      "default": 50,
      "title": "Max Pages",
      "type": "integer"
    }
  },
  "title": "index_prisma_api_docsArguments",
  "type": "object"
}
```
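As a quick illustration of what the schema accepts, the snippet below validates candidate payloads with the third-party jsonschema package; using that package is an assumption for illustration only, and the default of 50 is applied by the server when max_pages is omitted.

```python
# Illustrative only: validating payloads against the tool's input schema
# with the third-party `jsonschema` package (not required by the server).
from jsonschema import ValidationError, validate

INPUT_SCHEMA = {
    "properties": {
        "max_pages": {"default": 50, "title": "Max Pages", "type": "integer"}
    },
    "title": "index_prisma_api_docsArguments",
    "type": "object",
}

validate(instance={"max_pages": 100}, schema=INPUT_SCHEMA)  # accepted
validate(instance={}, schema=INPUT_SCHEMA)  # accepted; the server falls back to 50

try:
    validate(instance={"max_pages": "100"}, schema=INPUT_SCHEMA)
except ValidationError:
    print("rejected: max_pages must be an integer")
```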
Implementation Reference
- src/main.py:219-223 (handler): Primary handler for the index_prisma_api_docs tool. Decorated with @mcp.tool() for MCP registration; it delegates to the indexer's index_site method for the 'prisma_api' site.

  ```python
  @mcp.tool()
  async def index_prisma_api_docs(max_pages: int = 50) -> str:
      """Index Prisma Cloud API documentation. Call this first before searching."""
      pages_indexed = await indexer.index_site('prisma_api', max_pages)
      return f"Indexed {pages_indexed} pages from Prisma Cloud API documentation"
  ```
- server.py:215-219 (handler): Duplicate handler in the server.py entrypoint, identical to the one in src/main.py.

  ```python
  @mcp.tool()
  async def index_prisma_api_docs(max_pages: int = 50) -> str:
      """Index Prisma Cloud API documentation. Call this first before searching."""
      pages_indexed = await indexer.index_site('prisma_api', max_pages)
      return f"Indexed {pages_indexed} pages from Prisma Cloud API documentation"
  ```
- src/main.py:39-106 (helper): Core helper method on the DocumentationIndexer class. It crawls the requested site (e.g. 'prisma_api'), parses each page with BeautifulSoup, extracts the text content, caches the page, and queues further same-domain links until max_pages is reached. This is the main logic executed by the tool handler.

  ```python
  async def index_site(self, site_name: str, max_pages: int = 100):
      """Index documentation from a specific site"""
      if site_name not in self.base_urls:
          raise ValueError(f"Unknown site: {site_name}")

      base_url = self.base_urls[site_name]
      visited_urls = set()
      urls_to_visit = [base_url]
      urls_to_visit_set = {base_url}  # mirrors the queue for O(1) membership checks
      pages_indexed = 0

      async with aiohttp.ClientSession() as session:
          while urls_to_visit and pages_indexed < max_pages:
              url = urls_to_visit.pop(0)
              if url in visited_urls:
                  continue
              visited_urls.add(url)

              try:
                  async with session.get(url, timeout=10) as response:
                      if response.status == 200:
                          content = await response.text()
                          soup = BeautifulSoup(content, 'html.parser')

                          # Extract page content
                          title = soup.find('title')
                          title_text = title.text.strip() if title else url

                          # Remove script and style elements
                          for script in soup(["script", "style"]):
                              script.decompose()

                          # Get text content
                          text_content = soup.get_text()
                          lines = (line.strip() for line in text_content.splitlines())
                          chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
                          text = ' '.join(chunk for chunk in chunks if chunk)

                          # Store in cache
                          self.cached_pages[url] = CachedPage(
                              title=title_text,
                              content=text[:5000],  # Limit content length
                              url=url,
                              site=site_name,
                              timestamp=time.time()
                          )
                          pages_indexed += 1

                          # Find more links to index
                          if pages_indexed < max_pages:
                              links = soup.find_all('a', href=True)
                              for link in links:
                                  href = link['href']
                                  full_url = urljoin(url, href)
                                  # Only index URLs from the same domain
                                  if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                      if full_url not in visited_urls and full_url not in urls_to_visit_set:
                                          urls_to_visit.append(full_url)
                                          urls_to_visit_set.add(full_url)
              except Exception as e:
                  print(f"Error indexing {url}: {e}")
                  continue

      return pages_indexed
  ```
- src/main.py:16-27 (helper): Dataclass that stores cached page data with expiration logic, used by the indexer.

  ```python
  @dataclass
  class CachedPage:
      title: str
      content: str
      url: str
      site: str
      timestamp: float
      ttl: float = 3600  # 1 hour default TTL

      @property
      def is_expired(self) -> bool:
          return time.time() > self.timestamp + self.ttl
  ```
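A short usage sketch of the cache-expiry behaviour follows; the field values are illustrative and not taken from the server's configuration.

```python
# Illustrative values only; shows how the 1-hour TTL drives re-indexing.
import time

page = CachedPage(
    title="Example API page",
    content="...",
    url="https://example.com/prisma-cloud/api/page",
    site="prisma_api",
    timestamp=time.time(),
)

print(page.is_expired)  # False right after caching

page.timestamp -= 7200  # pretend the page was fetched two hours ago
print(page.is_expired)  # True: past the default 3600-second TTL
```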