index_prisma_docs
Indexes Prisma Cloud documentation content so that search queries can retrieve documentation details efficiently and accurately. Use this tool before initiating any searches.
Instructions
Index Prisma Cloud documentation. Call this first before searching.
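For orientation, a client-side call might look like the sketch below. The transport URL and the FastMCP 2.x `Client` import are assumptions for illustration, not part of this reference; any MCP-capable client issuing a `tools/call` request works the same way.

```python
import asyncio
from fastmcp import Client  # assumption: a FastMCP 2.x client package is available

async def main():
    # Placeholder URL; point this at wherever the server is actually exposed.
    async with Client("http://localhost:8000/mcp") as client:
        # Index first; subsequent search tools rely on the cache built here.
        result = await client.call_tool("index_prisma_docs", {"max_pages": 50})
        print(result)

asyncio.run(main())
```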
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| max_pages | No | Maximum number of documentation pages to crawl and cache | 50 |
Input Schema (JSON Schema)

```json
{
  "properties": {
    "max_pages": {
      "default": 50,
      "title": "Max Pages",
      "type": "integer"
    }
  },
  "title": "index_prisma_docsArguments",
  "type": "object"
}
```
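Since `max_pages` is optional with a default of 50, the arguments object may be empty or may override the limit. A quick way to check a payload against this schema, using the third-party `jsonschema` package (an assumption for illustration, not a dependency of the server), is:

```python
from jsonschema import validate  # assumption: pip install jsonschema

schema = {
    "properties": {
        "max_pages": {"default": 50, "title": "Max Pages", "type": "integer"}
    },
    "title": "index_prisma_docsArguments",
    "type": "object",
}

validate(instance={"max_pages": 100}, schema=schema)  # passes
validate(instance={}, schema=schema)  # also passes; the default of 50 applies server-side
```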
Implementation Reference
- src/main.py:214-217 (handler): The handler function for the 'index_prisma_docs' MCP tool. It invokes the DocumentationIndexer to crawl and cache up to max_pages from Prisma Cloud docs.

```python
async def index_prisma_docs(max_pages: int = 50) -> str:
    """Index Prisma Cloud documentation. Call this first before searching."""
    pages_indexed = await indexer.index_site('prisma_cloud', max_pages)
    return f"Indexed {pages_indexed} pages from Prisma Cloud documentation"
```
- src/main.py:213-213 (registration): The @mcp.tool() decorator registers the index_prisma_docs function with the FastMCP server.

```python
@mcp.tool()
```
- src/main.py:39-106 (helper): The core helper method in DocumentationIndexer that performs BFS web crawling, extracts content using BeautifulSoup, and caches pages for the prisma_cloud site.

```python
async def index_site(self, site_name: str, max_pages: int = 100):
    """Index documentation from a specific site"""
    if site_name not in self.base_urls:
        raise ValueError(f"Unknown site: {site_name}")

    base_url = self.base_urls[site_name]
    visited_urls = set()
    urls_to_visit = [base_url]
    urls_to_visit_set = {base_url}  # mirrors urls_to_visit for O(1) membership checks
    pages_indexed = 0

    async with aiohttp.ClientSession() as session:
        while urls_to_visit and pages_indexed < max_pages:
            url = urls_to_visit.pop(0)
            if url in visited_urls:
                continue
            visited_urls.add(url)

            try:
                async with session.get(url, timeout=10) as response:
                    if response.status == 200:
                        content = await response.text()
                        soup = BeautifulSoup(content, 'html.parser')

                        # Extract page content
                        title = soup.find('title')
                        title_text = title.text.strip() if title else url

                        # Remove script and style elements
                        for script in soup(["script", "style"]):
                            script.decompose()

                        # Get text content
                        text_content = soup.get_text()
                        lines = (line.strip() for line in text_content.splitlines())
                        chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
                        text = ' '.join(chunk for chunk in chunks if chunk)

                        # Store in cache
                        self.cached_pages[url] = CachedPage(
                            title=title_text,
                            content=text[:5000],  # Limit content length
                            url=url,
                            site=site_name,
                            timestamp=time.time()
                        )
                        pages_indexed += 1

                        # Find more links to index
                        if pages_indexed < max_pages:
                            links = soup.find_all('a', href=True)
                            for link in links:
                                href = link['href']
                                full_url = urljoin(url, href)
                                # Only index URLs from the same domain
                                if urlparse(full_url).netloc == urlparse(base_url).netloc:
                                    if full_url not in visited_urls and full_url not in urls_to_visit_set:
                                        urls_to_visit.append(full_url)
                                        urls_to_visit_set.add(full_url)
            except Exception as e:
                print(f"Error indexing {url}: {e}")
                continue

    return pages_indexed
```
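The `CachedPage` record stored by index_site is not shown in this reference. A plausible shape, inferred only from the keyword arguments passed above (a sketch, not the actual definition in src/main.py), would be:

```python
from dataclasses import dataclass

@dataclass
class CachedPage:
    title: str        # <title> text, or the URL when no title is found
    content: str      # extracted page text, truncated to 5000 characters
    url: str          # page URL (also the key in cached_pages)
    site: str         # site key, e.g. 'prisma_cloud'
    timestamp: float  # time.time() at indexing time
```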
- src/main.py:29-38 (helper): The DocumentationIndexer class providing caching and indexing utilities, including the base URL for prisma_cloud.

```python
class DocumentationIndexer:
    def __init__(self):
        self.cached_pages = {}  # URL -> CachedPage
        self.search_cache = {}  # query -> (results, timestamp)
        self.base_urls = {
            'prisma_cloud': 'https://docs.prismacloud.io/',
            'prisma_api': 'https://pan.dev/prisma-cloud/api/',
        }
        self.search_cache_ttl = 300  # 5 minutes for search results
```
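Putting the pieces together, a standalone smoke test of the indexer (a sketch assuming the module's imports of aiohttp, BeautifulSoup, urljoin/urlparse, time, and the CachedPage type are in place) could look like:

```python
import asyncio

async def demo():
    indexer = DocumentationIndexer()
    # Crawl a small slice of the docs and inspect what was cached.
    count = await indexer.index_site('prisma_cloud', max_pages=10)
    print(f"Indexed {count} pages")
    for url, page in list(indexer.cached_pages.items())[:3]:
        print(f"- {page.title} ({url})")

asyncio.run(demo())
```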