search_oceanbase_document

Extracts OceanBase documentation context using keywords from user queries, enabling accurate LLM responses by retrieving and integrating relevant information dynamically.

Instructions

This tool provides context-specific information about OceanBase to a large language model (LLM) to improve the accuracy and relevance of its responses. The LLM should automatically extract relevant search keywords from the user query, or from its own draft answer, and pass them in the tool parameter "keyword" (the keyword must be in Chinese). For example, a question like "How do I create a partitioned table in OceanBase?" might yield the keyword "分区表".

The tool has two main functions:

1. Information retrieval: it searches OceanBase documentation using the extracted keywords, locating and extracting the most relevant information.
2. Context provision: the retrieved documentation is fed back to the LLM as contextual reference material. This context is not shown directly to the user; it is used to refine and inform the LLM's responses.

When the LLM's internal knowledge is insufficient to generate a high-quality response, the tool dynamically retrieves the necessary OceanBase information, maintaining a high level of response accuracy and expertise.

Input Schema

Name    | Required | Description                                                                            | Default
keyword | Yes      | Search keyword extracted from the user query or the LLM's answer (must be in Chinese) | (none)
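
For illustration only, the sketch below shows what a standard MCP tools/call request for this tool might look like, written as a Python dict. Only the tool name and the "keyword" argument come from this page; the JSON-RPC framing is generic MCP protocol supplied by the client, and the Chinese keyword is a hypothetical example.

    # Hypothetical MCP "tools/call" request for this tool, expressed as a Python dict.
    # Only "name" and the "keyword" argument are defined by this tool; the rest is
    # standard JSON-RPC / MCP framing provided by the client.
    tool_call = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "tools/call",
        "params": {
            "name": "search_oceanbase_document",
            "arguments": {"keyword": "分区表"},  # must be Chinese; here, "partitioned table"
        },
    }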

Implementation Reference

  • The primary handler function for the 'search_oceanbase_document' tool. It performs an API search on OceanBase documentation using the provided keyword, fetches details for the top 5 results via the helper function, and returns a JSON list of document contents.
    @app.tool()
    def search_oceanbase_document(keyword: str) -> str:
        """
        Provide context-specific information about OceanBase to a large language
        model (LLM) to improve the accuracy and relevance of its responses. The
        LLM should automatically extract relevant search keywords from the user
        query, or from its own answer, and pass them as the "keyword" parameter.

        Main functions:
        1. Information retrieval: search OceanBase documentation with the
           extracted keywords and locate the most relevant information.
        2. Context provision: feed the retrieved documentation back to the LLM
           as contextual reference material. This context is not shown directly
           to the user; it is used to refine and inform the LLM's responses.

        When the LLM's internal knowledge is insufficient to generate a
        high-quality response, this tool dynamically retrieves the necessary
        OceanBase information, maintaining a high level of response accuracy
        and expertise.

        Important: keyword must be Chinese.
        """
        logger.info(f"Calling tool: search_oceanbase_document, keyword: {keyword}")
        search_api_url = (
            "https://cn-wan-api.oceanbase.com/wanApi/forum/docCenter/productDocFile/v3/searchDocList"
        )
        headers = {
            "Content-Type": "application/json",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Accept": "application/json",
            "Origin": "https://www.oceanbase.com",
            "Referer": "https://www.oceanbase.com/",
        }
        query_param = {
            "pageNo": 1,
            "pageSize": 5,  # Fetch five results per search.
            "query": keyword,
        }
        # Serialize the dictionary to a JSON string, then encode it to bytes.
        query_param = json.dumps(query_param).encode("utf-8")
        req = request.Request(search_api_url, data=query_param, headers=headers, method="POST")
        # Create an SSL context backed by certifi to avoid HTTPS certificate errors.
        context = ssl.create_default_context(cafile=certifi.where())
        try:
            with request.urlopen(req, timeout=5, context=context) as response:
                response_body = response.read().decode("utf-8")
                json_data = json.loads(response_body)
                # The search hits we need are in the "data" field.
                data_array = json_data["data"]
                result_list = []
                for item in data_array:
                    doc_url = "https://www.oceanbase.com/docs/" + item["urlCode"] + "-" + item["id"]
                    logger.info(f"doc_url: {doc_url}")
                    content = get_ob_doc_content(doc_url, item["id"])
                    result_list.append(content)
                return json.dumps(result_list, ensure_ascii=False)
        except error.HTTPError as e:
            logger.error(f"HTTP Error: {e.code} - {e.reason}")
            return "No results were found"
        except error.URLError as e:
            logger.error(f"URL Error: {e.reason}")
            return "No results were found"
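
    A minimal way to exercise the handler outside the MCP runtime is to call it directly, assuming the decorator leaves the plain function callable (as FastMCP's tool decorator generally does). The snippet below is a hypothetical smoke test, not part of the server code; it assumes the module above is importable, and the Chinese keyword is an example. Each entry in the returned list is built by get_ob_doc_content() (or a fallback dict when an individual document fetch fails).

        # Hypothetical smoke test: call the registered tool function directly.
        import json

        raw = search_oceanbase_document("分区表")  # keyword must be Chinese
        if raw != "No results were found":
            for doc in json.loads(raw):
                # Fields produced by get_ob_doc_content(); .get() guards against the
                # per-document error fallback {"result": "No results were found"}.
                print(doc.get("title"), "|", doc.get("oceanbase_version"))
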
  • Supporting helper function called by the handler to retrieve and parse detailed content from individual OceanBase documentation pages, extracting cleaned text and metadata.
    def get_ob_doc_content(doc_url: str, doc_id: str) -> dict:
        doc_param = {"id": doc_id, "url": doc_url}
        doc_param = json.dumps(doc_param).encode("utf-8")
        headers = {
            "Content-Type": "application/json",
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Accept": "application/json",
            "Origin": "https://www.oceanbase.com",
            "Referer": "https://www.oceanbase.com/",
        }
        doc_api_url = (
            "https://cn-wan-api.oceanbase.com/wanApi/forum/docCenter/productDocFile/v4/docDetails"
        )
        req = request.Request(doc_api_url, data=doc_param, headers=headers, method="POST")
        # Create an SSL context backed by certifi to avoid HTTPS certificate errors.
        context = ssl.create_default_context(cafile=certifi.where())
        try:
            with request.urlopen(req, timeout=5, context=context) as response:
                response_body = response.read().decode("utf-8")
                json_data = json.loads(response_body)
                # The document details we need are in the "data" field.
                data = json_data["data"]
                # The "docContent" field contains HTML.
                soup = BeautifulSoup(data["docContent"], "html.parser")
                # Remove script, style, nav, header, and footer elements.
                for element in soup(["script", "style", "nav", "header", "footer"]):
                    element.decompose()
                # Strip the remaining HTML tags, keeping only the text.
                text = soup.get_text()
                # Trim whitespace at the beginning and end of each line.
                lines = (line.strip() for line in text.splitlines())
                # Drop empty lines.
                text = "\n".join(line for line in lines if line)
                logger.info(f"text length: {len(text)}")
                # If the text is too long, keep only the first 8000 characters.
                if len(text) > 8000:
                    text = text[:8000] + "... [content truncated]"
                # Assemble the final result. The "tdkInfo" field carries the
                # document's title, description, and keywords.
                tdk_info = data["tdkInfo"]
                final_result = {
                    "title": tdk_info["title"],
                    "description": tdk_info["description"],
                    "keyword": tdk_info["keyword"],
                    "content": text,
                    "oceanbase_version": data["version"],
                    "content_updatetime": data["docGmtModified"],
                }
                return final_result
        except error.HTTPError as e:
            logger.error(f"HTTP Error: {e.code} - {e.reason}")
            return {"result": "No results were found"}
        except error.URLError as e:
            logger.error(f"URL Error: {e.reason}")
            return {"result": "No results were found"}
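
    The HTML-to-text cleanup used in get_ob_doc_content can be tried in isolation. The snippet below is a self-contained sketch that applies the same BeautifulSoup steps to a toy HTML fragment; the fragment and its contents are made up for illustration and are not real OceanBase documentation.

        # Standalone illustration of the cleaning pipeline used above.
        from bs4 import BeautifulSoup

        html = (
            "<nav>site menu</nav>\n"
            "<h1>CREATE TABLE</h1>\n"
            "<p>  Creates a new table in the database.  </p>\n"
            "<script>track();</script>\n"
        )
        soup = BeautifulSoup(html, "html.parser")
        for element in soup(["script", "style", "nav", "header", "footer"]):
            element.decompose()      # drop non-content elements entirely
        text = soup.get_text()       # strip the remaining tags, keep the text
        lines = (line.strip() for line in text.splitlines())
        text = "\n".join(line for line in lines if line)
        print(text)  # "CREATE TABLE" and "Creates a new table in the database." on two lines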


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/oceanbase/mcp-oceanbase'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.