read_documentation
Convert OpenTelemetry documentation pages to markdown format. Fetch content in chunks for long documents, preserving headings, code blocks, lists, and tables.
Instructions
Fetch and convert an OpenTelemetry documentation page to markdown format.
Usage
This tool retrieves the content of an OpenTelemetry documentation page and converts it to markdown format. For long documents, you can make multiple calls with different start_index values to retrieve the entire content in chunks.
URL Requirements
Must be from the opentelemetry.io domain
Must be a documentation page
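The domain requirement can be checked on the client side before calling the tool. The following is a minimal sketch that reuses the regular expression from the handler listed under Implementation Reference; the helper name `is_allowed_url` is hypothetical and not part of the server.

```python
import re

# Pattern copied from the handler's check: http or https, host exactly opentelemetry.io,
# followed by a slash.
ALLOWED_URL = re.compile(r'^https?://opentelemetry\.io/')


def is_allowed_url(url: str) -> bool:
    """Return True if the URL would pass the handler's domain check (hypothetical helper)."""
    return ALLOWED_URL.match(url) is not None


print(is_allowed_url('https://opentelemetry.io/docs/collector/'))    # True
print(is_allowed_url('https://docs.opentelemetry.io/collector/'))    # False: subdomains do not match
print(is_allowed_url('https://opentelemetry.io.example.com/docs/'))  # False: different host
```

The pattern only checks the domain. It does not verify that the path points at a documentation page, so that second requirement is left to the caller.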
Example URLs
https://opentelemetry.io/docs/concepts/observability-primer/
https://opentelemetry.io/docs/instrumentation/
https://opentelemetry.io/docs/collector/
Output Format
The output is formatted as markdown text with:
Preserved headings and structure
Code blocks for examples
Lists and tables converted to markdown format
Handling Long Documents
If the response ends with a truncation notice, you have two options:
Continue Reading: Make another call with start_index set to the value given in the truncation notice (the previous start_index plus the number of characters returned)
Stop Early: For very long documents (>30,000 characters), if you've already found the specific information needed, you can stop reading
Args:
ctx: MCP context for logging and error handling
url: URL of the OpenTelemetry documentation page to read
max_length: Maximum number of characters to return
start_index: Return output starting at this character index
Returns: Markdown content of the OpenTelemetry documentation
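The "Continue Reading" option can be automated by reading the next offset out of the truncation notice. The sketch below is illustrative only: `read_all` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real tool call so the loop can run on its own. The notice text it emits mirrors the one produced by format_documentation_result.

```python
import re
from typing import Callable

# The truncation notice ends with the next offset, e.g.
# "... with start_index=5000 to get more content.</e>"
NEXT_START = re.compile(r'start_index=(\d+) to get more content\.</e>\s*$')


def read_all(fetch: Callable[[int], str], max_calls: int = 50) -> list[str]:
    """Call fetch(start_index) until a chunk arrives without a truncation notice."""
    chunks: list[str] = []
    start = 0
    for _ in range(max_calls):  # hard cap so a misbehaving server cannot loop forever
        chunk = fetch(start)
        chunks.append(chunk)
        match = NEXT_START.search(chunk)
        if match is None:
            break  # no notice: this was the last chunk
        start = int(match.group(1))
    return chunks


def fake_fetch(start: int, text: str = 'a' * 12, max_length: int = 5) -> str:
    """Stand-in for the real tool call (assumption): pages through a fixed string."""
    body = text[start:start + max_length]
    end = start + len(body)
    if end < len(text):
        body += (
            '\n\n<e>Content truncated. Call the read_documentation tool with '
            f'start_index={end} to get more content.</e>'
        )
    return body


pages = read_all(fake_fetch)
print(len(pages))  # 3 chunks, fetched at offsets 0, 5 and 10
```

To apply the "Stop Early" option, a caller would simply break out of the loop once the needed information has been found in a chunk.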
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| max_length | No | Maximum number of characters to return. Must be greater than 0 and less than 1000000. | 5000 |
| start_index | No | Return output starting at this character index; useful if a previous fetch was truncated and more content is required. Must be 0 or greater. | 0 |
| url | Yes | URL of the OpenTelemetry documentation page to read | |
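As an illustration of the schema, the sketch below applies the defaults and bounds declared in the handler signature shown under Implementation Reference (max_length defaults to 5000 and must be between 0 and 1000000 exclusive; start_index defaults to 0 and must not be negative). The function `check_arguments` is a hypothetical client-side helper; the server itself relies on its Field declarations for this.

```python
def check_arguments(args: dict) -> dict:
    """Fill in defaults and enforce the bounds from the handler signature (hypothetical helper)."""
    if 'url' not in args:
        raise ValueError('url is required')
    max_length = args.get('max_length', 5000)  # default from the handler
    start_index = args.get('start_index', 0)   # default from the handler
    if not 0 < max_length < 1_000_000:
        raise ValueError('max_length must be greater than 0 and less than 1000000')
    if start_index < 0:
        raise ValueError('start_index must be 0 or greater')
    return {'url': args['url'], 'max_length': max_length, 'start_index': start_index}


# Only the required argument is supplied; the optional ones fall back to their defaults.
print(check_arguments({'url': 'https://opentelemetry.io/docs/collector/'}))
```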
Implementation Reference
- The main handler function for the 'read_documentation' tool. It validates the URL, fetches the OpenTelemetry documentation page using httpx, extracts and converts HTML to markdown if applicable, formats the result with pagination support, and returns the content as a string.

```python
@mcp.tool()
async def read_documentation(
    ctx: Context,
    url: Union[AnyUrl, str] = Field(description='URL of the OpenTelemetry documentation page to read'),
    max_length: int = Field(
        default=5000,
        description='Maximum number of characters to return.',
        gt=0,
        lt=1000000,
    ),
    start_index: int = Field(
        default=0,
        description='On return output starting at this character index, useful if a previous fetch was truncated and more content is required.',
        ge=0,
    ),
) -> str:
    """Fetch and convert a OpenTelemetry documentation page to markdown format.

    ## Usage

    This tool retrieves the content of a OpenTelemetry documentation page and converts it to markdown format.
    For long documents, you can make multiple calls with different start_index values to retrieve
    the entire content in chunks.

    ## URL Requirements

    - Must be from the opentelemetry.io domain
    - Must be a documentation page

    ## Example URLs

    - https://opentelemetry.io/docs/concepts/observability-primer/
    - https://opentelemetry.io/docs/instrumentation/
    - https://opentelemetry.io/docs/collector/

    ## Output Format

    The output is formatted as markdown text with:
    - Preserved headings and structure
    - Code blocks for examples
    - Lists and tables converted to markdown format

    ## Handling Long Documents

    If the response indicates the document was truncated, you have several options:

    1. **Continue Reading**: Make another call with start_index set to the end of the previous response
    2. **Stop Early**: For very long documents (>30,000 characters), if you've already found the specific information needed, you can stop reading

    Args:
        ctx: MCP context for logging and error handling
        url: URL of the OpenTelemetry documentation page to read
        max_length: Maximum number of characters to return
        start_index: On return output starting at this character index

    Returns:
        Markdown content of the OpenTelemetry documentation
    """
    # Validate that URL is from opentelemetry.io
    url_str = str(url)
    if not re.match(r'^https?://opentelemetry\.io/', url_str):
        await ctx.error(f'Invalid URL: {url_str}. URL must be from the opentelemetry.io domain')
        raise ValueError('URL must be from the opentelemetry.io domain')

    logger.debug(f'Fetching documentation from {url_str}')

    async with httpx.AsyncClient() as client:
        try:
            response = await client.get(
                url_str,
                follow_redirects=True,
                headers={'User-Agent': DEFAULT_USER_AGENT},
                timeout=30,
            )
        except httpx.HTTPError as e:
            error_msg = f'Failed to fetch {url_str}: {str(e)}'
            logger.error(error_msg)
            await ctx.error(error_msg)
            return error_msg

        if response.status_code >= 400:
            error_msg = f'Failed to fetch {url_str} - status code {response.status_code}'
            logger.error(error_msg)
            await ctx.error(error_msg)
            return error_msg

        page_raw = response.text
        content_type = response.headers.get('content-type', '')

    if is_html_content(page_raw, content_type):
        content = extract_content_from_html(page_raw)
    else:
        content = page_raw

    result = format_documentation_result(url_str, content, start_index, max_length)

    # Log if content was truncated
    if len(content) > start_index + max_length:
        logger.debug(
            f'Content truncated at {start_index + max_length} of {len(content)} characters'
        )

    return result
```
- Helper function used by read_documentation to slice the content based on start_index and max_length, format it with a URL prefix, and append a truncation message if more content is available.

```python
def format_documentation_result(url: str, content: str, start_index: int, max_length: int) -> str:
    """Format documentation result with pagination information.

    Args:
        url: Documentation URL
        content: Content to format
        start_index: Start index for pagination
        max_length: Maximum content length

    Returns:
        Formatted documentation result
    """
    original_length = len(content)

    if start_index >= original_length:
        return f'OpenTelemetry Documentation from {url}:\n\n<e>No more content available.</e>'

    # Calculate the end index, ensuring we don't go beyond the content length
    end_index = min(start_index + max_length, original_length)
    truncated_content = content[start_index:end_index]

    if not truncated_content:
        return f'OpenTelemetry Documentation from {url}:\n\n<e>No more content available.</e>'

    actual_content_length = len(truncated_content)
    remaining_content = original_length - (start_index + actual_content_length)

    result = f'OpenTelemetry Documentation from {url}:\n\n{truncated_content}'

    # Only add the prompt to continue fetching if there is still remaining content
    if remaining_content > 0:
        next_start = start_index + actual_content_length
        result += f'\n\n<e>Content truncated. Call the read_documentation tool with start_index={next_start} to get more content.</e>'

    return result
```
- Helper function called by read_documentation to parse HTML content using BeautifulSoup, select the main content area specific to OpenTelemetry docs, remove navigation/UI elements, and convert to markdown using the markdownify library.

```python
def extract_content_from_html(html: str) -> str:
    """Extract and convert HTML content to Markdown format.

    Args:
        html: Raw HTML content to process

    Returns:
        Simplified markdown version of the content
    """
    if not html:
        return '<e>Empty HTML content</e>'

    try:
        # Parse HTML with BeautifulSoup
        soup = BeautifulSoup(html, 'html.parser')

        # Try to find the main content area
        main_content = None

        # Common content container selectors for OpenTelemetry documentation
        content_selectors = [
            '.td-content',  # opentelemetry.io uses this selector for main content
            'main',
            'article',
            '#content',
            '.content',
            '#body-content',
            "div[role='main']",
            '.td-main',
        ]

        # Try to find the main content using common selectors
        for selector in content_selectors:
            content = soup.select_one(selector)
            if content:
                main_content = content
                break

        # If no main content found, use the body
        if not main_content:
            main_content = soup.body if soup.body else soup

        # Remove navigation elements that might be in the main content
        nav_selectors = [
            'noscript',
            '.prevNext',
            '.docsite-footer',
            '.feedback',
            '.td-sidebar',
            '.td-sidebar-nav',
            '.td-page-meta',
            '.td-search',
        ]

        for selector in nav_selectors:
            for element in main_content.select(selector):
                element.decompose()

        # Define tags to strip - these are elements we don't want in the output
        tags_to_strip = [
            'script',
            'style',
            'noscript',
            'meta',
            'link',
            'footer',
            'nav',
            'aside',
            'header',
            '.td-sidebar',
            '.td-sidebar-nav',
            '.td-page-meta',
            '.td-search',
            # Common unnecessary elements
            'js-show-more-buttons',
            'js-show-more-text',
            'feedback-container',
            'feedback-section',
            'doc-feedback-container',
            'doc-feedback-section',
            'warning-container',
            'warning-section',
            'cookie-banner',
            'cookie-notice',
            'copyright-section',
            'legal-section',
            'terms-section',
        ]

        # Use markdownify on the cleaned HTML content
        content = markdownify.markdownify(
            str(main_content),
            heading_style='ATX',
            autolinks=True,
            default_title=True,
            escape_asterisks=True,
            escape_underscores=True,
            newline_style='SPACES',
            strip=tags_to_strip,
        )

        if not content:
            return '<e>Page failed to be simplified from HTML</e>'

        return content
    except Exception as e:
        return f'<e>Error converting HTML to Markdown: {str(e)}</e>'
```
- Helper function used by read_documentation to check whether the fetched content is HTML, based on the content and headers, to decide whether to extract markdown or use the raw text.

```python
def is_html_content(page_raw: str, content_type: str) -> bool:
    """Determine if content is HTML.

    Args:
        page_raw: Raw page content
        content_type: Content-Type header

    Returns:
        True if content is HTML, False otherwise
    """
    return '<html' in page_raw[:100] or 'text/html' in content_type or not content_type
```
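A few sample inputs make the behavior of this check easier to see. The one-line body of is_html_content is repeated here so the snippet runs on its own; the sample strings are made up for illustration.

```python
def is_html_content(page_raw: str, content_type: str) -> bool:
    # Same expression as the helper: any one of the three conditions is enough.
    return '<html' in page_raw[:100] or 'text/html' in content_type or not content_type


# An <html tag within the first 100 characters wins regardless of the header.
print(is_html_content('<!DOCTYPE html><html lang="en">', 'application/octet-stream'))  # True

# Non-HTML body with an explicit non-HTML header is passed through as raw text.
print(is_html_content('# Title', 'text/markdown'))  # False

# A missing Content-Type header is treated as HTML.
print(is_html_content('# Title', ''))  # True

# Only the first 100 characters of the body are inspected.
print(is_html_content('x' * 200 + '<html>', 'text/plain'))  # False
```

The third case is worth noting: a response without a Content-Type header is always routed through extract_content_from_html, even if the body is not HTML.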
- opentelemetry_documentation_mcp_server/server.py:82-82 (registration): The @mcp.tool() decorator registers the read_documentation function as an MCP tool, with the schema inferred from its parameters.

```python
@mcp.tool()
```