
OpenTelemetry Documentation MCP Server

by ryu1maniwa

read_documentation

Fetch OpenTelemetry documentation pages and convert them to markdown format for easy reading and reference.

Instructions

Fetch and convert an OpenTelemetry documentation page to markdown format.

Usage

This tool retrieves the content of an OpenTelemetry documentation page and converts it to markdown format. For long documents, you can make multiple calls with different start_index values to retrieve the entire content in chunks.

URL Requirements

  • Must be from the opentelemetry.io domain

  • Must be a documentation page
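The domain requirement can be illustrated with a small standalone check. This is a minimal sketch that reuses the prefix pattern shown in the implementation reference on this page; the helper name `is_allowed_url` is hypothetical and is not part of the server.

```python
import re

# Prefix pattern taken from the handler shown on this page.
_OTEL_URL = re.compile(r'^https?://opentelemetry\.io/')


def is_allowed_url(url: str) -> bool:
    """Return True when the URL starts with http(s)://opentelemetry.io/ (hypothetical helper)."""
    return _OTEL_URL.match(url) is not None


print(is_allowed_url('https://opentelemetry.io/docs/collector/'))  # True
print(is_allowed_url('https://example.com/docs/collector/'))       # False
print(is_allowed_url('https://opentelemetry.io'))                  # False: no trailing slash
print(is_allowed_url('https://opentelemetry.io/blog/'))            # True: only the domain prefix is checked
```

As the last case suggests, a prefix check of this kind covers the domain requirement only; it does not by itself confirm that the path points at a documentation page.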

Example URLs

  • https://opentelemetry.io/docs/concepts/observability-primer/

  • https://opentelemetry.io/docs/instrumentation/

  • https://opentelemetry.io/docs/collector/

Output Format

The output is formatted as markdown text with:

  • Preserved headings and structure

  • Code blocks for examples

  • Lists and tables converted to markdown format

Handling Long Documents

If the response indicates the document was truncated, you have two options:

  1. Continue Reading: Make another call with start_index set to the value given in the truncation notice, which is the character index where the previous chunk ended

  2. Stop Early: For very long documents (>30,000 characters), if you've already found the specific information needed, you can stop reading
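The "Continue Reading" option can be sketched as a client-side loop. This is an illustration under stated assumptions, not part of the server: `next_start_index`, `read_all`, and `fake_tool` are hypothetical names, and `fake_tool` stands in for a real MCP tool call so the example runs on its own. The notice wording it parses is the one produced by the `format_documentation_result` helper shown in the implementation reference.

```python
import re
from typing import Callable, Optional

# Matches the continuation hint appended to a truncated response.
_NOTICE = re.compile(
    r'Content truncated\. Call the read_documentation tool with start_index=(\d+)'
)


def next_start_index(response: str) -> Optional[int]:
    """Return the start_index named in the truncation notice, or None if absent."""
    matches = _NOTICE.findall(response)
    # The notice is appended at the end, so prefer the last match.
    return int(matches[-1]) if matches else None


def read_all(call_tool: Callable[[int], str], max_calls: int = 20) -> list[str]:
    """Call the tool repeatedly until a response carries no truncation notice."""
    chunks = []
    start = 0
    for _ in range(max_calls):  # hard cap so a misbehaving tool cannot loop forever
        response = call_tool(start)
        chunks.append(response)
        following = next_start_index(response)
        if following is None:
            break
        start = following
    return chunks


def fake_tool(start_index: int) -> str:
    """Stand-in for the real tool: pages through a 12-character document, 5 at a time."""
    doc, max_length = 'abcdefghijkl', 5
    chunk = doc[start_index:start_index + max_length]
    end = start_index + len(chunk)
    if end < len(doc):
        return (
            f'{chunk}\n\n<e>Content truncated. Call the read_documentation tool '
            f'with start_index={end} to get more content.</e>'
        )
    return chunk


chunks = read_all(fake_tool)
print(len(chunks))  # 3
```

For the "Stop Early" option, the same loop could simply break as soon as a chunk contains the information being looked for.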

Args:

  • ctx: MCP context for logging and error handling

  • url: URL of the OpenTelemetry documentation page to read

  • max_length: Maximum number of characters to return

  • start_index: Return output starting at this character index

Returns: Markdown content of the OpenTelemetry documentation

Input Schema

Name | Required | Description | Default
url | Yes | URL of the OpenTelemetry documentation page to read | (none)
max_length | No | Maximum number of characters to return. | 5000
start_index | No | Return output starting at this character index, useful if a previous fetch was truncated and more content is required. | 0
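The schema's constraints can be checked on the client side before a call is made. The sketch below is an assumption-laden illustration: `validate_arguments` is a hypothetical helper, and the defaults and bounds (5000, 0, greater than 0, less than 1000000, at least 0) are copied from the `Field` definitions in the implementation reference on this page.

```python
def validate_arguments(args: dict) -> dict:
    """Apply the schema's defaults and bounds to a raw argument dict (hypothetical helper)."""
    url = args.get('url')
    if not isinstance(url, str) or not url:
        raise ValueError('url is required')

    max_length = args.get('max_length', 5000)   # default from the handler signature
    start_index = args.get('start_index', 0)    # default from the handler signature

    if not 0 < max_length < 1_000_000:          # mirrors gt=0, lt=1000000
        raise ValueError('max_length must be between 1 and 999999')
    if start_index < 0:                         # mirrors ge=0
        raise ValueError('start_index must be 0 or greater')

    return {'url': url, 'max_length': max_length, 'start_index': start_index}


print(validate_arguments({'url': 'https://opentelemetry.io/docs/collector/'}))
# {'url': 'https://opentelemetry.io/docs/collector/', 'max_length': 5000, 'start_index': 0}
```

The server performs its own validation, so a check like this is only a convenience for catching bad arguments early.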

Implementation Reference

  • The primary handler function for the 'read_documentation' tool. Decorated with @mcp.tool() for registration. Validates URL domain, fetches page with httpx, determines if HTML, extracts content to markdown, formats with pagination support (start_index, max_length), and returns truncated markdown with continuation prompt if needed.
    async def read_documentation(
        ctx: Context,
        url: Union[AnyUrl, str] = Field(description='URL of the OpenTelemetry documentation page to read'),
        max_length: int = Field(
            default=5000,
            description='Maximum number of characters to return.',
            gt=0,
            lt=1000000,
        ),
        start_index: int = Field(
            default=0,
            description='Return output starting at this character index, useful if a previous fetch was truncated and more content is required.',
            ge=0,
        ),
    ) -> str:
        """Fetch and convert an OpenTelemetry documentation page to markdown format.
    
        ## Usage
    
        This tool retrieves the content of an OpenTelemetry documentation page and converts it to markdown format.
        For long documents, you can make multiple calls with different start_index values to retrieve
        the entire content in chunks.
    
        ## URL Requirements
    
        - Must be from the opentelemetry.io domain
        - Must be a documentation page
    
        ## Example URLs
    
        - https://opentelemetry.io/docs/concepts/observability-primer/
        - https://opentelemetry.io/docs/instrumentation/
        - https://opentelemetry.io/docs/collector/
    
        ## Output Format
    
        The output is formatted as markdown text with:
        - Preserved headings and structure
        - Code blocks for examples
        - Lists and tables converted to markdown format
    
        ## Handling Long Documents
    
        If the response indicates the document was truncated, you have several options:
    
        1. **Continue Reading**: Make another call with start_index set to the end of the previous response
        2. **Stop Early**: For very long documents (>30,000 characters), if you've already found the specific information needed, you can stop reading
    
        Args:
            ctx: MCP context for logging and error handling
            url: URL of the OpenTelemetry documentation page to read
            max_length: Maximum number of characters to return
            start_index: Return output starting at this character index
    
        Returns:
            Markdown content of the OpenTelemetry documentation
        """
        # Validate that URL is from opentelemetry.io
        url_str = str(url)
        if not re.match(r'^https?://opentelemetry\.io/', url_str):
            await ctx.error(f'Invalid URL: {url_str}. URL must be from the opentelemetry.io domain')
            raise ValueError('URL must be from the opentelemetry.io domain')
    
        logger.debug(f'Fetching documentation from {url_str}')
    
        async with httpx.AsyncClient() as client:
            try:
                response = await client.get(
                    url_str,
                    follow_redirects=True,
                    headers={'User-Agent': DEFAULT_USER_AGENT},
                    timeout=30,
                )
            except httpx.HTTPError as e:
                error_msg = f'Failed to fetch {url_str}: {str(e)}'
                logger.error(error_msg)
                await ctx.error(error_msg)
                return error_msg
    
            if response.status_code >= 400:
                error_msg = f'Failed to fetch {url_str} - status code {response.status_code}'
                logger.error(error_msg)
                await ctx.error(error_msg)
                return error_msg
    
            page_raw = response.text
            content_type = response.headers.get('content-type', '')
    
        if is_html_content(page_raw, content_type):
            content = extract_content_from_html(page_raw)
        else:
            content = page_raw
    
        result = format_documentation_result(url_str, content, start_index, max_length)
    
        # Log if content was truncated
        if len(content) > start_index + max_length:
            logger.debug(
                f'Content truncated at {start_index + max_length} of {len(content)} characters'
            )
    
        return result
  • Helper function called by the handler to slice the content based on start_index and max_length, add header with URL, and append truncation notice prompting for next pagination call.
    def format_documentation_result(url: str, content: str, start_index: int, max_length: int) -> str:
        """Format documentation result with pagination information.
    
        Args:
            url: Documentation URL
            content: Content to format
            start_index: Start index for pagination
            max_length: Maximum content length
    
        Returns:
            Formatted documentation result
        """
        original_length = len(content)
    
        if start_index >= original_length:
            return f'OpenTelemetry Documentation from {url}:\n\n<e>No more content available.</e>'
    
        # Calculate the end index, ensuring we don't go beyond the content length
        end_index = min(start_index + max_length, original_length)
        truncated_content = content[start_index:end_index]
    
        if not truncated_content:
            return f'OpenTelemetry Documentation from {url}:\n\n<e>No more content available.</e>'
    
        actual_content_length = len(truncated_content)
        remaining_content = original_length - (start_index + actual_content_length)
    
        result = f'OpenTelemetry Documentation from {url}:\n\n{truncated_content}'
    
        # Only add the prompt to continue fetching if there is still remaining content
        if remaining_content > 0:
            next_start = start_index + actual_content_length
            result += f'\n\n<e>Content truncated. Call the read_documentation tool with start_index={next_start} to get more content.</e>'
    
        return result
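The slicing arithmetic in this helper can be worked through with a short example. The function `chunk_ranges` below is a hypothetical illustration, not part of the server; it assumes a caller that always passes the suggested next start_index and keeps max_length fixed.

```python
def chunk_ranges(total_length: int, max_length: int) -> list[tuple[int, int]]:
    """List the (start_index, end_index) pairs a caller would see while paging."""
    ranges = []
    start = 0
    while start < total_length:
        # Same clamp as the helper: never slice past the end of the content.
        end = min(start + max_length, total_length)
        ranges.append((start, end))
        start = end  # the truncation notice suggests this as the next start_index
    return ranges


print(chunk_ranges(12000, 5000))
# [(0, 5000), (5000, 10000), (10000, 12000)]
```

Note that max_length bounds only the sliced content. The "OpenTelemetry Documentation from ..." header and the truncation notice are added around the slice, so a returned string can be somewhat longer than max_length.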
  • Helper function that extracts the main documentation content from OpenTelemetry HTML pages using selectors targeted at the opentelemetry.io page structure, removes unwanted elements, and converts the result to markdown.
    def extract_content_from_html(html: str) -> str:
        """Extract and convert HTML content to Markdown format.
    
        Args:
            html: Raw HTML content to process
    
        Returns:
            Simplified markdown version of the content
        """
        if not html:
            return '<e>Empty HTML content</e>'
    
        try:
            # Parse HTML with BeautifulSoup
            soup = BeautifulSoup(html, 'html.parser')
    
            # Try to find the main content area
            main_content = None
    
            # Common content container selectors for OpenTelemetry documentation
            content_selectors = [
                '.td-content',  # opentelemetry.io uses this selector for main content
                'main',
                'article',
                '#content',
                '.content',
                '#body-content',
                "div[role='main']",
                '.td-main',
            ]
    
            # Try to find the main content using common selectors
            for selector in content_selectors:
                content = soup.select_one(selector)
                if content:
                    main_content = content
                    break
    
            # If no main content found, use the body
            if not main_content:
                main_content = soup.body if soup.body else soup
    
            # Remove navigation elements that might be in the main content
            nav_selectors = [
                'noscript',
                '.prevNext',
                '.docsite-footer',
                '.feedback',
                '.td-sidebar',
                '.td-sidebar-nav',
                '.td-page-meta',
                '.td-search',
            ]
    
            for selector in nav_selectors:
                for element in main_content.select(selector):
                    element.decompose()
    
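            # Define tags to strip - these are elements we don't want in the output.
            # Note: markdownify's `strip` option matches tag names, so the class-style
            # entries below (e.g. '.td-sidebar') have no effect here; those elements
            # are already removed by the nav_selectors pass above.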
            tags_to_strip = [
                'script',
                'style',
                'noscript',
                'meta',
                'link',
                'footer',
                'nav',
                'aside',
                'header',
                '.td-sidebar',
                '.td-sidebar-nav',
                '.td-page-meta',
                '.td-search',
                # Common unnecessary elements
                'js-show-more-buttons',
                'js-show-more-text',
                'feedback-container',
                'feedback-section',
                'doc-feedback-container',
                'doc-feedback-section',
                'warning-container',
                'warning-section',
                'cookie-banner',
                'cookie-notice',
                'copyright-section',
                'legal-section',
                'terms-section',
            ]
    
            # Use markdownify on the cleaned HTML content
            content = markdownify.markdownify(
                str(main_content),
                heading_style='ATX',
                autolinks=True,
                default_title=True,
                escape_asterisks=True,
                escape_underscores=True,
                newline_style='SPACES',
                strip=tags_to_strip,
            )
    
            if not content:
                return '<e>Page failed to be simplified from HTML</e>'
    
            return content
        except Exception as e:
            return f'<e>Error converting HTML to Markdown: {str(e)}</e>'
  • Helper function that determines whether the fetched page content is HTML, based on the start of the body and the Content-Type header. The handler uses it to decide whether to run HTML extraction or return the content as-is.
    def is_html_content(page_raw: str, content_type: str) -> bool:
        """Determine if content is HTML.
    
        Args:
            page_raw: Raw page content
            content_type: Content-Type header
    
        Returns:
            True if content is HTML, False otherwise
        """
        return '<html' in page_raw[:100] or 'text/html' in content_type or not content_type
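A few sample inputs make the heuristic's behaviour easier to see. The snippet below repeats the one-line check from the helper so that it runs on its own; the sample bodies and headers are made up for illustration.

```python
def is_html_content(page_raw: str, content_type: str) -> bool:
    """Same check as the helper above, repeated so this example is self-contained."""
    return '<html' in page_raw[:100] or 'text/html' in content_type or not content_type


samples = [
    ('<!DOCTYPE html><html lang="en">', 'text/plain'),  # '<html' within the first 100 chars
    ('# Title', 'text/html; charset=utf-8'),            # header says HTML
    ('# Title', ''),                                    # missing header is treated as HTML
    ('# Title', 'text/markdown'),                       # neither signal present
]

for body, header in samples:
    print(repr(header), is_html_content(body, header))
# 'text/plain' True
# 'text/html; charset=utf-8' True
# '' True
# 'text/markdown' False
```

The third case is worth noting: when the Content-Type header is absent, the content is routed through HTML extraction regardless of what the body looks like.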
  • The @mcp.tool() decorator registers the read_documentation function with the FastMCP server instance.
    async def read_documentation(