Spec3 MCP Server

by dhevenb

get_document

Retrieve Spec3 racing documents with full text and visual content like diagrams and tables from PDFs stored in S3. Specify page ranges and include images to preserve formatting.

Instructions

Retrieve full text and visual content of Spec3 racing reference documents.

Fetches complete PDF content from S3 including text and page images. Page images preserve diagrams, tables, and formatting that text extraction cannot capture.

Args:
  document_id: Document ID from list_documents (e.g., "spec3_rules")
  page_start: Starting page number (default: 1)
  page_end: Ending page number (default: None for all remaining pages)
  include_images: Include page images for diagrams/tables (default: True)

Returns:
  dict: Document text, page images (base64), metadata, and page range
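
For readers calling the tool directly, the arguments above map onto an MCP `tools/call` request. The sketch below builds such a request as a plain dictionary. The helper name `build_tool_call` and the `request_id` value are illustrative assumptions, and a real MCP client library would normally construct and send this message for you.

```python
import json


def build_tool_call(document_id, page_start=1, page_end=None,
                    include_images=True, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for get_document.

    `build_tool_call` is a hypothetical helper, not part of the server.
    """
    arguments = {
        "document_id": document_id,
        "page_start": page_start,
        "include_images": include_images,
    }
    # Omit page_end so the server falls back to "all remaining pages".
    if page_end is not None:
        arguments["page_end"] = page_end
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_document", "arguments": arguments},
    }


# Request the first three pages of the rules as text only.
request = build_tool_call("spec3_rules", page_start=1, page_end=3,
                          include_images=False)
print(json.dumps(request, indent=2))
```

Setting `include_images=False` keeps the response small when only the text is needed.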

Input Schema

Name            Required  Description  Default
document_id     Yes
page_start      No
page_end        No
include_images  No
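
Since the schema carries no descriptions, a caller may want to check arguments before sending them. The following is a minimal sketch of such a check. The `ARGUMENT_SPEC` table is an assumption inferred from the handler's signature and docstring, not the server's actual JSON Schema, and `normalize_arguments` is a hypothetical helper.

```python
from typing import Any

# Assumed argument shape, inferred from the handler signature (not the real schema).
ARGUMENT_SPEC = {
    "document_id": {"types": (str,), "required": True},
    "page_start": {"types": (int,), "required": False, "default": 1},
    "page_end": {"types": (int, type(None)), "required": False, "default": None},
    "include_images": {"types": (bool,), "required": False, "default": True},
}


def normalize_arguments(arguments: dict[str, Any]) -> dict[str, Any]:
    """Validate caller-supplied arguments and fill in the documented defaults."""
    unknown = set(arguments) - set(ARGUMENT_SPEC)
    if unknown:
        raise ValueError(f"Unknown argument(s): {sorted(unknown)}")

    normalized: dict[str, Any] = {}
    for name, spec in ARGUMENT_SPEC.items():
        if name not in arguments:
            if spec["required"]:
                raise ValueError(f"Missing required argument: {name}")
            normalized[name] = spec["default"]
            continue
        value = arguments[name]
        # bool is a subclass of int in Python, so reject it for integer fields.
        if isinstance(value, bool) and bool not in spec["types"]:
            raise TypeError(f"{name} must not be a boolean")
        if not isinstance(value, spec["types"]):
            raise TypeError(f"{name} has an unexpected type: {type(value).__name__}")
        normalized[name] = value
    return normalized


print(normalize_arguments({"document_id": "spec3_rules", "page_end": 5}))
```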

Output Schema

No arguments
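
Although no output fields are listed here, the handler shown under Implementation Reference returns a dict with `text`, `images`, and metadata keys, where each image entry holds `page_number`, `image` (base64), and `format`. A minimal sketch of consuming that result follows. The helper name `save_page_images` and the file-naming pattern are assumptions made for illustration.

```python
import base64
from pathlib import Path
from typing import Any


def save_page_images(result: dict[str, Any], out_dir: Path) -> list[Path]:
    """Decode the base64 page images in a get_document result and write them to disk."""
    # The handler reports failures as a dict with an "error" key.
    if "error" in result:
        raise RuntimeError(result["error"])

    out_dir.mkdir(parents=True, exist_ok=True)
    doc_id = result.get("document_id", "document")
    written: list[Path] = []
    for entry in result.get("images", []):
        data = base64.b64decode(entry["image"])
        # Hypothetical naming pattern: <document_id>_page_<NNN>.<format>
        path = out_dir / f"{doc_id}_page_{entry['page_number']:03d}.{entry['format']}"
        path.write_bytes(data)
        written.append(path)
    return written
```

When `include_images` is false the `images` list is empty, so the helper writes nothing and returns an empty list.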

Implementation Reference

  • The @mcp.tool()-decorated async handler function implementing the 'get_document' tool. Accepts document_id, optional page range, and image flag. Downloads PDF from S3, extracts text using pypdf, optionally generates base64 PNG images using pdf2image, and returns structured result with text, images, and metadata.
    @mcp.tool()
    async def get_document(
        document_id: str,
        page_start: int = 1,
        page_end: int | None = None,
        include_images: bool = True
    ) -> dict[str, Any]:
        """
        Retrieve full text and visual content of Spec3 racing reference documents.
    
        Fetches complete PDF content from S3 including text and page images.
        Page images preserve diagrams, tables, and formatting that text extraction
        cannot capture.
    
        Args:
            document_id: Document ID from list_documents (e.g., "spec3_rules")
            page_start: Starting page number (default: 1)
            page_end: Ending page number (default: None for all remaining pages)
            include_images: Include page images for diagrams/tables (default: True)
    
        Returns:
            dict: Document text, page images (base64), metadata, and page range
        """
        logger.info(f"get_document called for: {document_id}, pages {page_start}-{page_end}, images={include_images}")
    
        if document_id not in AVAILABLE_DOCS:
            return {
                "error": f"Document ID '{document_id}' not found. Use list_documents to see available documents.",
                "available_ids": list(AVAILABLE_DOCS.keys())
            }
    
        try:
            doc_info = AVAILABLE_DOCS[document_id]
            s3_key = doc_info["s3_key"]
    
            # Download PDF from S3
            logger.info(f"Downloading {s3_key} from S3")
            response = s3_client.get_object(Bucket=S3_BUCKET, Key=s3_key)
            pdf_content = response['Body'].read()
    
            # Parse PDF for text
            pdf_file = BytesIO(pdf_content)
            pdf_reader = pypdf.PdfReader(pdf_file)
    
            total_pages = len(pdf_reader.pages)
    
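            # Validate and clamp the requested page range (1-indexed, inclusive)
            page_start = max(1, page_start)
            if page_end is None:
                page_end = total_pages
            else:
                page_end = min(page_end, total_pages)
    
            if page_start > total_pages:
                return {
                    "error": f"page_start ({page_start}) exceeds total pages ({total_pages})",
                    "total_pages": total_pages
                }
    
            # Reject inverted ranges instead of returning an empty result
            if page_end < page_start:
                return {
                    "error": f"page_end ({page_end}) is before page_start ({page_start})",
                    "total_pages": total_pages
                }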
    
            # Extract text from specified pages
            text_content = []
            for page_num in range(page_start - 1, page_end):
                page = pdf_reader.pages[page_num]
                page_text = page.extract_text()
                text_content.append(f"--- Page {page_num + 1} ---\n{page_text}")
    
            full_text = "\n\n".join(text_content)
    
            # Extract page images if requested
            page_images = []
            if include_images:
                logger.info(f"Converting pages {page_start}-{page_end} to images")
    
                # Convert PDF pages to images
                images = convert_from_bytes(
                    pdf_content,
                    first_page=page_start,
                    last_page=page_end,
                    dpi=150  # Balance between quality and size
                )
    
                for idx, img in enumerate(images):
                    # Convert to base64
                    buffered = BytesIO()
                    img.save(buffered, format="PNG", optimize=True)
                    img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')
    
                    page_images.append({
                        "page_number": page_start + idx,
                        "image": img_base64,
                        "format": "png"
                    })
    
            result = {
                "document_name": doc_info["name"],
                "document_id": document_id,
                "total_pages": total_pages,
                "pages_retrieved": f"{page_start}-{page_end}",
                "text": full_text,
                "images": page_images,
                "num_images": len(page_images),
                "size_bytes": len(pdf_content)
            }
    
            logger.info(f"Successfully retrieved {page_end - page_start + 1} pages ({len(page_images)} images) from {doc_info['name']}")
            return result
    
        except Exception as e:
            # logger.exception records the traceback along with the message
            logger.exception(f"Error retrieving document: {e}")
            return {
                "error": f"Error retrieving document: {e}",
                "document_id": document_id
            }
  • The handler's docstring (shown in full in the function above) defines the tool's input parameters, their defaults, and the return format, and serves as the schema for the tool.
  • AVAILABLE_DOCS mapping from document_id to S3 key and metadata, used to validate document_id and fetch the correct PDF.
    AVAILABLE_DOCS = {
        "spec3_constructor_guide": {
            "name": "Spec3 E36 Race Car Constructor's Guide",
            "s3_key": "Spec3 E36 Race Car Contsructor's Guide.pdf",
            "description": "Comprehensive guide for building a Spec3 E36 race car"
        },
        "bentley_manual_general": {
            "name": "Bentley General Manual",
            "s3_key": "bentley_general.pdf",
            "description": "Bentley BMW E36 Manual - GENERAL SECTION"
        },
        "nasa_ccr": {
            "name": "2025 NASA Competition Comp Rules (CCR)",
            "s3_key": "2025.4_NASACCR.pdf",
            "description": "2025 NASA Club Championship Racing rules"
        },
        "spec3_rules": {
            "name": "2025 Spec3 Rules",
            "s3_key": "2025_Spec3_Rules.pdf",
            "description": "2025 Spec3 racing class specific rules and regulations"
        }
    }
  • @mcp.tool() decorator registers the get_document function as an MCP tool in the FastMCP server.
    @mcp.tool()
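
The page-range handling in the handler can be isolated as a small pure function, which makes the 1-indexed, inclusive semantics easier to see and test. This is a sketch rather than the server's code: `resolve_page_range` and `page_indices` are hypothetical names, and the sketch raises exceptions where the handler returns error dicts. It also includes an explicit check for an inverted range.

```python
from typing import Optional


def resolve_page_range(page_start: int, page_end: Optional[int],
                       total_pages: int) -> tuple[int, int]:
    """Clamp a 1-indexed, inclusive page range to the document's bounds."""
    start = max(1, page_start)
    # None means "through the last page"; otherwise cap at the last page.
    end = total_pages if page_end is None else min(page_end, total_pages)
    if start > total_pages:
        raise ValueError(f"page_start ({start}) exceeds total pages ({total_pages})")
    if end < start:
        raise ValueError(f"page_end ({end}) is before page_start ({start})")
    return start, end


def page_indices(start: int, end: int) -> range:
    """Convert a 1-indexed inclusive range to 0-indexed indices for list access."""
    return range(start - 1, end)


start, end = resolve_page_range(0, 50, total_pages=10)
print(start, end, list(page_indices(start, end)))
```

The same `start` and `end` values can be passed unchanged as `first_page` and `last_page` to pdf2image's `convert_from_bytes`, which also counts pages from 1.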
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does so well by disclosing key behavioral traits: it fetches from S3, preserves diagrams/tables via page images that text extraction cannot capture, includes default values for parameters, and describes the return structure. It does not mention rate limits or auth needs, but covers essential operational details adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose, followed by key details, and ending with return info. Every sentence adds value (e.g., explaining S3 source, image preservation, parameter semantics, and output structure) with zero waste, making it efficient and well-structured for an agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, 0% schema coverage, no annotations, but has output schema), the description is complete enough. It covers purpose, usage, parameters, and output details, compensating for the lack of schema descriptions and annotations. The output schema exists, so the description need not explain return values in depth, and it still provides a high-level overview of the return dict.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate fully. It adds significant meaning beyond the bare schema by explaining each parameter's purpose (e.g., 'document_id: Document ID from list_documents'), providing examples ('e.g., "spec3_rules"'), and clarifying defaults and effects ('include_images: Include page images for diagrams/tables'). This effectively documents all 4 parameters where the schema lacks descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Retrieve full text and visual content'), identifies the resource ('Spec3 racing reference documents'), and distinguishes it from siblings by specifying it fetches PDF content from S3, unlike 'get_car_context' or 'list_documents'. It explicitly mentions what the tool does beyond just listing or providing context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description makes clear when to use this tool (to fetch complete PDF content with text and images). Its reference to 'document_id: Document ID from list_documents' also implies a workflow in which this tool follows a list operation. However, it does not explicitly state when not to use the tool or name alternatives, such as 'list_documents' when only a listing is needed, so some guidance remains implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
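
One usage pattern consistent with the parameters described on this page is to read a long document as text in fixed-size page batches and request images only for pages that need them. The sketch below assumes a caller-supplied `call_tool(name, arguments)` function that returns the tool's result dict. Both that function and the `iter_document_text` helper are hypothetical and stand in for whatever MCP client is in use.

```python
from typing import Any, Callable, Iterator

ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def iter_document_text(call_tool: ToolCaller, document_id: str,
                       batch_size: int = 10) -> Iterator[str]:
    """Yield a document's text in batches of pages, skipping page images."""
    page_start = 1
    while True:
        result = call_tool("get_document", {
            "document_id": document_id,
            "page_start": page_start,
            # The server clamps page_end to the last page of the document.
            "page_end": page_start + batch_size - 1,
            "include_images": False,
        })
        if "error" in result:
            raise RuntimeError(result["error"])
        yield result["text"]
        page_start += batch_size
        # total_pages is part of every successful result.
        if page_start > result["total_pages"]:
            break
```

Pages whose text looks incomplete, such as those dominated by tables or diagrams, can then be fetched again individually with `include_images` set to true.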

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/dhevenb/dheven-spec3-mcp-server'