get_document

Retrieve Spec3 racing reference documents from PDFs stored in S3, returning the full text plus visual content such as diagrams and tables. Optionally specify a page range, and include page images to preserve formatting that text extraction loses.

Instructions

Retrieve full text and visual content of Spec3 racing reference documents.

Fetches complete PDF content from S3 including text and page images. Page images preserve diagrams, tables, and formatting that text extraction cannot capture.

Args:
    document_id: Document ID from list_documents (e.g., "spec3_rules")
    page_start: Starting page number (default: 1)
    page_end: Ending page number (default: None for all remaining pages)
    include_images: Include page images for diagrams/tables (default: True)

Returns:
    dict: Document text, page images (base64), metadata, and page range
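The page-range defaults described above can be sketched as a small standalone function. This is an illustration only: `resolve_page_range` is a hypothetical helper name, not part of the server, and it raises an exception where the real handler returns an error dict. It assumes page numbers are 1-based and the range is inclusive, which matches how the handler indexes pages.

```python
from typing import Optional, Tuple


def resolve_page_range(
    page_start: int, page_end: Optional[int], total_pages: int
) -> Tuple[int, int]:
    """Clamp a 1-based, inclusive page range to the document's bounds.

    Hypothetical helper mirroring the handler's validation step.
    """
    # Values below 1 are treated as page 1.
    start = max(1, page_start)
    # None means "through the last page"; larger values are clamped.
    end = total_pages if page_end is None else min(page_end, total_pages)
    if start > total_pages:
        # The real handler returns an error dict here instead of raising.
        raise ValueError(
            f"page_start ({start}) exceeds total pages ({total_pages})"
        )
    return start, end
```

For a 10-page document, a request for pages 0 through 50 would resolve to pages 1 through 10 under these rules.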

Input Schema


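Name	Required	Description	Default
`document_id`	Yes	Document ID from list_documents (e.g., "spec3_rules")
`page_start`	No	Starting page number	1
`page_end`	No	Ending page number (None for all remaining pages)	None
`include_images`	No	Include page images for diagrams/tables	True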
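As a rough illustration of the schema, the arguments for a call can be assembled as a plain dictionary with the optional fields filled in from their defaults. `build_arguments` and `DEFAULTS` are hypothetical names introduced for this sketch; how the dictionary is actually sent depends on the MCP client in use.

```python
from typing import Any, Dict

# Defaults for the optional parameters, taken from the tool's signature.
DEFAULTS: Dict[str, Any] = {
    "page_start": 1,
    "page_end": None,
    "include_images": True,
}


def build_arguments(document_id: str, **overrides: Any) -> Dict[str, Any]:
    """Return a complete argument dict for get_document (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    # Later keys win, so overrides replace the defaults.
    return {"document_id": document_id, **DEFAULTS, **overrides}


# Text-only request for the first five pages of the rules document.
args = build_arguments("spec3_rules", page_end=5, include_images=False)
```

Setting `include_images` to False is likely worth considering for long page ranges, since each page image is returned as a base64-encoded PNG.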

Implementation Reference

src/spec3_mcp_server/server.py:115-227 (handler)
The @mcp.tool()-decorated async handler function implementing the 'get_document' tool. Accepts document_id, optional page range, and image flag. Downloads PDF from S3, extracts text using pypdf, optionally generates base64 PNG images using pdf2image, and returns structured result with text, images, and metadata.
@mcp.tool()
async def get_document(
    document_id: str,
    page_start: int = 1,
    page_end: int | None = None,
    include_images: bool = True
) -> dict[str, Any]:
    """
    Retrieve full text and visual content of Spec3 racing reference documents.

    Fetches complete PDF content from S3 including text and page images.
    Page images preserve diagrams, tables, and formatting that text extraction cannot capture.

    Args:
        document_id: Document ID from list_documents (e.g., "spec3_rules")
        page_start: Starting page number (default: 1)
        page_end: Ending page number (default: None for all remaining pages)
        include_images: Include page images for diagrams/tables (default: True)

    Returns:
        dict: Document text, page images (base64), metadata, and page range
    """
    logger.info(f"get_document called for: {document_id}, pages {page_start}-{page_end}, images={include_images}")

    if document_id not in AVAILABLE_DOCS:
        return {
            "error": f"Document ID '{document_id}' not found. Use list_documents to see available documents.",
            "available_ids": list(AVAILABLE_DOCS.keys())
        }

    try:
        doc_info = AVAILABLE_DOCS[document_id]
        s3_key = doc_info["s3_key"]

        # Download PDF from S3
        logger.info(f"Downloading {s3_key} from S3")
        response = s3_client.get_object(Bucket=S3_BUCKET, Key=s3_key)
        pdf_content = response['Body'].read()

        # Parse PDF for text
        pdf_file = BytesIO(pdf_content)
        pdf_reader = pypdf.PdfReader(pdf_file)
        total_pages = len(pdf_reader.pages)

        # Validate and adjust page range
        page_start = max(1, page_start)
        if page_end is None:
            page_end = total_pages
        else:
            page_end = min(page_end, total_pages)

        if page_start > total_pages:
            return {
                "error": f"page_start ({page_start}) exceeds total pages ({total_pages})",
                "total_pages": total_pages
            }

        # Extract text from specified pages
        text_content = []
        for page_num in range(page_start - 1, page_end):
            page = pdf_reader.pages[page_num]
            page_text = page.extract_text()
            text_content.append(f"--- Page {page_num + 1} ---\n{page_text}")

        full_text = "\n\n".join(text_content)

        # Extract page images if requested
        page_images = []
        if include_images:
            logger.info(f"Converting pages {page_start}-{page_end} to images")

            # Convert PDF pages to images
            images = convert_from_bytes(
                pdf_content,
                first_page=page_start,
                last_page=page_end,
                dpi=150  # Balance between quality and size
            )

            for idx, img in enumerate(images):
                # Convert to base64
                buffered = BytesIO()
                img.save(buffered, format="PNG", optimize=True)
                img_base64 = base64.b64encode(buffered.getvalue()).decode('utf-8')

                page_images.append({
                    "page_number": page_start + idx,
                    "image": img_base64,
                    "format": "png"
                })

        result = {
            "document_name": doc_info["name"],
            "document_id": document_id,
            "total_pages": total_pages,
            "pages_retrieved": f"{page_start}-{page_end}",
            "text": full_text,
            "images": page_images,
            "num_images": len(page_images),
            "size_bytes": len(pdf_content)
        }

        logger.info(f"Successfully retrieved {page_end - page_start + 1} pages ({len(page_images)} images) from {doc_info['name']}")
        return result

    except Exception as e:
        logger.error(f"Error retrieving document: {str(e)}")
        return {
            "error": f"Error retrieving document: {str(e)}",
            "document_id": document_id
        }
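On the consuming side, the returned dictionary can be unpacked roughly as follows. This is a minimal sketch, not part of the server: `split_pages` and `decode_images` are hypothetical helper names, and the sketch assumes the "--- Page N ---" marker and the "images" entries have exactly the shape the handler above produces. A page whose own text contains a line identical to a marker would be split incorrectly.

```python
import base64
import re
from typing import Any, Dict

# Matches the per-page marker the handler writes before each page's text.
PAGE_MARKER = re.compile(r"^--- Page (\d+) ---$", re.MULTILINE)


def split_pages(full_text: str) -> Dict[int, str]:
    """Split the combined "text" field back into {page_number: text}."""
    # With one capture group, re.split yields:
    # [preamble, "1", text_of_page_1, "2", text_of_page_2, ...]
    parts = PAGE_MARKER.split(full_text)
    return {
        int(parts[i]): parts[i + 1].strip()
        for i in range(1, len(parts) - 1, 2)
    }


def decode_images(result: Dict[str, Any]) -> Dict[int, bytes]:
    """Decode base64 page images into raw PNG bytes keyed by page number."""
    if "error" in result:
        raise RuntimeError(result["error"])
    return {
        img["page_number"]: base64.b64decode(img["image"])
        for img in result.get("images", [])
    }
```

The decoded bytes can then be written to `.png` files or passed to an image library for display.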
src/spec3_mcp_server/server.py:122-137 (schema)
Docstring defining the tool's input parameters, their types/defaults, and return format, serving as the schema for the tool.
"""
Retrieve full text and visual content of Spec3 racing reference documents.

Fetches complete PDF content from S3 including text and page images.
Page images preserve diagrams, tables, and formatting that text extraction cannot capture.

Args:
    document_id: Document ID from list_documents (e.g., "spec3_rules")
    page_start: Starting page number (default: 1)
    page_end: Ending page number (default: None for all remaining pages)
    include_images: Include page images for diagrams/tables (default: True)

Returns:
    dict: Document text, page images (base64), metadata, and page range
"""
src/spec3_mcp_server/server.py:36-57 (helper)
AVAILABLE_DOCS mapping from document_id to S3 key and metadata, used to validate document_id and fetch the correct PDF.
AVAILABLE_DOCS = {
    "spec3_constructor_guide": {
        "name": "Spec3 E36 Race Car Constructor's Guide",
        "s3_key": "Spec3 E36 Race Car Contsructor's Guide.pdf",
        "description": "Comprehensive guide for building a Spec3 E36 race car"
    },
    "bentley_manual_general": {
        "name": "Bentley General Manual",
        "s3_key": "bentley_general.pdf",
        "description": "Bentley BMW E36 Manual - GENERAL SECTION"
    },
    "nasa_ccr": {
        "name": "2025 NASA Competition Comp Rules (CCR)",
        "s3_key": "2025.4_NASACCR.pdf",
        "description": "2025 NASA Club Championship Racing rules"
    },
    "spec3_rules": {
        "name": "2025 Spec3 Rules",
        "s3_key": "2025_Spec3_Rules.pdf",
        "description": "2025 Spec3 racing class specific rules and regulations"
    }
}
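The ID validation that this mapping supports can be sketched in isolation. The snippet below uses a trimmed stand-in for the mapping (`DOCS`) and a hypothetical `lookup_document` helper; it only illustrates the shape of the check and of the error payload, and is not the server's own code path.

```python
from typing import Any, Dict

# Trimmed stand-in for AVAILABLE_DOCS, for illustration only.
DOCS: Dict[str, Dict[str, str]] = {
    "spec3_rules": {
        "name": "2025 Spec3 Rules",
        "s3_key": "2025_Spec3_Rules.pdf",
    },
    "nasa_ccr": {
        "name": "2025 NASA Competition Comp Rules (CCR)",
        "s3_key": "2025.4_NASACCR.pdf",
    },
}


def lookup_document(document_id: str) -> Dict[str, Any]:
    """Return the document entry, or an error dict listing the valid IDs."""
    if document_id not in DOCS:
        return {
            "error": f"Document ID '{document_id}' not found.",
            "available_ids": list(DOCS.keys()),
        }
    return DOCS[document_id]
```

Returning the list of valid IDs alongside the error lets a caller retry with a corrected ID without a separate list_documents call.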
src/spec3_mcp_server/server.py:115-115 (registration)
@mcp.tool() decorator registers the get_document function as an MCP tool in the FastMCP server.
@mcp.tool()
