Glama

Google Workspace MCP Server

by ZatesloFL

get_drive_file_content

Extract readable text content from Google Drive files by ID, including native Google Docs, Sheets, and Slides, Office files, and other formats. Supports shared drives, decodes text as UTF-8, and flags binary content it cannot decode.

Instructions

Retrieves the content of a specific Google Drive file by ID, supporting files in shared drives.

• Native Google Docs, Sheets, Slides → exported (Docs and Slides as plain text, Sheets as CSV).
• Office files (.docx, .xlsx, .pptx) → unzipped & parsed with std-lib to extract readable text.
• Any other file → downloaded; tries UTF-8 decode, else notes binary.
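The three rules above amount to a lookup on the file's MIME type. A minimal sketch, using the MIME types that appear in the handler's code; `choose_strategy` and its action labels are hypothetical names for illustration, not part of the server.

```python
from typing import Optional, Tuple

# Hypothetical helper mirroring the dispatch rules above; not the server's own code.
GOOGLE_EXPORT_FORMATS = {
    "application/vnd.google-apps.document": "text/plain",      # Docs -> plain text
    "application/vnd.google-apps.spreadsheet": "text/csv",     # Sheets -> CSV
    "application/vnd.google-apps.presentation": "text/plain",  # Slides -> plain text
}

OFFICE_XML_TYPES = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def choose_strategy(mime_type: str) -> Tuple[str, Optional[str]]:
    """Return (action, export_mime_type) for a Drive file's MIME type."""
    export_format = GOOGLE_EXPORT_FORMATS.get(mime_type)
    if export_format is not None:
        # Native Google file: request an export in the mapped format.
        return "export", export_format
    if mime_type in OFFICE_XML_TYPES:
        # Office Open XML: download raw bytes, then unzip and parse the XML.
        return "download+office", None
    # Anything else: download raw bytes, try UTF-8, otherwise report binary.
    return "download+decode", None
```

For example, `choose_strategy("application/vnd.google-apps.spreadsheet")` returns `("export", "text/csv")`, while a PDF falls through to `("download+decode", None)`.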

Args:
  user_google_email: The user’s Google email address.
  file_id: Drive file ID.

Returns:
  str: The file content as plain text, preceded by a metadata header (file name, ID, MIME type, and web link).
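Because the result is a single string, a client that wants the metadata separately has to split it. A minimal sketch, assuming the header layout built by the handler in the Implementation Reference (a `File:` line, a `Link:` line, then `--- CONTENT ---`); `split_drive_response` is a hypothetical client-side helper and rejects any result whose header deviates from that layout.

```python
import re
from typing import Dict, Tuple

# Separator the handler places between the metadata header and the file text.
CONTENT_MARKER = "\n\n--- CONTENT ---\n"

# Matches the first header line, e.g. File: "Notes" (ID: abc123, Type: text/plain)
FILE_LINE = re.compile(
    r'^File: "(?P<name>.*)" \(ID: (?P<id>[^,]*), Type: (?P<mime_type>[^)]*)\)$'
)


def split_drive_response(text: str) -> Tuple[Dict[str, str], str]:
    """Split a get_drive_file_content result into header fields and body text."""
    header, sep, body = text.partition(CONTENT_MARKER)
    if not sep:
        raise ValueError("content marker not found")
    # The header is exactly two lines: the File: line and the Link: line.
    file_line, _, link_line = header.partition("\n")
    match = FILE_LINE.match(file_line)
    if match is None:
        raise ValueError(f"unexpected header line: {file_line!r}")
    fields = match.groupdict()
    fields["link"] = link_line.removeprefix("Link: ")  # Python 3.9+
    return fields, body
```

Splitting on the first occurrence of the marker is safe here because the header always precedes the body, so a marker that happens to appear inside the file's own text stays in `body`.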

Input Schema

Name               Required  Description  Default
file_id            Yes       -            -
user_google_email  Yes       -            -

Output Schema

Name    Required  Description  Default
result  Yes       -            -

Implementation Reference

  • The main asynchronous handler function that downloads and extracts text content from a Google Drive file by ID. Supports Google native formats via export, Office XML parsing, and UTF-8 decoding with binary fallback. Includes file metadata header in response.
    import asyncio
    import io
    import logging

    from googleapiclient.http import MediaIoBaseDownload

    # extract_office_xml_text is a project helper defined elsewhere (not shown on this page).

    logger = logging.getLogger(__name__)


    async def get_drive_file_content(
        service,
        user_google_email: str,
        file_id: str,
    ) -> str:
        """
        Retrieves the content of a specific Google Drive file by ID, supporting files in shared drives.
    
        • Native Google Docs, Sheets, Slides → exported as text / CSV.
        • Office files (.docx, .xlsx, .pptx) → unzipped & parsed with std-lib to
          extract readable text.
        • Any other file → downloaded; tries UTF-8 decode, else notes binary.
    
        Args:
            user_google_email: The user’s Google email address.
            file_id: Drive file ID.
    
        Returns:
            str: The file content as plain text with metadata header.
        """
        logger.info(f"[get_drive_file_content] Invoked. File ID: '{file_id}'")

        # Blocking googleapiclient calls run in a worker thread to keep the event loop free.
        file_metadata = await asyncio.to_thread(
            service.files().get(
                fileId=file_id, fields="id, name, mimeType, webViewLink", supportsAllDrives=True
            ).execute
        )
        mime_type = file_metadata.get("mimeType", "")
        file_name = file_metadata.get("name", "Unknown File")

        # Native Google files cannot be downloaded as-is; they must be exported.
        export_mime_type = {
            "application/vnd.google-apps.document": "text/plain",
            "application/vnd.google-apps.spreadsheet": "text/csv",
            "application/vnd.google-apps.presentation": "text/plain",
        }.get(mime_type)

        request_obj = (
            service.files().export_media(fileId=file_id, mimeType=export_mime_type)
            if export_mime_type
            else service.files().get_media(fileId=file_id)
        )
        fh = io.BytesIO()
        downloader = MediaIoBaseDownload(fh, request_obj)
        done = False
        while not done:
            # next_chunk() performs blocking network I/O, so run each call in a worker thread.
            _status, done = await asyncio.to_thread(downloader.next_chunk)

        file_content_bytes = fh.getvalue()

        # Attempt Office XML extraction only for actual Office XML files
        office_mime_types = {
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "application/vnd.openxmlformats-officedocument.presentationml.presentation",
            "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
        }

        body_text = ""
        if mime_type in office_mime_types:
            body_text = extract_office_xml_text(file_content_bytes, mime_type)

        if not body_text:
            # Non-Office files (including exported Google files), or Office files that
            # yielded no text: try UTF-8, otherwise flag the content as binary.
            try:
                body_text = file_content_bytes.decode("utf-8")
            except UnicodeDecodeError:
                body_text = (
                    f"[Binary or unsupported text encoding for mimeType '{mime_type}' - "
                    f"{len(file_content_bytes)} bytes]"
                )

        # Assemble response
        header = (
            f'File: "{file_name}" (ID: {file_id}, Type: {mime_type})\n'
            f'Link: {file_metadata.get("webViewLink", "#")}\n\n--- CONTENT ---\n'
        )
        return header + body_text
  • Registers the tool with the MCP server via @server.tool(), wraps it in the handle_http_errors decorator (tagged with the tool name, marked read-only, Drive service), and requires a Drive service with read access via require_google_service.
    @server.tool()
    @handle_http_errors("get_drive_file_content", is_read_only=True, service_type="drive")
    @require_google_service("drive", "drive_read")
    async def get_drive_file_content(
        service,  # not in the input schema; provided by the decorators, not the caller
        user_google_email: str,
        file_id: str,
    ) -> str:
        ...  # docstring and body as in the handler listed above
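The Office branch of the handler delegates to `extract_office_xml_text`, whose body is not shown on this page. A minimal sketch of how a std-lib extractor could handle the .docx case, assuming the standard Office Open XML layout in which body text lives in `word/document.xml` as `<w:t>` runs inside `<w:p>` paragraphs; `docx_text_sketch` is hypothetical, and .xlsx and .pptx would need their own parts (shared strings, slide XML).

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by <w:p> (paragraph) and <w:t> (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text_sketch(data: bytes) -> str:
    """Hypothetical .docx extractor: paragraph text joined by newlines, or "" on failure."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as package:
            root = ET.fromstring(package.read("word/document.xml"))
    except (zipfile.BadZipFile, KeyError, ET.ParseError):
        # Not a zip, no main document part, or malformed XML.
        return ""
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's text is split across runs; concatenate every <w:t> in it.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text:
            paragraphs.append(text)
    return "\n".join(paragraphs)
```

Returning an empty string on failure fits how the handler treats a falsy result from the Office extractor: it then falls back to UTF-8 decoding or the binary notice.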

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by detailing behavioral traits: it specifies how different file types are processed (exported, parsed, or downloaded), notes encoding attempts, and mentions a metadata header in returns. It doesn't cover aspects like rate limits or auth needs beyond the user email parameter, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded: the first sentence states the core purpose, followed by bullet points for file type handling and a clear 'Args'/'Returns' section. Every sentence adds value without redundancy, making it efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (handling multiple file types) and no annotations, the description is complete: it explains the tool's purpose, behavioral details, parameters, and return value. With an output schema present, it doesn't need to elaborate on return structure, and it adequately covers the context needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning by explaining that 'user_google_email' is for the user's Google email and 'file_id' is the Drive file ID, clarifying their roles beyond the schema's basic titles. However, it doesn't specify format details or constraints for these parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('retrieves') and resource ('content of a specific Google Drive file by ID'), specifying it works with files in shared drives. It distinguishes itself from siblings like 'get_doc_content' by focusing on generic Drive file content retrieval across formats, not just Docs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by detailing supported file types and processing methods, suggesting it's for extracting text from various Drive files. However, it doesn't explicitly state when to use this tool versus alternatives like 'get_doc_content' or 'list_drive_items', nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ZatesloFL/google_workspace_mcp'
