Get_Document_Content

Extract content from specific documents in SharePoint by specifying folder and file names, enabling direct interaction with stored data.

Instructions

Get content of a document in SharePoint

Input Schema

Name        | Required | Description | Default
file_name   | Yes      |             |
folder_name | Yes      |             |
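
A client invokes this tool by name and passes both required arguments. The sketch below builds such a request, assuming the server is reached through the standard MCP JSON-RPC `tools/call` method. The helper `build_tool_call` and the folder and file names are hypothetical, chosen only for illustration, and the transport is not shown.

```python
import json


def build_tool_call(folder_name: str, file_name: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request that invokes Get_Document_Content.

    Hypothetical helper: assumes the standard MCP `tools/call` method.
    """
    # Both arguments are marked as required in the input schema.
    if not folder_name or not file_name:
        raise ValueError("folder_name and file_name are both required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "Get_Document_Content",
            "arguments": {"folder_name": folder_name, "file_name": file_name},
        },
    }


if __name__ == "__main__":
    # Placeholder folder and file names, not taken from a real site.
    print(json.dumps(build_tool_call("Reports", "summary.pdf"), indent=2))
```

In practice an MCP client library would normally construct this message for you; the sketch only shows which fields carry the tool name and its two arguments.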

Implementation Reference

  • Core implementation of get_document_content: downloads the file from SharePoint, extracts text for supported formats (PDF, Excel, Word, text files), and falls back to a base64-encoded payload for other binary files or when extraction fails.
    def get_document_content(folder_name: str, file_name: str) -> dict:
        """Retrieve document content; supports PDF text extraction"""
        file_path = _get_sp_path(f"{folder_name}/{file_name}")
        file = sp_context.web.get_file_by_server_relative_url(file_path)
        sp_context.load(file, ["Exists", "Length", "Name"])
        sp_context.execute_query()
        logger.info(f"File exists: {file.exists}, size: {file.length}")

        content = io.BytesIO()
        file.download(content)
        sp_context.execute_query()
        content_bytes = content.getvalue()

        # Determine file type and process accordingly
        lower_name = file_name.lower()
        file_type = next(
            (t for t, exts in FILE_TYPES.items()
             if any(lower_name.endswith(ext) for ext in exts)),
            'binary',
        )

        if file_type == 'pdf':
            try:
                text, pages = extract_text_from_pdf(content_bytes)
                return {"name": file_name, "content_type": "text", "content": text,
                        "original_type": "pdf", "page_count": pages,
                        "size": len(content_bytes)}
            except Exception as e:
                logger.warning(f"PDF processing failed: {e}")
                return {"name": file_name, "content_type": "binary",
                        "content_base64": base64.b64encode(content_bytes).decode(),
                        "original_type": "pdf", "size": len(content_bytes)}

        if file_type == 'excel':
            try:
                text, sheets = extract_text_from_excel(content_bytes)
                return {"name": file_name, "content_type": "text", "content": text,
                        "original_type": "excel", "sheet_count": sheets,
                        "size": len(content_bytes)}
            except Exception as e:
                logger.warning(f"Excel processing failed: {e}")
                return {"name": file_name, "content_type": "binary",
                        "content_base64": base64.b64encode(content_bytes).decode(),
                        "original_type": "excel", "size": len(content_bytes)}

        if file_type == 'word':
            try:
                text, paragraphs = extract_text_from_word(content_bytes)
                return {"name": file_name, "content_type": "text", "content": text,
                        "original_type": "word", "paragraph_count": paragraphs,
                        "size": len(content_bytes)}
            except Exception as e:
                logger.warning(f"Word processing failed: {e}")
                return {"name": file_name, "content_type": "binary",
                        "content_base64": base64.b64encode(content_bytes).decode(),
                        "original_type": "word", "size": len(content_bytes)}

        if file_type == 'text':
            try:
                return {"name": file_name, "content_type": "text",
                        "content": content_bytes.decode('utf-8'),
                        "size": len(content_bytes)}
            except UnicodeDecodeError:
                pass

        return {"name": file_name, "content_type": "binary",
                "content_base64": base64.b64encode(content_bytes).decode(),
                "size": len(content_bytes)}
  • MCP tool registration using @mcp.tool decorator. The handler function delegates to resources.get_document_content.
    @mcp.tool(name="Get_Document_Content", description="Get content of a document in SharePoint")
    async def get_document_content_tool(folder_name: str, file_name: str):
        """Get content of a document in SharePoint"""
        return get_document_content(folder_name, file_name)
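
The extension lookup and the two result shapes returned by the core implementation can be sketched in isolation. The `FILE_TYPES` mapping below is an assumption, since the server's actual mapping is not shown in the excerpt, and `detect_file_type` and `result_to_bytes` are hypothetical helpers written for illustration rather than functions from the server.

```python
import base64

# Assumed mapping for illustration; the server's real FILE_TYPES is not shown.
FILE_TYPES = {
    "pdf": [".pdf"],
    "excel": [".xlsx", ".xls"],
    "word": [".docx", ".doc"],
    "text": [".txt", ".csv", ".json", ".md"],
}


def detect_file_type(file_name: str) -> str:
    """Return the first type whose extension matches, else 'binary'."""
    lower_name = file_name.lower()
    return next(
        (t for t, exts in FILE_TYPES.items()
         if any(lower_name.endswith(ext) for ext in exts)),
        "binary",
    )


def result_to_bytes(result: dict) -> bytes:
    """Turn a tool result dict into bytes a caller can store or inspect.

    For "text" results this is the extracted text encoded as UTF-8, not the
    original file bytes; for "binary" results it is the decoded base64 payload.
    """
    if result["content_type"] == "text":
        return result["content"].encode("utf-8")
    return base64.b64decode(result["content_base64"])
```

A caller would typically branch on `content_type` first, since only binary results carry `content_base64` and only text results carry `content`.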

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Sofias-ai/mcp-sharepoint'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.