Get_Document_Content
Extract the content of a specific SharePoint document by specifying its folder and file name. Text is extracted from PDF, Excel, Word, and plain-text files; any other file, or one whose extraction fails, is returned base64-encoded.
Instructions
Get content of a document in SharePoint
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_name | Yes | | |
| folder_name | Yes | | |
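
As a rough illustration of how these two required arguments reach the tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind an MCP client sends. The helper name `build_call_request` and the folder and file values are hypothetical, chosen only for the example; how the request is transported depends on the client.

```python
import json


def build_call_request(folder_name: str, file_name: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for Get_Document_Content."""
    # Both parameters are marked as required in the input schema.
    if not folder_name or not file_name:
        raise ValueError("folder_name and file_name are both required")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "Get_Document_Content",
            "arguments": {"folder_name": folder_name, "file_name": file_name},
        },
    }


# Hypothetical folder and file names, for illustration only.
request = build_call_request("Shared Documents/Reports", "q3-summary.pdf")
print(json.dumps(request, indent=2))
```

Rejecting empty values on the client side is a design choice of this sketch, not something the server is documented to require.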
Implementation Reference
- src/mcp_sharepoint/resources.py:204-252 (handler) Core implementation of get_document_content: downloads file from SharePoint, extracts text for supported formats (PDF, Excel, Word, text files), falls back to base64 for binary.

      def get_document_content(folder_name: str, file_name: str) -> dict:
          """Retrieve document content; supports PDF text extraction"""
          file_path = _get_sp_path(f"{folder_name}/{file_name}")
          file = sp_context.web.get_file_by_server_relative_url(file_path)
          sp_context.load(file, ["Exists", "Length", "Name"])
          sp_context.execute_query()
          logger.info(f"File exists: {file.exists}, size: {file.length}")

          content = io.BytesIO()
          file.download(content)
          sp_context.execute_query()
          content_bytes = content.getvalue()

          # Determine file type and process accordingly
          lower_name = file_name.lower()
          file_type = next((t for t, exts in FILE_TYPES.items() if any(lower_name.endswith(ext) for ext in exts)), 'binary')

          if file_type == 'pdf':
              try:
                  text, pages = extract_text_from_pdf(content_bytes)
                  return {"name": file_name, "content_type": "text", "content": text, "original_type": "pdf", "page_count": pages, "size": len(content_bytes)}
              except Exception as e:
                  logger.warning(f"PDF processing failed: {e}")
                  return {"name": file_name, "content_type": "binary", "content_base64": base64.b64encode(content_bytes).decode(), "original_type": "pdf", "size": len(content_bytes)}

          if file_type == 'excel':
              try:
                  text, sheets = extract_text_from_excel(content_bytes)
                  return {"name": file_name, "content_type": "text", "content": text, "original_type": "excel", "sheet_count": sheets, "size": len(content_bytes)}
              except Exception as e:
                  logger.warning(f"Excel processing failed: {e}")
                  return {"name": file_name, "content_type": "binary", "content_base64": base64.b64encode(content_bytes).decode(), "original_type": "excel", "size": len(content_bytes)}

          if file_type == 'word':
              try:
                  text, paragraphs = extract_text_from_word(content_bytes)
                  return {"name": file_name, "content_type": "text", "content": text, "original_type": "word", "paragraph_count": paragraphs, "size": len(content_bytes)}
              except Exception as e:
                  logger.warning(f"Word processing failed: {e}")
                  return {"name": file_name, "content_type": "binary", "content_base64": base64.b64encode(content_bytes).decode(), "original_type": "word", "size": len(content_bytes)}

          if file_type == 'text':
              try:
                  return {"name": file_name, "content_type": "text", "content": content_bytes.decode('utf-8'), "size": len(content_bytes)}
              except UnicodeDecodeError:
                  pass

          return {"name": file_name, "content_type": "binary", "content_base64": base64.b64encode(content_bytes).decode(), "size": len(content_bytes)}

- src/mcp_sharepoint/tools.py:48-51 (registration) MCP tool registration using @mcp.tool decorator. The handler function delegates to resources.get_document_content.

      @mcp.tool(name="Get_Document_Content", description="Get content of a document in SharePoint")
      async def get_document_content_tool(folder_name: str, file_name: str):
          """Get content of a document in SharePoint"""
          return get_document_content(folder_name, file_name)
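
The handler picks a processing path from the file extension and returns either a `content` string or a `content_base64` string. The following is a minimal sketch of both ideas, not the project's code: the `FILE_TYPES` contents are assumed (the real mapping is defined elsewhere in `resources.py` and is not shown here), and `decode_result` is a hypothetical client-side helper.

```python
import base64
from typing import Union

# Assumed mapping for illustration; the real FILE_TYPES may list other extensions.
FILE_TYPES = {
    "pdf": [".pdf"],
    "excel": [".xlsx", ".xls"],
    "word": [".docx", ".doc"],
    "text": [".txt", ".md", ".csv", ".json"],
}


def detect_file_type(file_name: str) -> str:
    """Return the first type whose extension list matches, else 'binary'."""
    lower_name = file_name.lower()  # matching is case-insensitive
    return next(
        (t for t, exts in FILE_TYPES.items()
         if any(lower_name.endswith(ext) for ext in exts)),
        "binary",
    )


def decode_result(result: dict) -> Union[str, bytes]:
    """Hypothetical helper: turn a handler result into text or raw bytes."""
    if result["content_type"] == "text":
        return result["content"]
    # Binary results, including failed extractions, carry base64 data.
    return base64.b64decode(result["content_base64"])
```

A caller should branch on `content_type` rather than on the file extension, because a PDF, Excel, or Word file whose extraction fails still comes back as `"binary"` with `original_type` set.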