get_document_text
Extract all text content from a Microsoft Word document with the Office Word MCP Server, for processing, analysis, or integration with other workflows.
Instructions
Extract all text from a Word document.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Path or URL to the Word document | |
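
As a rough illustration of the schema above, the sketch below builds the JSON-RPC 2.0 `tools/call` request that an MCP client would send to invoke this tool with its single `filename` argument. The helper name `build_tool_call` and the example path are hypothetical; in practice an MCP client SDK would normally construct and send this message for you.

```python
import json


def build_tool_call(filename: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for get_document_text.

    `build_tool_call` is an illustrative helper, not part of the server.
    """
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_document_text",
            # The only input the tool declares is `filename` (required).
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(request)


# Example path is made up for illustration.
payload = build_tool_call("reports/q3-summary.docx")
print(payload)
```

The tool returns the extracted text as a single string, so the client only needs to read the text content of the result.
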
Implementation Reference
- word_document_server/main.py:537-540 (registration): MCP tool registration with the @mcp.tool() decorator. This is the entry point for the 'get_document_text' tool in the FastMCP server; it delegates to the underlying implementation.

      @mcp.tool()
      async def get_document_text(filename: str):
          """Extract all text from a Word document."""
          return await document_tools.get_document_text(filename)

- Primary handler function implementing the tool logic: it normalizes the filename and calls the text extraction utility.

      async def get_document_text(filename: str) -> str:
          """Extract all text from a Word document from local path or URL.

          Args:
              filename: Path or URL to the Word document
          """
          # Only add .docx extension for local paths, not URLs
          if not is_url(filename):
              filename = ensure_docx_extension(filename)

          return extract_document_text(filename)

- Core helper function that loads the document (handling URLs and temp files) and extracts all text from paragraphs and tables.

      def extract_document_text(doc_path: str) -> str:
          """Extract all text from a Word document from local path or URL."""
          doc, error, is_temp, temp_path = load_document_from_path_or_url(doc_path)
          if error:
              return error

          try:
              text = []
              for paragraph in doc.paragraphs:
                  text.append(paragraph.text)

              for table in doc.tables:
                  for row in table.rows:
                      for cell in row.cells:
                          for paragraph in cell.paragraphs:
                              text.append(paragraph.text)

              return "\n".join(text)
          except Exception as e:
              return f"Failed to extract text: {str(e)}"
          finally:
              # Clean up temp file if needed
              if is_temp and temp_path:
                  cleanup_temp_file(temp_path)
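
The filename normalization in the primary handler relies on two helpers, `is_url` and `ensure_docx_extension`, whose bodies are not shown on this page. The sketch below is one plausible reading of them, written only to make the rule concrete: the implementations here are assumptions and may differ from the server's own, and `normalize` is a hypothetical wrapper that mirrors the handler's branch.

```python
from urllib.parse import urlparse


def is_url(path: str) -> bool:
    """Assumed behavior: treat http(s) strings that include a host as URLs."""
    parsed = urlparse(path)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def ensure_docx_extension(filename: str) -> str:
    """Assumed behavior: append .docx unless the name already ends with it."""
    if filename.lower().endswith(".docx"):
        return filename
    return filename + ".docx"


def normalize(filename: str) -> str:
    """Mirror the handler's rule: only local paths get the extension added."""
    if not is_url(filename):
        filename = ensure_docx_extension(filename)
    return filename


# Local paths gain the extension; URLs are passed through unchanged.
for candidate in ("report", "report.docx", "https://example.com/files/report"):
    print(candidate, "->", normalize(candidate))
```

Leaving URLs untouched matters because a download link does not have to end in `.docx`, and appending the extension would change the address being requested.
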