get_document_text
Extract text content from Microsoft Word documents to access and process document information programmatically.
Instructions
Extract all text from a Word document.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| filename | Yes | Path to the Word document | |
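
The table above implies a single required string argument. As a rough sketch, the corresponding JSON Schema might look like the dictionary below; the exact schema the server generates is not shown here, so `GET_DOCUMENT_TEXT_SCHEMA` and `validate_arguments` are hypothetical names used only for illustration.

```python
# Assumed shape of the input schema implied by the table above.
# The schema actually generated by the server may differ in detail.
GET_DOCUMENT_TEXT_SCHEMA = {
    "type": "object",
    "properties": {"filename": {"type": "string"}},
    "required": ["filename"],
}


def validate_arguments(arguments: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    # Every required argument must be present.
    for name in GET_DOCUMENT_TEXT_SCHEMA["required"]:
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Every supplied argument must be known and of the declared type.
    for name, value in arguments.items():
        spec = GET_DOCUMENT_TEXT_SCHEMA["properties"].get(name)
        if spec is None:
            problems.append(f"unexpected argument: {name}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"argument {name} must be a string")
    return problems
```

A client could run a check like this before sending a tool call, for example `validate_arguments({"filename": "report.docx"})`.
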
Implementation Reference
- word_document_server/main.py:109-112 (registration): Registration of the get_document_text tool using the FastMCP @mcp.tool() decorator, delegating to the implementation in document_tools.

  ```python
  @mcp.tool()
  def get_document_text(filename: str):
      """Extract all text from a Word document."""
      return document_tools.get_document_text(filename)
  ```

- Handler function for the get_document_text tool. It ensures the filename has a .docx extension and calls the core extraction utility.

  ```python
  async def get_document_text(filename: str) -> str:
      """Extract all text from a Word document.

      Args:
          filename: Path to the Word document
      """
      filename = ensure_docx_extension(filename)
      return extract_document_text(filename)
  ```

- Core helper function implementing the text extraction logic by iterating over all paragraphs and table cells in the document.

  ```python
  def extract_document_text(doc_path: str) -> str:
      """Extract all text from a Word document."""
      import os
      if not os.path.exists(doc_path):
          return f"Document {doc_path} does not exist"

      try:
          doc = Document(doc_path)
          text = []
          for paragraph in doc.paragraphs:
              text.append(paragraph.text)
          for table in doc.tables:
              for row in table.rows:
                  for cell in row.cells:
                      for paragraph in cell.paragraphs:
                          text.append(paragraph.text)
          return "\n".join(text)
      except Exception as e:
          return f"Failed to extract text: {str(e)}"
  ```
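
Two details of the code above matter to callers. The handler relies on `ensure_docx_extension`, whose body is not shown here, and the core helper reports failures by returning a message string instead of raising an exception. The sketch below illustrates both points under stated assumptions: the `ensure_docx_extension` body is a guess at the behavior its name suggests, not the project's actual implementation, and `is_extraction_error` is a hypothetical caller-side helper.

```python
def ensure_docx_extension(filename: str) -> str:
    """Assumed behavior: append ".docx" when the name does not already end with it.

    This is an illustrative stand-in; the real helper is not shown in this
    document and may behave differently.
    """
    if filename.lower().endswith(".docx"):
        return filename
    return filename + ".docx"


def is_extraction_error(result: str) -> bool:
    """Heuristically detect the two error messages returned by extract_document_text.

    Hypothetical helper. Because errors come back as ordinary strings, a
    document whose own text matches one of these patterns would be
    misclassified.
    """
    missing = result.startswith("Document ") and result.endswith(" does not exist")
    failed = result.startswith("Failed to extract text: ")
    return missing or failed
```

A caller might use it as `if is_extraction_error(text): ...` before processing the returned text. Note also that the extraction loop appends all body paragraphs first and all table cell text afterwards, so table content does not appear at its original position in the output.
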