read_docx
Extract text, tables, and image placeholders from DOCX files to access document content programmatically. Use this tool to read Word documents with structured paragraph separation.
Instructions
Read the complete contents of a .docx file, including tables and images. Use this tool when you want to read a file whose name ends with '.docx'. Paragraphs are separated with two line breaks. This tool converts images into the placeholder [Image]. Each paragraph is marked with a '--- Paragraph [number] ---' indicator.
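Because every paragraph is introduced by a '--- Paragraph [number] ---' line, a client can split the returned text back into numbered entries. The following is a minimal sketch, not part of the server: `split_paragraphs` and the sample text are hypothetical, and the table cell layout in the sample is made up, since the real table formatting is not documented here.

```python
import re

# Matches separator lines such as "--- Paragraph 3 ---".
SEPARATOR = re.compile(r"^--- Paragraph (\d+) ---$")


def split_paragraphs(output: str) -> dict[int, str]:
    """Group read_docx-style output by paragraph number (hypothetical helper)."""
    paragraphs: dict[int, list[str]] = {}
    current = None
    for line in output.split("\n"):
        match = SEPARATOR.match(line)
        if match:
            # A separator starts a new entry.
            current = int(match.group(1))
            paragraphs[current] = []
        elif current is not None:
            # Everything up to the next separator belongs to the current entry,
            # so multi-line bodies such as tables stay together.
            paragraphs[current].append(line)
    return {index: "\n".join(lines) for index, lines in paragraphs.items()}


# Made-up sample in the documented shape; the table row layout is an assumption.
sample = "\n".join([
    "--- Paragraph 0 ---",
    "Quarterly report",
    "--- Paragraph 1 ---",
    "[Image]",
    "--- Paragraph 2 ---",
    "",
    "--- Paragraph 3 ---",
    "[Table]",
    "Name | Qty",
])
parsed = split_paragraphs(sample)
```

Empty paragraphs are kept as empty strings, so the keys line up with the numbers shown in the separators.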
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Absolute path to target file | |
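As an illustration of the schema above, a client could build an MCP `tools/call` request with the single required `path` argument. This is a sketch under assumptions: `build_read_docx_request` is a hypothetical helper, the file path is invented, and the `.docx` check is a local convenience that does not necessarily match the server's own path validation.

```python
import json


def build_read_docx_request(path: str, request_id: int = 1) -> str:
    """Serialize a JSON-RPC tools/call request for read_docx (hypothetical helper)."""
    # Local sanity check only; the server performs its own validation.
    if not path.lower().endswith(".docx"):
        raise ValueError(f"Expected a .docx path, got: {path}")
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "read_docx",
            # "path" is the only property in the input schema, and it is required.
            "arguments": {"path": path},
        },
    }
    return json.dumps(payload)


# Invented absolute path, used only for illustration.
request = build_read_docx_request("/home/user/docs/report.docx")
```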
Implementation Reference
- mcp_server_office/office.py:78-127 (handler): The main handler function for the read_docx tool. It validates the path, loads the DOCX document, processes paragraphs (including images and track changes), extracts table text, and formats the output with paragraph separators.

  ```python
  async def read_docx(path: str) -> str:
      """Read docx file as text including tables.

      Args:
          path: relative path to target docx file

      Returns:
          str: Text representation of the document including tables
      """
      if not await validate_path(path):
          raise ValueError(f"Not a docx file: {path}")

      document = Document(path)
      content = []

      paragraph_index = 0
      table_index = 0

      # Process all body elements in document order
      for element in document._body._body:
          # Handle paragraphs
          if element.tag == W_P:
              paragraph = document.paragraphs[paragraph_index]
              paragraph_index += 1
              # Check for images
              if paragraph._element.findall(f'.//{W_DRAWING}', WORDML_NS):
                  content.append("[Image]")
              # Check for text
              else:
                  text = process_track_changes(paragraph._element)
                  if text.strip():
                      content.append(text)
                  else:
                      # Dropping blank lines causes trouble when editing, so keep them
                      content.append("")
          # Handle tables
          elif element.tag == W_TBL:
              table = document.tables[table_index]
              table_index += 1
              table_text = extract_table_text(table)
              content.append(f"[Table]\n{table_text}")

      separator = [f"--- Paragraph {i} ---" for i in range(len(content))]
      result = []
      for i, p in enumerate(content):
          result.append(separator[i])
          result.append(p)

      return "\n".join(result)
  ```
- mcp_server_office/tools.py:3-22 (schema): The Tool object definition providing the schema, name, and description for the read_docx tool, including input schema validation for the 'path' parameter.

  ```python
  READ_DOCX = types.Tool(
      name="read_docx",
      description=(
          "Read complete contents of a docx file including tables and images."
          "Use this tool when you want to read file endswith '.docx'."
          "Paragraphs are separated with two line breaks."
          "This tool convert images into placeholder [Image]."
          "'--- Paragraph [number] ---' is indicator of each paragraph."
      ),
      inputSchema={
          "type": "object",
          "properties": {
              "path": {
                  "type": "string",
                  "description": "Absolute path to target file",
              }
          },
          "required": ["path"]
      }
  )
  ```
- mcp_server_office/office.py:358-360 (registration): Registers the read_docx tool (as READ_DOCX) in the MCP server's list_tools handler.

  ```python
  @server.list_tools()
  async def list_tools() -> list[types.Tool]:
      return [READ_DOCX, EDIT_DOCX_PARAGRAPH, WRITE_DOCX, EDIT_DOCX_INSERT]
  ```
- mcp_server_office/office.py:367-369 (registration): Dispatches calls to the read_docx tool handler within the server's call_tool function.

  ```python
  if name == "read_docx":
      content = await read_docx(arguments["path"])
      return [types.TextContent(type="text", text=content)]
  ```
- mcp_server_office/office.py:62-77 (helper): Helper function that extracts text from a paragraph while handling track changes (insertions). Used in read_docx.

  ```python
  def process_track_changes(element: OxmlElement) -> str:
      """Process track changes in a paragraph element."""
      text = ""
      for child in element:
          if child.tag == W_R:  # Normal run
              for run_child in child:
                  if run_child.tag == W_T:
                      text += run_child.text if run_child.text else ""
          elif child.tag.endswith('ins'):  # Insertion
              inserted_text = ""
              for run in child.findall('.//w:t', WORDML_NS):
                  inserted_text += run.text if run.text else ""
              if inserted_text:
                  text += inserted_text
      return text
  ```
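The track-changes handling in the helper above can be tried without python-docx. The sketch below is a standalone approximation using only the standard library: `visible_text` and the sample XML are hypothetical, and it matches the `w:ins` tag exactly rather than using the suffix check in the original. It shows that text from normal runs and tracked insertions is kept, while a tracked deletion (`w:del` with `w:delText`) contributes nothing.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
W_R = f"{{{W_NS}}}r"
W_T = f"{{{W_NS}}}t"
W_INS = f"{{{W_NS}}}ins"


def visible_text(paragraph: ET.Element) -> str:
    """Collect text from normal runs and tracked insertions (hypothetical helper)."""
    parts = []
    for child in paragraph:
        if child.tag == W_R:
            # Normal run: take its direct w:t children.
            parts.extend(t.text or "" for t in child if t.tag == W_T)
        elif child.tag == W_INS:
            # Tracked insertion: take every w:t nested inside it.
            parts.extend(t.text or "" for t in child.findall(".//w:t", NS))
        # Any other child, including w:del, is skipped.
    return "".join(parts)


# Made-up paragraph with one deletion and one insertion.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W_NS}">
  <w:r><w:t>Draft </w:t></w:r>
  <w:del><w:r><w:delText>old </w:delText></w:r></w:del>
  <w:ins><w:r><w:t>new </w:t></w:r></w:ins>
  <w:r><w:t>title</w:t></w:r>
</w:p>
""".strip()

paragraph = ET.fromstring(PARAGRAPH_XML)
text = visible_text(paragraph)
```

In other words, the extracted text reflects the document as it would read with all tracked changes accepted.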