read_hwp
Extracts text, tables in markdown format, and image listings from HWP and HWPX files. Provide the absolute file path to retrieve full document content.
Instructions
Read full HWP/HWPX document content as text + tables (markdown) + image listing. Args: file_path (absolute path).
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | Absolute path to the HWP or HWPX file to read. | |
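To make the schema concrete, the sketch below builds the arguments object and wraps it in a JSON-RPC `tools/call` request. It assumes the server is reached over the Model Context Protocol; the page does not state the transport. `buildReadHwpCall` and `isAbsolutePath` are hypothetical helpers introduced for illustration, the absolute-path check is an added client-side guard (the schema itself only requires a string), and the example path is made up.

```typescript
// Only the tool name "read_hwp" and the "file_path" property come from the
// schema above; every other name here is a hypothetical illustration.
interface ReadHwpArgs {
  file_path: string;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: ReadHwpArgs };
}

// Rough absolute-path check: POSIX ("/...") or Windows drive ("C:\..." or "C:/...").
function isAbsolutePath(p: string): boolean {
  return p.startsWith("/") || /^[A-Za-z]:[\\/]/.test(p);
}

// Build the request, rejecting relative paths before they reach the server.
function buildReadHwpCall(id: number, filePath: string): ToolCallRequest {
  if (!isAbsolutePath(filePath)) {
    throw new Error(`file_path must be absolute, got: ${filePath}`);
  }
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "read_hwp", arguments: { file_path: filePath } },
  };
}

const request = buildReadHwpCall(1, "/home/user/docs/report.hwpx");
console.log(JSON.stringify(request, null, 2));
```

Rejecting a relative path on the client gives a clearer error than whatever message the server's document loader would return for a file it cannot find.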
Implementation Reference
- src/tools/read.ts:56-102 (handler) Main handler for the 'read_hwp' tool. Opens an HWP/HWPX document, extracts text (via walkText), tables (via walkTables), and an image listing (via walkImages), and returns a formatted Markdown string with document metadata.
    export async function readHwp(args: ReadHwpArgs): Promise<string> {
      let doc;
      try {
        doc = await openDocument(args.file_path);
      } catch (e) {
        // Opening failed: return the error message as the tool result.
        return (e as Error).message;
      }
      try {
        const text = walkText(doc);
        const tables = walkTables(doc);
        const images = walkImages(doc);
        const ext = extname(args.file_path).toUpperCase();
        const paragraphCount = text.split("\n").length;

        const out: string[] = [];
        out.push(`# ${basename(args.file_path)}`);
        // Summary line (Korean labels): format | paragraphs | tables | images.
        out.push(
          `형식: ${ext} | 문단: ${paragraphCount}개 | 표: ${tables.length}개 | 이미지: ${images.length}개`
        );
        out.push("");
        out.push(text);

        tables.forEach((t, i) => {
          out.push("");
          // Heading per table: "Table N (R rows x C cols)".
          out.push(`### 표 ${i + 1} (${t.rows}행 x ${t.cols}열)`);
          out.push(tableToMarkdown(t));
        });

        if (images.length > 0) {
          out.push("");
          out.push("---");
          // Section heading: "Included images".
          out.push("## 포함된 이미지");
          images.forEach((img, i) => {
            out.push(
              `${i + 1}. [section ${img.section}, para ${img.paragraph}, ctrl ${img.controlIdx}] ${img.mime} (${img.byteLength} bytes)`
            );
          });
        }

        return out.join("\n");
      } catch (e) {
        // Message prefix: "File read error".
        return `파일 읽기 오류 (read error): ${(e as Error).message}`;
      } finally {
        closeDocument(doc);
      }
    }

- src/tools/read.ts:11-13 (schema) Input schema / args interface for the read_hwp tool (and related read tools). Requires a single string property 'file_path'.
    export interface ReadHwpArgs {
      file_path: string;
    }

- src/server.ts:39-48 (registration) Tool registration for 'read_hwp' in the TOOLS array, defining name, description, and JSON Schema input.
    {
      name: "read_hwp",
      description:
        "Read full HWP/HWPX document content as text + tables (markdown) + image listing. Args: file_path (absolute path).",
      inputSchema: {
        type: "object",
        properties: { file_path: { type: "string" } },
        required: ["file_path"],
      },
    },

- src/server.ts:510-512 (registration) Registration of the readHwp handler function in the HANDLERS record, mapping tool name 'read_hwp' to the implementation.
    read_hwp: readHwp,
    read_hwp_text: readHwpText,
    read_hwp_tables: readHwpTables,

- src/core/document.ts:318-365 (helper) Helper used by readHwp to extract all tables from the document (called via walkTables). Returns TableData[] with row/col counts and cell text.
    export function walkTables(doc: HwpDocument): TableData[] {
      const out: TableData[] = [];
      const sectionCount = doc.getSectionCount();
      for (let s = 0; s < sectionCount; s++) {
        const paraCount = doc.getParagraphCount(s);
        for (let p = 0; p < paraCount; p++) {
          const n = controlCount(doc, s, p);
          for (let ci = 0; ci < n; ci++) {
            let dimsJson: string;
            try {
              dimsJson = doc.getTableDimensions(s, p, ci);
            } catch {
              continue;
            }
            if (!dimsJson || dimsJson === "null") continue;
            let dims: TableDims;
            try {
              dims = JSON.parse(dimsJson);
            } catch {
              continue;
            }
            const rows = Number(dims.rowCount ?? dims.rows ?? dims.row_count ?? 0);
            const cols = Number(dims.colCount ?? dims.cols ?? dims.col_count ?? 0);
            const cellCount = Number(dims.cellCount ?? dims.cell_count ?? rows * cols);
            if (rows === 0 || cols === 0) continue;

            // Tables with merged cells report cellCount < rows*cols. Walk by
            // cellCount instead of grid; place by getCellInfo (row, col, span).
            const cells: string[][] = Array.from({ length: rows }, () => Array(cols).fill(""));
            for (let cellIdx = 0; cellIdx < cellCount; cellIdx++) {
              let row = 0, col = 0;
              try {
                const info = JSON.parse(doc.getCellInfo(s, p, ci, cellIdx));
                row = Number(info.row ?? info.r ?? 0);
                col = Number(info.col ?? info.c ?? 0);
              } catch {
                row = Math.floor(cellIdx / cols);
                col = cellIdx % cols;
              }
              if (row >= rows || col >= cols) continue;
              cells[row][col] = readCellText(doc, s, p, ci, cellIdx);
            }
            out.push({ rows, cols, cells });
          }
        }
      }
      return out;
    }
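
readHwp hands each TableData to tableToMarkdown, whose source is not shown on this page. The following is a minimal sketch of how such a conversion could look, not the project's actual implementation. It assumes the first row serves as the header and that pipe characters and line breaks inside cells need escaping. `tableToMarkdownSketch` and `escapeCell` are hypothetical names, and the TableData shape is inferred from the `{ rows, cols, cells }` object that walkTables pushes.

```typescript
// Shape inferred from walkTables' out.push({ rows, cols, cells }).
interface TableData {
  rows: number;
  cols: number;
  cells: string[][];
}

// Escape characters that would break a Markdown table row.
function escapeCell(text: string): string {
  return text.replace(/\|/g, "\\|").replace(/\r?\n/g, "<br>").trim();
}

// Hypothetical stand-in for tableToMarkdown: the first row becomes the header.
function tableToMarkdownSketch(t: TableData): string {
  const renderRow = (row: string[]): string =>
    "| " + row.map(escapeCell).join(" | ") + " |";
  const lines: string[] = [];
  lines.push(renderRow(t.cells[0]));
  lines.push("|" + Array(t.cols).fill("---").join("|") + "|");
  for (let r = 1; r < t.rows; r++) {
    lines.push(renderRow(t.cells[r]));
  }
  return lines.join("\n");
}

// A 2x2 grid where only three cells were reported, as can happen with a
// merged cell: the unfilled grid position keeps its initial "" value.
const sample: TableData = {
  rows: 2,
  cols: 2,
  cells: [
    ["Item", "Note"],
    ["Total | net", ""],
  ],
};
console.log(tableToMarkdownSketch(sample));
```

Because walkTables pre-fills the grid with empty strings, every row has exactly `cols` entries, so a renderer like this one never has to pad short rows.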