parse_document
Extract plain text from PDF or DOCX files to enable automated test scenario generation from user stories in development workflows.
Instructions
Read a PDF or DOCX and return plain text.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | | |
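The table lists a single required argument, `file_path`, and the handler signature types it as a string. A minimal sketch of what the equivalent JSON Schema might look like, with a hand-rolled check of a tool-call argument dict, is shown here. The `INPUT_SCHEMA` dictionary and the `validate_arguments` helper are hypothetical names introduced for illustration; the schema the server actually publishes may carry extra fields.

```python
# Assumed shape of the input schema: one required string argument.
# This mirrors the table above; it is not copied from the server.
INPUT_SCHEMA = {
    "type": "object",
    "properties": {"file_path": {"type": "string"}},
    "required": ["file_path"],
}


def validate_arguments(arguments: dict) -> dict:
    """Check an argument dict against INPUT_SCHEMA without external dependencies."""
    # Every required argument must be present.
    for name in INPUT_SCHEMA["required"]:
        if name not in arguments:
            raise ValueError(f"missing required argument: {name}")
    # Every supplied argument must be known and have the declared type.
    for name, value in arguments.items():
        spec = INPUT_SCHEMA["properties"].get(name)
        if spec is None:
            raise ValueError(f"unexpected argument: {name}")
        if spec["type"] == "string" and not isinstance(value, str):
            raise TypeError(f"{name} must be a string")
    return arguments
```

A call such as `validate_arguments({"file_path": "stories/login.pdf"})` returns the dict unchanged, while an empty dict raises `ValueError`.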
Implementation Reference
- server.py:18-31 (handler): The main handler function for the 'parse_document' tool, registered via the @mcp.tool() decorator. It extracts plain text from PDF or DOCX files using the pypdf or docx libraries and stores it in an in-memory dictionary.

  ```python
  @mcp.tool()
  def parse_document(file_path: str) -> str:
      """Read a PDF or DOCX and return plain text."""
      if file_path.endswith(".pdf"):
          reader = pypdf.PdfReader(file_path)
          text = " ".join([page.extract_text() or "" for page in reader.pages])
      elif file_path.endswith(".docx"):
          doc = docx.Document(file_path)
          text = " ".join([p.text for p in doc.paragraphs])
      else:
          raise ValueError("Only PDF and DOCX supported")
      parsed_docs[file_path] = text
      return text
  ```
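The handler picks a parser by file extension using a case-sensitive `endswith` check, so a path such as `story.PDF` would fall through to the `ValueError` branch. The following is a sketch of one possible variation, not the server's code: it normalizes the extension first and imports the parsing library only when needed. The helper names `detect_format` and `extract_text` are hypothetical, and `parsed_docs` here is a stand-in for the in-memory dictionary the handler writes to.

```python
from pathlib import Path

# Stand-in for the in-memory dictionary the handler stores results in.
parsed_docs: dict[str, str] = {}


def detect_format(file_path: str) -> str:
    """Return "pdf" or "docx" based on the file extension, ignoring case."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".docx":
        return "docx"
    raise ValueError("Only PDF and DOCX supported")


def extract_text(file_path: str) -> str:
    """Extract plain text and cache it under the original path."""
    fmt = detect_format(file_path)
    if fmt == "pdf":
        import pypdf  # third-party package: pypdf

        reader = pypdf.PdfReader(file_path)
        # extract_text() can return None or an empty string for image-only pages.
        text = " ".join(page.extract_text() or "" for page in reader.pages)
    else:
        import docx  # third-party package: python-docx

        document = docx.Document(file_path)
        text = " ".join(p.text for p in document.paragraphs)
    parsed_docs[file_path] = text
    return text
```

Keeping the format check in its own function means unsupported paths are rejected before any file is opened, and the check can be exercised without either parsing library installed.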