extract_form_fields
Extract form fields and their properties from PDF documents to access structured data for processing or analysis.
Instructions
Extract all form fields from a PDF
Args:
pdf_path: Path to the PDF file
Returns:
Dictionary of form field names and their properties
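As a rough illustration of the return value described above, the sketch below groups a result dictionary by field type and handles the `{"error": ...}` shape that the handler returns on failure. The helper name `group_fields_by_type` and the sample field names and values are hypothetical and not part of the server; the keys `type`, `value`, `field_type_id`, and `options` are the ones built by the handler listed under Implementation Reference.

```python
from typing import Any, Dict, List


def group_fields_by_type(result: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group field names by their lowercase type string (hypothetical helper)."""
    # The handler reports failures as {"error": "<message>"} instead of raising.
    if set(result) == {"error"} and isinstance(result["error"], str):
        raise ValueError(f"extraction failed: {result['error']}")

    grouped: Dict[str, List[str]] = {}
    for name, info in result.items():
        grouped.setdefault(info["type"], []).append(name)
    return grouped


# Hypothetical sample shaped like the tool's output (not taken from a real PDF).
sample = {
    "full_name": {"type": "text", "value": "", "field_type_id": 7},
    "country": {
        "type": "combobox",
        "value": "US",
        "field_type_id": 3,
        "options": ["US", "CA"],
    },
    "plan": {
        "type": "radiobutton",
        "value": "Basic",
        "field_type_id": 5,
        "options": ["Basic", "Premium"],
    },
}

print(group_fields_by_type(sample))
```

Because errors come back as an ordinary dictionary, a caller has to check for the `error` key explicitly; the string check above keeps a real form field that happens to be named `error` from being mistaken for a failure.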
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| pdf_path | Yes | Path to the PDF file | |
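For orientation, this is roughly what a request carrying that single argument looks like on the wire. The sketch assumes the standard MCP `tools/call` method over JSON-RPC 2.0; the helper name `build_tool_call` and the example path are hypothetical, and most MCP clients build this payload for you.

```python
import json
from typing import Any, Dict


def build_tool_call(pdf_path: str, request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for extract_form_fields."""
    # pdf_path is the only argument and it is required.
    if not isinstance(pdf_path, str) or not pdf_path:
        raise ValueError("pdf_path is required and must be a non-empty string")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "extract_form_fields",
            "arguments": {"pdf_path": pdf_path},
        },
    }


# "/tmp/example_form.pdf" is a placeholder path, not a file shipped with the server.
print(json.dumps(build_tool_call("/tmp/example_form.pdf"), indent=2))
```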
Implementation Reference
- mcp_pdf_forms/server.py:53-130 (handler): The core handler implementation for the 'extract_form_fields' tool. It is decorated with @mcp.tool(), which registers it automatically and generates its schema from the type hints and docstring. It extracts form fields from the PDF using PyMuPDF, with special logic for radio button and choice field options.

  ```python
  @mcp.tool()
  def extract_form_fields(pdf_path: str) -> Dict[str, Any]:
      """
      Extract all form fields from a PDF

      Args:
          pdf_path: Path to the PDF file

      Returns:
          Dictionary of form field names and their properties
      """
      try:
          doc = fitz.open(pdf_path)
          result = {}
          radio_button_options = {}  # To collect radio button states

          # First pass: collect all radio button options
          for page in doc:
              for widget in page.widgets():
                  field_name = widget.field_name
                  field_type = widget.field_type

                  # Collect radio button options
                  if field_type == 5:  # RadioButton
                      if field_name not in radio_button_options:
                          radio_button_options[field_name] = set()
                      try:
                          # Get button states from the widget
                          states = widget.button_states()
                          if states and 'normal' in states:
                              # Add all non-'Off' options to our set
                              for state in states['normal']:
                                  if state != 'Off':
                                      # '#20' is the PDF name hex escape for a space
                                      option = state.replace('#20', ' ')
                                      radio_button_options[field_name].add(option)
                      except Exception:
                          pass

          # Second pass: extract all form fields
          for page in doc:
              for widget in page.widgets():
                  field_name = widget.field_name
                  field_value = widget.field_value
                  field_type = widget.field_type
                  field_type_name = widget.field_type_string

                  field_info = {
                      "type": field_type_name.lower(),
                      "value": field_value,
                      "field_type_id": field_type,
                  }

                  # Add radio button options
                  if field_type == 5 and field_name in radio_button_options:
                      options = list(radio_button_options[field_name])
                      if options:
                          field_info["options"] = options
                  # Add choice field options (combobox, listbox)
                  elif field_type == 3:  # Choice field
                      try:
                          # Get the field options
                          field_options = widget.choice_values
                          if field_options:
                              field_info["options"] = field_options
                      except AttributeError:
                          pass

                  # Only add if not already in results
                  if field_name not in result:
                      result[field_name] = field_info

          doc.close()
          return result
      except Exception as e:
          return {"error": str(e)}
  ```
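The handler only rewrites `#20`, but PDF name syntax allows any byte to be written as a two-digit `#xx` hex escape. The sketch below shows one way the same idea could be generalized. It is not part of the server: `decode_pdf_name` is a hypothetical helper, and it assumes the decoded bytes are UTF-8, which is common but not guaranteed.

```python
import re

# Matches one PDF name escape: '#' followed by exactly two hex digits.
_HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_pdf_name(name: str) -> str:
    """Replace every #xx escape in a PDF name with the byte it encodes."""
    raw = _HEX_ESCAPE.sub(
        lambda m: bytes([int(m.group(1), 16)]),
        name.encode("utf-8"),
    )
    # Assumption: the name is UTF-8; undecodable bytes become U+FFFD.
    return raw.decode("utf-8", errors="replace")


print(decode_pdf_name("Option#20A"))   # Option A
print(decode_pdf_name("Yes#2FNo"))     # Yes/No
print(decode_pdf_name("Caf#C3#A9"))    # Café
```

Decoding at the byte level keeps multi-byte characters intact, which a per-escape `chr()` substitution would not.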