Glama

extract_form_fields

Extract form fields and their properties from PDF documents to access structured data for processing or analysis.

Instructions

Extract all form fields from a PDF

Args:
    pdf_path: Path to the PDF file

Returns:
    Dictionary of form field names and their properties
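
As a rough illustration of the return value, the sketch below walks a dictionary shaped like the one the handler builds (keys "type", "value", "field_type_id", and optionally "options") and also handles the {"error": ...} case. The sample data and the `summarize_fields` helper are hypothetical and not part of the server.

```python
from typing import Any, Dict, List


def summarize_fields(result: Dict[str, Any]) -> List[str]:
    """Render one line per form field, or a single error line."""
    # On failure the handler returns {"error": "<message>"} instead of fields.
    # Caveat: a PDF whose only field is named "error" is not distinguishable
    # from a failure by key alone, so the value type is checked as well.
    if set(result) == {"error"} and isinstance(result["error"], str):
        return [f"error: {result['error']}"]

    lines = []
    for name, info in result.items():
        line = f"{name} [{info['type']}] = {info['value']!r}"
        if "options" in info:
            line += f" (options: {', '.join(info['options'])})"
        lines.append(line)
    return lines


# Hypothetical sample shaped like the tool's output.
sample = {
    "full_name": {"type": "text", "value": "Ada Lovelace", "field_type_id": 7},
    "plan": {
        "type": "radiobutton",
        "value": "Basic",
        "field_type_id": 5,
        "options": ["Basic", "Pro"],
    },
}

for line in summarize_fields(sample):
    print(line)
```

Checking for the error shape first keeps the per-field loop free of special cases.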

Input Schema

Name      Required  Description  Default
pdf_path  Yes       -            -
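
The input can also be expressed as a JSON Schema. The sketch below is an assumption inferred from the handler signature `extract_form_fields(pdf_path: str)`; the schema the server actually generates may carry extra keys such as titles. `INPUT_SCHEMA` and `check_arguments` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List

# Assumed input schema: one required string argument, `pdf_path`.
INPUT_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {"pdf_path": {"type": "string"}},
    "required": ["pdf_path"],
}


def check_arguments(arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    for key in INPUT_SCHEMA["required"]:
        if key not in arguments:
            problems.append(f"missing required argument: {key}")
    for key, value in arguments.items():
        spec = INPUT_SCHEMA["properties"].get(key)
        if spec is None:
            problems.append(f"unexpected argument: {key}")
        elif spec["type"] == "string" and not isinstance(value, str):
            problems.append(f"{key} must be a string")
    return problems


print(check_arguments({"pdf_path": "form.pdf"}))
print(check_arguments({}))
```

This hand-written check covers only the single string argument shown in the table; a full JSON Schema validator would be the more general choice.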

Implementation Reference

  • The core handler for the 'extract_form_fields' tool. It is decorated with @mcp.tool(), which registers the tool and generates its schema from the type hints and docstring. The handler reads form fields from the PDF with PyMuPDF in two passes: the first collects the options of each radio button group, and the second builds one entry per field, attaching options to radio button and choice fields.
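    from typing import Any, Dict

    import fitz  # PyMuPDF

    # `mcp` is assumed to be the server instance created elsewhere in the module.
    @mcp.tool()
    def extract_form_fields(pdf_path: str) -> Dict[str, Any]: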
        """
        Extract all form fields from a PDF
    
        Args:
            pdf_path: Path to the PDF file
    
        Returns:
            Dictionary of form field names and their properties
        """
        try:
            doc = fitz.open(pdf_path)
            result = {}
            radio_button_options = {}  # To collect radio button states
    
            # First pass: collect all radio button options
            for page in doc:
                for widget in page.widgets():
                    field_name = widget.field_name
                    field_type = widget.field_type
                    
                    # Collect radio button options
                    if field_type == 5:  # RadioButton
                        if field_name not in radio_button_options:
                            radio_button_options[field_name] = set()
                        
                        try:
                            # Get button states from the widget
                            states = widget.button_states()
                            if states and 'normal' in states:
                                # Add all non-'Off' options to our set
                                for state in states['normal']:
                                    if state != 'Off':
                                        # Decode the PDF name escape '#20' (a hex-encoded space)
                                        option = state.replace('#20', ' ')
                                        radio_button_options[field_name].add(option)
                        except Exception:
                            pass
    
            # Second pass: extract all form fields
            for page in doc:
                for widget in page.widgets():
                    field_name = widget.field_name
                    field_value = widget.field_value
                    field_type = widget.field_type
                    field_type_name = widget.field_type_string
    
                    field_info = {
                        "type": field_type_name.lower(),
                        "value": field_value,
                        "field_type_id": field_type,
                    }
                    
                    # Add radio button options
                    if field_type == 5 and field_name in radio_button_options:
                        # Sort for a stable order; sets are unordered
                        options = sorted(radio_button_options[field_name])
                        if options:
                            field_info["options"] = options
                    
                    # Add choice field options (combobox, listbox)
                    elif field_type in (3, 4):  # ComboBox (3) or ListBox (4)
                        try:
                            # Get the field options
                            field_options = widget.choice_values
                            if field_options:
                                field_info["options"] = field_options
                        except AttributeError:
                            pass
    
                    # Only add if not already in results
                    if field_name not in result:
                        result[field_name] = field_info
    
            doc.close()
            return result
        except Exception as e:
            return {"error": str(e)}
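
The handler decodes only '#20' in radio button state names. In PDF name syntax, '#' followed by two hexadecimal digits can encode any byte, so other characters may also appear escaped. A minimal sketch of a more general decoder follows; `decode_pdf_name` is a hypothetical helper and is not part of the server or of PyMuPDF.

```python
import re

# Matches '#' followed by exactly two hexadecimal digits, e.g. '#20' or '#2F'.
_HEX_ESCAPE = re.compile(r"#([0-9A-Fa-f]{2})")


def decode_pdf_name(name: str) -> str:
    """Replace every #xx escape in a PDF name with the character it encodes."""
    # Each escape encodes one byte, mapped here straight to the character with
    # that code point. This is fine for ASCII; names containing multi-byte
    # UTF-8 text would need decoding at the byte level instead.
    return _HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


print(decode_pdf_name("Option#20A"))
print(decode_pdf_name("Yes#2FNo"))
```

A '#' that is not followed by two hex digits is left untouched, so names that are already plain pass through unchanged.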

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Wildebeest/mcp_pdf_forms'
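
The same request can be sketched in Python with the standard library. The URL is the one from the curl command above; that the endpoint returns JSON is an assumption, and `build_server_request` and `fetch_server_info` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any

API_BASE = "https://glama.ai/api/mcp/v1/servers"


def build_server_request(owner: str, name: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for one server entry."""
    return urllib.request.Request(f"{API_BASE}/{owner}/{name}", method="GET")


def fetch_server_info(owner: str, name: str) -> Any:
    """Send the request and parse the body, assuming it is JSON."""
    with urllib.request.urlopen(build_server_request(owner, name), timeout=10) as resp:
        return json.load(resp)


request = build_server_request("Wildebeest", "mcp_pdf_forms")
print(request.get_method(), request.full_url)
```

Separating request construction from sending makes the URL easy to inspect without a network call.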

If you have feedback or need assistance with the MCP directory API, please join our Discord server.