Skip to main content
Glama

extract_from_column

Extract specific patterns from data columns using regex capturing groups, enabling structured data parsing for email components, product codes, names, and date elements.

Instructions

Extract patterns from a column using regex with capturing groups.

Returns: ColumnOperationResult with extraction details

Examples: # Extract email parts extract_from_column(ctx, "email", r"(.+)@(.+)")

# Extract code components extract_from_column(ctx, "product_code", r"([A-Z]{2})-(\d+)") # Extract and expand into multiple columns extract_from_column(ctx, "full_name", r"(\w+)\s+(\w+)", expand=True) # Extract year from date string extract_from_column(ctx, "date", r"\d{4}")

Input Schema

TableJSON Schema
NameRequiredDescriptionDefault
columnYesColumn name to extract patterns from
patternYesRegex pattern with capturing groups to extract
expandYesWhether to expand multiple groups into separate columns

Implementation Reference

  • The main handler function that performs regex extraction from a specified column in the DataFrame, optionally expanding multiple capturing groups into separate new columns. Includes input validation, regex compilation check, extraction logic using pandas str.extract, and returns a ColumnOperationResult.
    async def extract_from_column( ctx: Annotated[Context, Field(description="FastMCP context for session access")], column: Annotated[str, Field(description="Column name to extract patterns from")], pattern: Annotated[str, Field(description="Regex pattern with capturing groups to extract")], *, expand: Annotated[ bool, Field(description="Whether to expand multiple groups into separate columns"), ] = False, ) -> ColumnOperationResult: r"""Extract patterns from a column using regex with capturing groups. Returns: ColumnOperationResult with extraction details Examples: # Extract email parts extract_from_column(ctx, "email", r"(.+)@(.+)") # Extract code components extract_from_column(ctx, "product_code", r"([A-Z]{2})-(\d+)") # Extract and expand into multiple columns extract_from_column(ctx, "full_name", r"(\w+)\s+(\w+)", expand=True) # Extract year from date string extract_from_column(ctx, "date", r"\d{4}") """ # Get session_id from FastMCP context session_id = ctx.session_id _session, df = get_session_data(session_id) _validate_column_exists(column, df) # Validate regex pattern try: re.compile(pattern) except re.error as e: msg = "pattern" raise InvalidParameterError(msg, pattern, f"Invalid regex pattern: {e}") from e # Apply extraction # pandas typing limitation: str.extract(expand=bool) overload not properly typed in pandas-stubs # See: https://github.com/pandas-dev/pandas-stubs/issues/43 extracted = df[column].astype(str).str.extract(pattern, expand=expand) # type: ignore[call-overload] if expand and isinstance(extracted, pd.DataFrame): # Multiple capturing groups - create new columns columns_created = [] for i in range(len(extracted.columns)): new_col_name = f"{column}_extracted_{i}" df[new_col_name] = extracted.iloc[:, i] columns_created.append(new_col_name) affected_columns = columns_created operation_desc = f"extract_expand_{len(columns_created)}_groups" else: # Single group or no expand - replace original column if isinstance(extracted, pd.DataFrame): # Multiple groups but not expanding - take first group df[column] = extracted.iloc[:, 0] else: # Single series result df[column] = extracted affected_columns = [column] operation_desc = "extract_pattern" # Count successful extractions (non-null results) if expand and isinstance(extracted, pd.DataFrame): successful_extractions = int((~extracted.isna()).any(axis=1).sum()) else: successful_extractions = ( int(extracted.notna().sum()) if hasattr(extracted, "notna") else len(extracted) ) return ColumnOperationResult( operation=operation_desc, rows_affected=successful_extractions, columns_affected=affected_columns, )
  • Registers the extract_from_column function as an MCP tool named 'extract_from_column' on the FastMCP server instance column_text_server.
    column_text_server.tool(name="extract_from_column")(extract_from_column)
  • Pydantic/FastMCP schema defined via Annotated types and Field descriptions for the tool parameters: ctx (Context), column (str), pattern (str), expand (bool=False).
    async def extract_from_column( ctx: Annotated[Context, Field(description="FastMCP context for session access")], column: Annotated[str, Field(description="Column name to extract patterns from")], pattern: Annotated[str, Field(description="Regex pattern with capturing groups to extract")], *, expand: Annotated[ bool, Field(description="Whether to expand multiple groups into separate columns"), ] = False, ) -> ColumnOperationResult:

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jonpspri/databeak'

If you have feedback or need assistance with the MCP directory API, please join our Discord server