parse_skills_column

Extract and encode individual skills from comma-separated columns in data files for analysis and visualization.

Instructions

Parse comma-separated skills into individual skills and create one-hot encoding.

Args:
    file_path: Path to the data file
    skills_column: Column name containing comma-separated skills
    output_path: Optional path to save the processed data

Returns:
    Information about the parsed skills data
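
To make the transformation concrete, here is a minimal, self-contained sketch of the kind of one-hot encoding this tool produces. It is not the server implementation, and the column and skill names are hypothetical.

    # Toy data with a comma-separated skills column (hypothetical values)
    import pandas as pd

    df = pd.DataFrame({
        "title": ["Data Analyst", "ML Engineer"],
        "skills": ["Python, SQL", "Python, PyTorch"],
    })

    # Split the comma-separated column, then pivot each unique skill into a
    # 0/1 indicator column named skill_<normalized name> (lowercased, spaces
    # and hyphens replaced with underscores), mirroring the tool's naming.
    exploded = df["skills"].str.split(",").explode().str.strip()
    dummies = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()
    dummies.columns = [
        "skill_" + c.replace(" ", "_").replace("-", "_").lower() for c in dummies.columns
    ]
    encoded = df.join(dummies)
    print(encoded)
    # -> original columns plus 0/1 indicators skill_pytorch, skill_python, skill_sql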

Input Schema

Name             Required    Description    Default
file_path        Yes
skills_column    Yes
output_path      No

Output Schema

Name             Required    Description    Default
result           Yes
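
Read together, the schemas say a call supplies up to three string arguments and gets back a single result string. A hedged example payload (the paths are hypothetical):

    # Hypothetical arguments for one call to parse_skills_column, matching the
    # input schema above; only file_path and skills_column are required.
    arguments = {
        "file_path": "data/job_postings.csv",
        "skills_column": "skills",
        "output_path": "data/job_postings_encoded.csv",  # optional
    }
    # The output schema presumably wraps the function's JSON report in the
    # single required "result" field.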

Implementation Reference

  • The parse_skills_column tool is registered with the MCP server using the @mcp.tool() decorator
    @mcp.tool()
  • The parse_skills_column function implements the logic to parse comma-separated skills from a column and create one-hot encoding columns for each unique skill
    # Module-level imports required by the function body below
    from typing import Optional
    import json
    import traceback

    def parse_skills_column(file_path: str, skills_column: str, output_path: Optional[str] = None) -> str:
        """
        Parse comma-separated skills into individual skills and create one-hot encoding.
        
        Args:
            file_path: Path to the data file
            skills_column: Column name containing comma-separated skills
            output_path: Optional path to save the processed data
        
        Returns:
            Information about the parsed skills data
        """
        try:
            import pandas as pd
            from pathlib import Path
            
            # Load the data
            file_extension = Path(file_path).suffix.lower()
            if file_extension == '.csv':
                df = pd.read_csv(file_path)
            elif file_extension == '.json':
                df = pd.read_json(file_path)
            elif file_extension in ['.xlsx', '.xls']:
                df = pd.read_excel(file_path)
            elif file_extension == '.tsv':
                df = pd.read_csv(file_path, sep='\t')
            else:
                df = pd.read_csv(file_path)
            
            if skills_column not in df.columns:
                return f"Error: Column '{skills_column}' not found in data"
            
            # Parse skills and create one-hot encoding
            all_skills = set()
            
            # Extract all unique skills (dropna() already filters out missing values)
            for skills_str in df[skills_column].dropna():
                skills = [skill.strip() for skill in str(skills_str).split(',') if skill.strip()]
                all_skills.update(skills)
            
            all_skills = sorted(list(all_skills))
            
            # Create one-hot encoding for each skill
            skills_df = df.copy()
            for skill in all_skills:
                skills_df[f"skill_{skill.replace(' ', '_').replace('-', '_').lower()}"] = 0
            
            # Fill in the one-hot encoding; .items() yields the index label,
            # so .loc assigns to the right row even if the index is not 0..n-1
            for idx, skills_str in df[skills_column].items():
                if pd.isna(skills_str):
                    continue
                skills = [skill.strip() for skill in str(skills_str).split(',') if skill.strip()]
                for skill in skills:
                    col_name = f"skill_{skill.replace(' ', '_').replace('-', '_').lower()}"
                    if col_name in skills_df.columns:
                        skills_df.loc[idx, col_name] = 1
            
            # Save processed data if output path provided
            if output_path:
                if output_path.endswith('.csv'):
                    skills_df.to_csv(output_path, index=False)
                elif output_path.endswith('.json'):
                    skills_df.to_json(output_path, orient='records', indent=2)
                elif output_path.endswith(('.xlsx', '.xls')):
                    skills_df.to_excel(output_path, index=False)
                else:
                    skills_df.to_csv(output_path, index=False)
            
            result = {
                "skills_parsed": True,
                "original_column": skills_column,
                "unique_skills_count": len(all_skills),
                "unique_skills": all_skills[:20],  # First 20 skills for preview
                "rows_processed": len(df),
                "new_columns_added": len(all_skills),
                "output_file": output_path if output_path else None
            }
            
            return json.dumps(result, indent=2)
            
        except Exception as e:
            return f"Error parsing skills: {str(e)}\n{traceback.format_exc()}"
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. While it mentions parsing and one-hot encoding, it doesn't describe what happens to the original data (is it modified or copied?), file format requirements, error handling, or performance characteristics. The description is functional but lacks the operational context needed for safe use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured: a clear purpose statement followed by parameter explanations. Every sentence adds value, with no redundant information. The Args/Returns sections are appropriately formatted and contribute to understanding without verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (data parsing and transformation), the absence of annotations, and an output schema that presumably documents return values, the description is only minimally adequate. It covers the core operation and parameters but lacks important context about file formats, data validation, error conditions, and how the one-hot encoding is structured in the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must fully compensate. It provides clear semantic meaning for all three parameters: 'file_path' (path to data file), 'skills_column' (column containing skills), and 'output_path' (optional save location). This adds substantial value beyond the bare schema, though it doesn't specify file format expectations or column naming conventions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Parse comma-separated skills into individual skills and create one-hot encoding.' This specifies both the parsing action and the transformation (one-hot encoding). However, it doesn't explicitly differentiate from sibling tools like 'analyze_skills_by_location' or 'create_skills_location_heatmap' that might also process skills data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With siblings like 'analyze_skills_by_location', 'filter_data', and 'convert_data' that might handle similar data, there's no indication of when this specific parsing/encoding operation is appropriate versus other data manipulation tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
