analyze_salary_by_location_and_skills
Analyze salary data by location and skills to identify compensation trends and inform career decisions.
Instructions
Analyze salary statistics by location and skills combination.
Args:
- file_path: Path to the data file
- salary_column: Column name containing salary information
- location_column: Column name containing location information
- skills_column: Column name containing comma-separated skills
- output_path: Optional path to save the analysis results
Returns: Salary analysis by location and skills
Input Schema
Table / JSON Schema
| Name | Required | Description | Default |
|---|---|---|---|
| file_path | Yes | ||
| salary_column | Yes | ||
| location_column | Yes | ||
| skills_column | Yes | ||
| output_path | No | | |
Implementation Reference
- src/visidata_mcp/server.py:1318-1460 (handler)Complete implementation of analyze_salary_by_location_and_skills tool. This function loads data from a file, validates salary/location/skills columns, cleans salary data by extracting numeric values using regex, analyzes salary statistics by location and skills (mean, median, min, max, std), and optionally saves results to an output file.
def analyze_salary_by_location_and_skills(file_path: str, salary_column: str, location_column: str, skills_column: str, output_path: Optional[str] = None) -> str:
    """Analyze salary statistics by location and by individual skill.

    Args:
        file_path: Path to the tabular data file (.csv, .json, .xlsx, .xls,
            .tsv; any other extension is parsed as CSV).
        salary_column: Column containing salary text; numeric values are
            extracted with a regex and salary ranges are averaged.
        location_column: Column containing location values.
        skills_column: Column containing comma-separated skill names.
        output_path: Optional path to save the analysis. A ``.json`` path gets
            a JSON dump; any other path is written as an Excel workbook with
            'Locations' and 'Skills' sheets (requires openpyxl).

    Returns:
        JSON string with per-location and per-skill salary statistics, or an
        ``"Error: ..."`` message string on failure (this function reports
        errors as strings rather than raising).
    """
    try:
        import json
        import re
        import statistics
        from collections import defaultdict
        from pathlib import Path

        import pandas as pd

        # --- Load the data based on file extension -------------------------
        ext = Path(file_path).suffix.lower()
        if ext == '.csv':
            df = pd.read_csv(file_path)
        elif ext == '.json':
            df = pd.read_json(file_path)
        elif ext in ('.xlsx', '.xls'):
            df = pd.read_excel(file_path)
        elif ext == '.tsv':
            df = pd.read_csv(file_path, sep='\t')
        else:
            # Unknown extension: fall back to CSV parsing.
            df = pd.read_csv(file_path)

        # --- Validate required columns -------------------------------------
        for col in (salary_column, location_column, skills_column):
            if col not in df.columns:
                return f"Error: Column '{col}' not found in data"

        # --- Clean salary values -------------------------------------------
        # Compiled once instead of re-scanning the pattern per row.
        number_re = re.compile(r'\d+(?:,\d{3})*(?:\.\d+)?')

        def extract_salary(salary_str):
            """Pull a numeric salary out of free-form text; ranges are averaged."""
            if pd.isna(salary_str):
                return None
            numbers = number_re.findall(str(salary_str))
            if not numbers:
                return None
            # A single match averages to itself, so one path covers both the
            # plain-value and the range ("80,000 - 100,000") cases.
            values = [float(n.replace(',', '')) for n in numbers]
            return sum(values) / len(values)

        df['salary_numeric'] = df[salary_column].apply(extract_salary)
        df_clean = df.dropna(subset=['salary_numeric', location_column, skills_column])
        if len(df_clean) == 0:
            return "Error: No valid data rows found after cleaning"

        # --- Per-location statistics ---------------------------------------
        location_analysis = []
        for location in df_clean[location_column].unique():
            salaries = df_clean.loc[df_clean[location_column] == location, 'salary_numeric']
            location_analysis.append({
                "location": location,
                "job_count": len(salaries),
                "avg_salary": round(salaries.mean(), 2),
                "median_salary": round(salaries.median(), 2),
                "min_salary": round(salaries.min(), 2),
                "max_salary": round(salaries.max(), 2),
                # pandas .std() (ddof=1) is NaN for a single sample, and NaN
                # makes json.dumps emit non-standard JSON — report 0.0 instead.
                "std_salary": 0.0 if len(salaries) < 2 else round(salaries.std(), 2),
            })
        location_analysis.sort(key=lambda x: x["avg_salary"], reverse=True)

        # --- Per-skill statistics ------------------------------------------
        skill_salary_data = defaultdict(list)
        for _, row in df_clean.iterrows():
            skills_str = row[skills_column]
            if pd.isna(skills_str):
                continue
            for skill in (s.strip() for s in str(skills_str).split(',')):
                if skill:
                    skill_salary_data[skill].append(row['salary_numeric'])

        skill_analysis = []
        for skill, salaries in skill_salary_data.items():
            if len(salaries) < 5:  # require at least 5 data points per skill
                continue
            skill_analysis.append({
                "skill": skill,
                "job_count": len(salaries),
                "avg_salary": round(sum(salaries) / len(salaries), 2),
                # statistics.median averages the two middle values for an even
                # count; the previous sorted()[n//2] picked the upper-middle
                # element, which is not the median.
                "median_salary": round(statistics.median(salaries), 2),
                "min_salary": round(min(salaries), 2),
                "max_salary": round(max(salaries), 2),
            })
        skill_analysis.sort(key=lambda x: x["avg_salary"], reverse=True)

        # --- Optionally persist the analysis -------------------------------
        if output_path:
            if output_path.endswith('.json'):
                analysis_data = {
                    "location_analysis": location_analysis,
                    "skill_analysis": skill_analysis[:50],  # top 50 skills
                }
                with open(output_path, 'w') as f:
                    json.dump(analysis_data, f, indent=2)
            else:
                # Excel output: one sheet for locations, one for skills.
                with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
                    pd.DataFrame(location_analysis).to_excel(writer, sheet_name='Locations', index=False)
                    pd.DataFrame(skill_analysis).to_excel(writer, sheet_name='Skills', index=False)

        result = {
            "salary_analysis_completed": True,
            "locations_analyzed": len(location_analysis),
            "skills_analyzed": len(skill_analysis),
            "total_jobs_analyzed": len(df_clean),
            "top_paying_locations": location_analysis[:10],
            "top_paying_skills": skill_analysis[:15],
            "output_file": output_path if output_path else None,
        }
        return json.dumps(result, indent=2)
    except Exception as e:
        # Imported here so the error path works even if the try-block imports
        # were never reached.
        import traceback
        return f"Error analyzing salary by location and skills: {str(e)}\n{traceback.format_exc()}"