Glama

validate_data

Checks a CSV file's data integrity and format, reporting validation issues and warnings to support data quality assurance.

Instructions

Validate CSV data integrity and format.

Args:
    filename: Name of the CSV file

Returns:
    Dictionary with validation results, issues, and warnings

Input Schema

| Name     | Required | Description          | Default |
|----------|----------|----------------------|---------|
| filename | Yes      | Name of the CSV file |         |
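In MCP, a client invokes a tool by sending a JSON-RPC `tools/call` request whose `params` carry the tool `name` and an `arguments` object matching the input schema above. The sketch below only builds that message; the file name `sales.csv`, the request `id`, and the helper name `build_validate_request` are illustrative assumptions, and the transport (stdio or HTTP) is left out.

```python
import json
from typing import Any, Dict


def build_validate_request(filename: str, request_id: int = 1) -> str:
    """Hypothetical helper: serialize an MCP tools/call request for validate_data."""
    message: Dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "validate_data",
            # One required string argument, as in the input schema.
            "arguments": {"filename": filename},
        },
    }
    return json.dumps(message)


payload = build_validate_request("sales.csv")
print(payload)
```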

Implementation Reference

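  • MCP tool handler for `validate_data`. The `@mcp.tool()` decorator registers the function with the MCP server. The handler delegates the actual checks to `csv_manager.validate_data(filename)` and catches any exception, returning it as an error dictionary instead of raising. Its docstring and type hints document the tool's input and output.

    from typing import Any, Dict

    # `mcp` (the server instance) and `csv_manager` are module-level
    # objects defined elsewhere in the server.
    @mcp.tool()
    def validate_data(filename: str) -> Dict[str, Any]:
        """
        Validate CSV data integrity and format.

        Args:
            filename: Name of the CSV file

        Returns:
            Dictionary with validation results, issues, and warnings
        """
        try:
            return csv_manager.validate_data(filename)
        except Exception as e:
            # Report failures (e.g. a missing file) as data instead of raising
            return {"success": False, "error": str(e)}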
  • Core helper in the `CsvManager` class that implements the validation logic. Completely empty rows are reported as issues; duplicate rows, missing values per column, columns mixing numeric and text values, and text values longer than 1,000 characters are reported as warnings. The returned dictionary sets `is_valid` to `True` only when no issues were found; warnings do not affect it.

    # Module-level imports and logger used by the method
    import logging
    from typing import Any, Dict

    import pandas as pd

    logger = logging.getLogger(__name__)

    # Method of CsvManager; _get_file_path is defined elsewhere in the class
    def validate_data(self, filename: str) -> Dict[str, Any]:
        """Validate CSV data integrity and format."""
        filepath = self._get_file_path(filename)
        if not filepath.exists():
            raise FileNotFoundError(f"CSV file '{filename}' not found")

        try:
            df = pd.read_csv(filepath)
            validation_results = {
                "success": True,
                "filename": filename,
                "total_rows": len(df),
                "total_columns": len(df.columns),
                "issues": [],
                "warnings": [],
            }

            # Check for empty rows (read_csv already skips truly blank lines,
            # so this catches rows containing only delimiters, e.g. ",,")
            empty_rows = df.isnull().all(axis=1).sum()
            if empty_rows > 0:
                validation_results["issues"].append(f"Found {empty_rows} completely empty rows")

            # Check for duplicate rows (each repeat after the first occurrence)
            duplicate_rows = df.duplicated().sum()
            if duplicate_rows > 0:
                validation_results["warnings"].append(f"Found {duplicate_rows} duplicate rows")

            # Check for missing values by column
            null_counts = df.isnull().sum()
            for col, null_count in null_counts.items():
                if null_count > 0:
                    percentage = (null_count / len(df)) * 100
                    validation_results["warnings"].append(
                        f"Column '{col}' has {null_count} missing values ({percentage:.1f}%)"
                    )

            # Check for columns with mixed numeric/text data
            for col in df.columns:
                if df[col].dtype == "object":
                    # Compare against non-null cells so missing values are not
                    # mistaken for text
                    numeric_count = pd.to_numeric(df[col], errors="coerce").notna().sum()
                    non_null_count = df[col].notna().sum()
                    if 0 < numeric_count < non_null_count:
                        validation_results["warnings"].append(
                            f"Column '{col}' appears to have mixed data types"
                        )

            # Check for unusually long text values
            for col in df.select_dtypes(include=["object"]).columns:
                max_length = df[col].astype(str).str.len().max()
                if max_length > 1000:
                    validation_results["warnings"].append(
                        f"Column '{col}' has very long text values (max: {max_length} characters)"
                    )

            # Valid means no issues; warnings are advisory only
            validation_results["is_valid"] = len(validation_results["issues"]) == 0
            return validation_results
        except Exception as e:
            logger.error(f"Failed to validate data: {e}")
            raise
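The checks in `CsvManager.validate_data` can be exercised without the MCP server or the file system. The following is a hypothetical standalone sketch (`check_frame` and `long_text_limit` are illustrative names, not part of the server) that applies the same rules to an in-memory CSV read through `io.StringIO`. One deliberate difference: it treats every non-numeric column as text rather than testing for the `object` dtype, an assumption made so the sketch also covers the dedicated string dtype that newer pandas versions may infer. The usage data also shows a pandas detail: `read_csv` drops fully blank lines by default (`skip_blank_lines=True`), so only a delimiter-only line such as `,,` becomes an all-missing row.

```python
import io
from typing import Any, Dict

import pandas as pd


def check_frame(df: pd.DataFrame, long_text_limit: int = 1000) -> Dict[str, Any]:
    """Hypothetical standalone version of the CsvManager.validate_data checks."""
    issues: list[str] = []
    warnings: list[str] = []

    # Rows in which every cell is missing, e.g. a CSV line of only commas.
    empty_rows = int(df.isnull().all(axis=1).sum())
    if empty_rows > 0:
        issues.append(f"Found {empty_rows} completely empty rows")

    # duplicated() flags each repeat of a row after its first occurrence.
    duplicate_rows = int(df.duplicated().sum())
    if duplicate_rows > 0:
        warnings.append(f"Found {duplicate_rows} duplicate rows")

    # Missing values per column, with their share of all rows.
    for col, null_count in df.isnull().sum().items():
        if null_count > 0:
            percentage = null_count / len(df) * 100
            warnings.append(
                f"Column '{col}' has {null_count} missing values ({percentage:.1f}%)"
            )

    # Assumption: every non-numeric column is treated as text.
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            continue
        non_null = int(df[col].notna().sum())
        numeric = int(pd.to_numeric(df[col], errors="coerce").notna().sum())
        # Mixed when some, but not all, non-missing cells parse as numbers.
        if 0 < numeric < non_null:
            warnings.append(f"Column '{col}' appears to have mixed data types")
        max_length = df[col].astype(str).str.len().max()
        if max_length > long_text_limit:
            warnings.append(
                f"Column '{col}' has very long text values (max: {max_length} characters)"
            )

    return {
        "total_rows": len(df),
        "issues": issues,
        "warnings": warnings,
        "is_valid": not issues,
    }


# The blank line is skipped by read_csv; the ",," line becomes an all-NaN row.
csv_text = "id,amount,note\n1,10,ok\n\n2,abc,ok\n,,\n1,10,ok\n"
report = check_frame(pd.read_csv(io.StringIO(csv_text)))
print(report["issues"])
for warning in report["warnings"]:
    print(warning)
```

Here `amount` is flagged because `"abc"` sits next to numeric strings, while `note` is not, since none of its values parse as numbers.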

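The tool handler never lets an exception escape: a missing file or a pandas parsing error from the manager comes back as a `{"success": False, "error": ...}` dictionary. The sketch below reproduces that split without the MCP SDK. `DirectoryCsvManager` and its `base_dir` are hypothetical stand-ins for the server's `CsvManager` (only the empty-row check is kept), the `@mcp.tool()` decorator is omitted, and the manager is passed in explicitly instead of being a module-level object.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict

import pandas as pd


class DirectoryCsvManager:
    """Hypothetical stand-in for CsvManager that resolves names inside one folder."""

    def __init__(self, base_dir: Path) -> None:
        self.base_dir = base_dir

    def validate_data(self, filename: str) -> Dict[str, Any]:
        filepath = self.base_dir / filename
        if not filepath.exists():
            raise FileNotFoundError(f"CSV file '{filename}' not found")
        df = pd.read_csv(filepath)  # may raise pandas parsing errors
        issues = []
        empty_rows = int(df.isnull().all(axis=1).sum())
        if empty_rows > 0:
            issues.append(f"Found {empty_rows} completely empty rows")
        return {"success": True, "total_rows": len(df), "issues": issues,
                "is_valid": not issues}


def validate_data(manager: DirectoryCsvManager, filename: str) -> Dict[str, Any]:
    """Tool-style wrapper: any exception becomes an error dictionary."""
    try:
        return manager.validate_data(filename)
    except Exception as e:
        return {"success": False, "error": str(e)}


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "ok.csv").write_text("a,b\n1,2\n")
    (folder / "empty.csv").write_text("")  # read_csv raises EmptyDataError
    manager = DirectoryCsvManager(folder)
    ok_result = validate_data(manager, "ok.csv")
    missing_result = validate_data(manager, "missing.csv")
    empty_result = validate_data(manager, "empty.csv")

print(ok_result)
print(missing_result)
print(empty_result)
```

Returning errors as data keeps the result shape predictable for the client, at the cost of the caller having to check `success` before reading `issues` or `warnings`.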

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/NovaAI-innovation/csv-mcp-server'
