tag_csv
Classify each row in a CSV file by applying a predefined taxonomy of tags using parallel LLM inference from multiple providers, with optional reasoning and confidence scores.
Instructions
Tag all rows in a CSV file based on a provided taxonomy using parallel LLM inference.
Input Schema
| Name | Required | Description | Default |
|---|---|---|---|
| csv_path | Yes | Path to the CSV file to tag | |
| taxonomy | Yes | List of possible tags/categories to assign (e.g., ["technology", "business", "science"]) | |
| text_column | No | Name of the column containing text to analyze (default: "text") | text |
| provider | No | LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq") | groq |
| model | No | Model identifier (default: "llama-3.3-70b-versatile") | llama-3.3-70b-versatile |
| api_key | No | API key for the provider (if not set via environment variable) | |
| output_path | No | Optional path to save the tagged CSV. When set, the response includes a 5-row preview; when omitted, the response returns all tagged rows | |
| include_reasoning | No | Whether to include detailed reasoning and reflection in output (default: False) | False |
| field_name | No | Name for the classification field (default: "category") | category |
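As an illustration of how the defaults in the table combine with caller-supplied values, here is a minimal sketch that assembles an argument payload for `tag_csv`. The helper `build_tag_csv_args` and the file name `articles.csv` are hypothetical and not part of the server; only the parameter names and defaults come from the table above.

```python
from typing import Any, Dict

# Defaults copied from the input schema table above.
DEFAULTS: Dict[str, Any] = {
    "text_column": "text",
    "provider": "groq",
    "model": "llama-3.3-70b-versatile",
    "api_key": None,
    "output_path": None,
    "include_reasoning": False,
    "field_name": "category",
}
REQUIRED = ("csv_path", "taxonomy")


def build_tag_csv_args(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge caller values over the documented defaults."""
    missing = [name for name in REQUIRED if name not in overrides]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    unknown = sorted(set(overrides) - set(DEFAULTS) - set(REQUIRED))
    if unknown:
        raise ValueError(f"Unknown arguments: {unknown}")
    return {**DEFAULTS, **overrides}


# "articles.csv" is a placeholder path used only for illustration.
args = build_tag_csv_args(
    csv_path="articles.csv",
    taxonomy=["technology", "business", "science"],
    include_reasoning=True,
)
print(args["provider"], args["model"], args["include_reasoning"])
```

How the payload is actually sent depends on the MCP client in use, which this page does not specify.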
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| No arguments | | | |
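Although the output schema lists no fields, the handler's return statements in tagging.py suggest the shape of the returned dictionary: `status`, `message`, `total_rows`, either `data` (all rows) or `preview` plus `output_path`, and an optional `errors` list. The sketch below shows one way a caller might consume such a result. The function names and the sample dictionary are hypothetical, and the numeric confidence values are an assumption about the value type.

```python
from typing import Any, Dict, List


def tagged_rows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the rows carried in a tag_csv result, raising on an error status."""
    if result.get("status") != "success":
        raise RuntimeError(result.get("message", "tag_csv failed"))
    # The handler returns "data" (all rows) when no output_path was given,
    # and "preview" (first 5 rows) when the CSV was written to output_path.
    if "data" in result:
        return result["data"]
    return result.get("preview", [])


def failed_rows(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return per-row error records, or an empty list when none were reported."""
    return result.get("errors") or []


# Hand-written sample shaped like the handler's success branch without output_path.
sample_result = {
    "status": "success",
    "message": "Successfully tagged 2 rows",
    "data": [
        {"text": "New GPU released", "category": "technology", "confidence": 0.9},
        {"text": "Quarterly earnings rise", "category": "business", "confidence": 0.8},
    ],
    "total_rows": 2,
}

for row in tagged_rows(sample_result):
    print(row["category"], row["confidence"])
```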
Implementation Reference
- tagging.py:33-159 (handler) The main handler function for the 'tag_csv' tool. Registers via @mcp.tool() decorator. Reads a CSV, applies LLM-based tagging using a list of categories via parallel inference, and returns tagged results.

      @mcp.tool()
      def tag_csv(
          csv_path: str,
          taxonomy: List[str],
          text_column: str = "text",
          provider: str = "groq",
          model: str = "llama-3.3-70b-versatile",
          api_key: Optional[str] = None,
          output_path: Optional[str] = None,
          include_reasoning: bool = False,
          field_name: str = "category"
      ) -> dict:
          """
          Tag all rows in a CSV file based on a provided taxonomy using parallel LLM inference.

          Args:
              csv_path: Path to the CSV file to tag
              taxonomy: List of possible tags/categories to assign (e.g., ["technology", "business", "science"])
              text_column: Name of the column containing text to analyze (default: "text")
              provider: LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
              model: Model identifier (default: "llama-3.3-70b-versatile")
              api_key: API key for the provider (if not set via environment variable)
              output_path: Optional path to save the tagged CSV (if not provided, returns preview)
              include_reasoning: Whether to include detailed reasoning and reflection in output (default: False)
              field_name: Name for the classification field (default: "category")

          Returns:
              Dictionary with status, preview of tagged data, and optionally the output path
          """
          try:
              # Read the CSV file
              df = pl.read_csv(csv_path)

              # Validate that the text column exists
              if text_column not in df.columns:
                  return {
                      "status": "error",
                      "message": f"Column '{text_column}' not found in CSV. Available columns: {df.columns}"
                  }

              # Set up API key if provided
              if api_key:
                  if provider.lower() in ["claude", "anthropic"]:
                      os.environ["ANTHROPIC_API_KEY"] = api_key
                  else:
                      os.environ[f"{provider.upper()}_API_KEY"] = api_key

              # Convert tag list to taxonomy format
              taxonomy_dict = _create_taxonomy_from_tags(taxonomy, field_name)

              # Get the provider enum
              provider_enum = PROVIDER_MAP.get(provider.lower())
              if not provider_enum:
                  return {
                      "status": "error",
                      "message": f"Unsupported provider: {provider}. Use 'claude', 'openai', 'gemini', 'groq', or 'bedrock'"
                  }

              # Apply taxonomy tagging using polar_llama
              df = df.with_columns(
                  tags=tag_taxonomy(
                      pl.col(text_column),
                      taxonomy_dict,
                      provider=provider_enum,
                      model=model
                  )
              )

              # Extract the selected tag value and confidence
              df = df.with_columns([
                  pl.col("tags").struct.field(field_name).struct.field("value").alias(field_name),
                  pl.col("tags").struct.field(field_name).struct.field("confidence").alias("confidence")
              ])

              # Optionally include reasoning and reflection
              if include_reasoning:
                  df = df.with_columns([
                      pl.col("tags").struct.field(field_name).struct.field("thinking").alias("thinking"),
                      pl.col("tags").struct.field(field_name).struct.field("reflection").alias("reflection")
                  ])

              # Check for errors
              error_rows = df.filter(
                  pl.col("tags").struct.field("_error").is_not_null()
              )

              if len(error_rows) > 0:
                  error_details = error_rows.select([
                      text_column,
                      pl.col("tags").struct.field("_error").alias("error"),
                      pl.col("tags").struct.field("_details").alias("error_details")
                  ]).to_dicts()
              else:
                  error_details = None

              # Drop the raw tags column for cleaner output
              df = df.drop("tags")

              # Save to file if output path is provided
              if output_path:
                  df.write_csv(output_path)
                  result = {
                      "status": "success",
                      "message": f"Successfully tagged {len(df)} rows",
                      "output_path": output_path,
                      "preview": df.head(5).to_dicts(),
                      "total_rows": len(df)
                  }
                  if error_details:
                      result["errors"] = error_details
              else:
                  result = {
                      "status": "success",
                      "message": f"Successfully tagged {len(df)} rows",
                      "data": df.to_dicts(),
                      "total_rows": len(df)
                  }
                  if error_details:
                      result["errors"] = error_details

              return result

          except Exception as e:
              return {
                  "status": "error",
                  "message": str(e)
              }

- tagging.py:23-30 (schema) Helper that converts the simple list of tags into the taxonomy dictionary format required by the polar_llama library's tag_taxonomy function.

      def _create_taxonomy_from_tags(tags: List[str], field_name: str = "category") -> Dict[str, Any]:
          """Convert a simple list of tags into a taxonomy structure."""
          return {
              field_name: {
                  "description": f"The most appropriate {field_name} for this text",
                  "values": {tag: f"Content that belongs in the '{tag}' {field_name}" for tag in tags}
              }
          }

- tagging.py:33-34 (registration) The @mcp.tool() decorator registers this function as an MCP tool named 'tag_csv' on the FastMCP server instance.

      @mcp.tool()
      def tag_csv(

- tagging.py:13-20 (helper) Mapping from provider string names to polar_llama Provider enum values, used by tag_csv to resolve the provider parameter.

      PROVIDER_MAP = {
          "claude": Provider.ANTHROPIC,
          "anthropic": Provider.ANTHROPIC,
          "openai": Provider.OPENAI,
          "gemini": Provider.GEMINI,
          "groq": Provider.GROQ,
          "bedrock": Provider.BEDROCK
      }
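To make the taxonomy structure and the API-key handling concrete, the following sketch reproduces two pieces of the listing above as standalone functions that run without polar_llama. `create_taxonomy_from_tags` is a copy of `_create_taxonomy_from_tags`; `api_key_env_var` is a hypothetical name for logic that mirrors the handler's `api_key` branch. Whether the underlying library reads a variable of that name for every provider is not confirmed by this page.

```python
import json
from typing import Any, Dict, List


def create_taxonomy_from_tags(tags: List[str], field_name: str = "category") -> Dict[str, Any]:
    """Standalone copy of the _create_taxonomy_from_tags helper."""
    return {
        field_name: {
            "description": f"The most appropriate {field_name} for this text",
            "values": {
                tag: f"Content that belongs in the '{tag}' {field_name}" for tag in tags
            },
        }
    }


def api_key_env_var(provider: str) -> str:
    """Mirror the handler's api_key branch: name of the environment variable it sets."""
    if provider.lower() in ["claude", "anthropic"]:
        return "ANTHROPIC_API_KEY"
    return f"{provider.upper()}_API_KEY"


# A custom field_name changes both the top-level key and the generated descriptions.
taxonomy = create_taxonomy_from_tags(["technology", "business"], field_name="topic")
print(json.dumps(taxonomy, indent=2))

print(api_key_env_var("claude"))  # ANTHROPIC_API_KEY
print(api_key_env_var("groq"))    # GROQ_API_KEY
```

The same `field_name` is later used by the handler to pull the `value` and `confidence` fields out of the `tags` struct, so the output column carries that name.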