tag_csv

Classify each row of a CSV file against a predefined taxonomy of tags using parallel LLM inference through one of several supported providers. Each row receives a tag and a confidence score; reasoning output is optional.

Instructions

Tag all rows in a CSV file based on a provided taxonomy using parallel LLM inference.

Input Schema

Name	Required	Description	Default
`csv_path`	Yes	Path to the CSV file to tag
`taxonomy`	Yes	List of possible tags/categories to assign (e.g., ["technology", "business", "science"])
`text_column`	No	Name of the column containing the text to analyze	text
`provider`	No	LLM provider: "claude" (alias "anthropic"), "openai", "gemini", "groq", or "bedrock"	groq
`model`	No	Model identifier	llama-3.3-70b-versatile
`api_key`	No	API key for the provider (if not set via environment variable)
`output_path`	No	Optional path to save the tagged CSV. If provided, the response holds a 5-row preview; if omitted, the response holds every tagged row
`include_reasoning`	No	Whether to include detailed reasoning and reflection in the output	False
`field_name`	No	Name for the classification field	category
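
As an illustration of the table above, an argument payload for `tag_csv` might look like the sketch below. The file name is a placeholder, and `REQUIRED` and `missing_required` are hypothetical helpers written for this example; they are not part of the tool.

```python
# Hypothetical argument payload for the tag_csv tool; "articles.csv" is a placeholder path.
arguments = {
    "csv_path": "articles.csv",
    "taxonomy": ["technology", "business", "science"],
    "text_column": "text",
    "provider": "groq",
    "model": "llama-3.3-70b-versatile",
    "include_reasoning": False,
    "field_name": "category",
}

# The two parameters the input schema marks as required.
REQUIRED = ("csv_path", "taxonomy")


def missing_required(args: dict) -> list:
    """Return the required parameter names that are absent from args."""
    return [name for name in REQUIRED if name not in args]


print(missing_required(arguments))
```

Checking for missing required keys before sending the request gives a clearer failure than waiting for the server to reject the call.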

Output Schema

No output schema is declared. The handler returns a plain dictionary whose keys are visible in the implementation below.

Implementation Reference

tagging.py:33-159 (handler)

The main handler function for the 'tag_csv' tool, registered via the @mcp.tool() decorator. It reads a CSV, tags each row against the supplied list of categories using parallel LLM inference, and returns the tagged results.

@mcp.tool()
def tag_csv(
    csv_path: str,
    taxonomy: List[str],
    text_column: str = "text",
    provider: str = "groq",
    model: str = "llama-3.3-70b-versatile",
    api_key: Optional[str] = None,
    output_path: Optional[str] = None,
    include_reasoning: bool = False,
    field_name: str = "category"
) -> dict:
    """
    Tag all rows in a CSV file based on a provided taxonomy using parallel LLM inference.

    Args:
        csv_path: Path to the CSV file to tag
        taxonomy: List of possible tags/categories to assign (e.g., ["technology", "business", "science"])
        text_column: Name of the column containing text to analyze (default: "text")
        provider: LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
        model: Model identifier (default: "llama-3.3-70b-versatile")
        api_key: API key for the provider (if not set via environment variable)
        output_path: Optional path to save the tagged CSV. If provided, the result holds a 5-row preview; otherwise it holds every tagged row
        include_reasoning: Whether to include detailed reasoning and reflection in output (default: False)
        field_name: Name for the classification field (default: "category")

    Returns:
        Dictionary with status and message. On success it also carries total_rows, plus either
        output_path and a 5-row preview (when saved) or the full data, and an errors list if any rows failed
    """
    try:
        # Read the CSV file
        df = pl.read_csv(csv_path)

        # Validate that the text column exists
        if text_column not in df.columns:
            return {
                "status": "error",
                "message": f"Column '{text_column}' not found in CSV. Available columns: {df.columns}"
            }

        # Set up API key if provided
        if api_key:
            if provider.lower() in ["claude", "anthropic"]:
                os.environ["ANTHROPIC_API_KEY"] = api_key
            else:
                os.environ[f"{provider.upper()}_API_KEY"] = api_key

        # Convert tag list to taxonomy format
        taxonomy_dict = _create_taxonomy_from_tags(taxonomy, field_name)

        # Get the provider enum
        provider_enum = PROVIDER_MAP.get(provider.lower())
        if not provider_enum:
            return {
                "status": "error",
                "message": f"Unsupported provider: {provider}. Use 'claude', 'openai', 'gemini', 'groq', or 'bedrock'"
            }

        # Apply taxonomy tagging using polar_llama
        df = df.with_columns(
            tags=tag_taxonomy(
                pl.col(text_column),
                taxonomy_dict,
                provider=provider_enum,
                model=model
            )
        )

        # Extract the selected tag value and confidence
        df = df.with_columns([
            pl.col("tags").struct.field(field_name).struct.field("value").alias(field_name),
            pl.col("tags").struct.field(field_name).struct.field("confidence").alias("confidence")
        ])

        # Optionally include reasoning and reflection
        if include_reasoning:
            df = df.with_columns([
                pl.col("tags").struct.field(field_name).struct.field("thinking").alias("thinking"),
                pl.col("tags").struct.field(field_name).struct.field("reflection").alias("reflection")
            ])

        # Check for errors
        error_rows = df.filter(
            pl.col("tags").struct.field("_error").is_not_null()
        )

        if len(error_rows) > 0:
            error_details = error_rows.select([
                text_column,
                pl.col("tags").struct.field("_error").alias("error"),
                pl.col("tags").struct.field("_details").alias("error_details")
            ]).to_dicts()
        else:
            error_details = None

        # Drop the raw tags column for cleaner output
        df = df.drop("tags")

        # Save to file if output path is provided
        if output_path:
            df.write_csv(output_path)
            result = {
                "status": "success",
                "message": f"Successfully tagged {len(df)} rows",
                "output_path": output_path,
                "preview": df.head(5).to_dicts(),
                "total_rows": len(df)
            }
            if error_details:
                result["errors"] = error_details
        else:
            result = {
                "status": "success",
                "message": f"Successfully tagged {len(df)} rows",
                "data": df.to_dicts(),
                "total_rows": len(df)
            }
            if error_details:
                result["errors"] = error_details

        return result

    except Exception as e:
        return {
            "status": "error",
            "message": str(e)
        }
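
The handler returns differently shaped dictionaries depending on whether `output_path` was given and whether any rows failed. A caller could normalize them along these lines. This is a minimal sketch: `summarize_result` is a hypothetical helper, and the `example` dictionary is hand-written with made-up values (including the confidence numbers, whose real scale is not documented here), not actual tool output.

```python
from typing import Any, Dict, List


def summarize_result(result: Dict[str, Any]) -> str:
    """Build a one-line summary from a tag_csv-style result dictionary."""
    if result.get("status") != "success":
        return f"failed: {result.get('message', 'unknown error')}"
    # With output_path the handler returns a 5-row "preview";
    # without it, every row is returned under "data".
    rows: List[dict] = result.get("preview") or result.get("data") or []
    # "errors" is only present when at least one row failed.
    failed = len(result.get("errors") or [])
    where = result.get("output_path", "inline")
    return f"{result['total_rows']} rows tagged, {len(rows)} returned ({where}), {failed} failed"


# Hand-written example input; values are illustrative only.
example = {
    "status": "success",
    "message": "Successfully tagged 2 rows",
    "data": [
        {"text": "New GPU released", "category": "technology", "confidence": 0.9},
        {"text": "Quarterly earnings up", "category": "business", "confidence": 0.8},
    ],
    "total_rows": 2,
}
print(summarize_result(example))
```

Note that a `"success"` status can still carry an `errors` list, so callers that care about per-row failures should check that key explicitly.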

tagging.py:23-30 (schema)

Helper that converts the simple list of tags into the taxonomy dictionary format required by the polar_llama library's tag_taxonomy function.

def _create_taxonomy_from_tags(tags: List[str], field_name: str = "category") -> Dict[str, Any]:
    """Convert a simple list of tags into a taxonomy structure."""
    return {
        field_name: {
            "description": f"The most appropriate {field_name} for this text",
            "values": {tag: f"Content that belongs in the '{tag}' {field_name}" for tag in tags}
        }
    }

tagging.py:33-34 (registration)

The @mcp.tool() decorator registers this function as an MCP tool named 'tag_csv' on the FastMCP server instance.
```
@mcp.tool()
def tag_csv(
```

tagging.py:13-20 (helper)

Mapping from provider string names to polar_llama Provider enum values, used by tag_csv to resolve the provider parameter.

PROVIDER_MAP = {
    "claude": Provider.ANTHROPIC,
    "anthropic": Provider.ANTHROPIC,
    "openai": Provider.OPENAI,
    "gemini": Provider.GEMINI,
    "groq": Provider.GROQ,
    "bedrock": Provider.BEDROCK
}
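
The lookup and the API-key handling in the handler can be tried in isolation with the sketch below. The `Provider` enum here is a stand-in defined for this example (its member values are assumptions, not polar_llama's), and `resolve_provider` and `api_key_env_var` are hypothetical helpers that mirror the handler's logic.

```python
from enum import Enum
from typing import Optional


class Provider(Enum):
    """Stand-in for polar_llama's Provider enum; member values are assumed."""
    ANTHROPIC = "anthropic"
    OPENAI = "openai"
    GEMINI = "gemini"
    GROQ = "groq"
    BEDROCK = "bedrock"


PROVIDER_MAP = {
    "claude": Provider.ANTHROPIC,
    "anthropic": Provider.ANTHROPIC,
    "openai": Provider.OPENAI,
    "gemini": Provider.GEMINI,
    "groq": Provider.GROQ,
    "bedrock": Provider.BEDROCK,
}


def resolve_provider(name: str) -> Optional[Provider]:
    """Case-insensitive lookup; returns None for unsupported providers."""
    return PROVIDER_MAP.get(name.lower())


def api_key_env_var(name: str) -> str:
    """Name of the environment variable the handler would set for this provider."""
    if name.lower() in ("claude", "anthropic"):
        return "ANTHROPIC_API_KEY"
    return f"{name.upper()}_API_KEY"


print(resolve_provider("Claude"), api_key_env_var("groq"))
```

In the handler as listed, the environment variable is written before the provider is validated, so passing `api_key` with an unsupported provider name would still set a variable such as `MISTRAL_API_KEY` before the error is returned.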
