Glama

tag_csv

Classify each row in a CSV file by applying a predefined taxonomy of tags using parallel LLM inference from multiple providers, with optional reasoning and confidence scores.

Instructions

Tag all rows in a CSV file based on a provided taxonomy using parallel LLM inference.

Input Schema

Name | Required | Description | Default
csv_path | Yes | Path to the CSV file to tag |
taxonomy | Yes | List of possible tags/categories to assign (e.g., ["technology", "business", "science"]) |
text_column | No | Name of the column containing text to analyze (default: "text") | text
provider | No | LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq") | groq
model | No | Model identifier (default: "llama-3.3-70b-versatile") | llama-3.3-70b-versatile
api_key | No | API key for the provider (if not set via environment variable) |
output_path | No | Optional path to save the tagged CSV (if not provided, returns preview) |
include_reasoning | No | Whether to include detailed reasoning and reflection in output (default: False) |
field_name | No | Name for the classification field (default: "category") | category
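
As a rough illustration of how the defaults in the table combine, the sketch below assembles an argument payload for tag_csv. The helper name `build_tag_csv_args`, the empty-taxonomy check, and the `reviews.csv` path are assumptions made for this example; only the parameter names, defaults, and provider list come from the schema above.

```python
from typing import Any, Dict, List

# Defaults copied from the input schema table; everything else is illustrative.
DEFAULTS: Dict[str, Any] = {
    "text_column": "text",
    "provider": "groq",
    "model": "llama-3.3-70b-versatile",
    "include_reasoning": False,
    "field_name": "category",
}
# Provider names listed in the schema description.
SUPPORTED_PROVIDERS = {"claude", "openai", "gemini", "groq", "bedrock"}


def build_tag_csv_args(csv_path: str, taxonomy: List[str], **overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge the two required arguments with schema defaults."""
    if not taxonomy:
        # Assumption: an empty taxonomy is treated as a caller error here.
        raise ValueError("taxonomy must contain at least one tag")
    args = {"csv_path": csv_path, "taxonomy": list(taxonomy), **DEFAULTS, **overrides}
    if args["provider"].lower() not in SUPPORTED_PROVIDERS:
        raise ValueError(f"Unsupported provider: {args['provider']}")
    return args


args = build_tag_csv_args(
    "reviews.csv",  # hypothetical input file
    ["technology", "business", "science"],
    include_reasoning=True,
)
print(args["provider"], args["model"], args["include_reasoning"])
```

Only csv_path and taxonomy must be supplied; every other key falls back to the default shown in the table unless overridden.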

Output Schema

Name | Required | Description | Default

No arguments

Implementation Reference

  • The main handler function for the 'tag_csv' tool, registered via the @mcp.tool() decorator. It reads a CSV, tags each row with one category from the supplied list using parallel LLM inference, and returns the tagged results.
    @mcp.tool()
    def tag_csv(
        csv_path: str,
        taxonomy: List[str],
        text_column: str = "text",
        provider: str = "groq",
        model: str = "llama-3.3-70b-versatile",
        api_key: Optional[str] = None,
        output_path: Optional[str] = None,
        include_reasoning: bool = False,
        field_name: str = "category"
    ) -> dict:
        """
        Tag all rows in a CSV file based on a provided taxonomy using parallel LLM inference.
    
        Args:
            csv_path: Path to the CSV file to tag
            taxonomy: List of possible tags/categories to assign (e.g., ["technology", "business", "science"])
            text_column: Name of the column containing text to analyze (default: "text")
            provider: LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
            model: Model identifier (default: "llama-3.3-70b-versatile")
            api_key: API key for the provider (if not set via environment variable)
            output_path: Optional path to save the tagged CSV (if not provided, returns preview)
            include_reasoning: Whether to include detailed reasoning and reflection in output (default: False)
            field_name: Name for the classification field (default: "category")
    
        Returns:
            Dictionary with status, preview of tagged data, and optionally the output path
        """
        try:
            # Read the CSV file
            df = pl.read_csv(csv_path)
    
            # Validate that the text column exists
            if text_column not in df.columns:
                return {
                    "status": "error",
                    "message": f"Column '{text_column}' not found in CSV. Available columns: {df.columns}"
                }
    
            # Set up API key if provided
            if api_key:
                if provider.lower() in ["claude", "anthropic"]:
                    os.environ["ANTHROPIC_API_KEY"] = api_key
                else:
                    os.environ[f"{provider.upper()}_API_KEY"] = api_key
    
            # Convert tag list to taxonomy format
            taxonomy_dict = _create_taxonomy_from_tags(taxonomy, field_name)
    
            # Get the provider enum
            provider_enum = PROVIDER_MAP.get(provider.lower())
            if not provider_enum:
                return {
                    "status": "error",
                    "message": f"Unsupported provider: {provider}. Use 'claude', 'openai', 'gemini', 'groq', or 'bedrock'"
                }
    
            # Apply taxonomy tagging using polar_llama
            df = df.with_columns(
                tags=tag_taxonomy(
                    pl.col(text_column),
                    taxonomy_dict,
                    provider=provider_enum,
                    model=model
                )
            )
    
            # Extract the selected tag value and confidence
            df = df.with_columns([
                pl.col("tags").struct.field(field_name).struct.field("value").alias(field_name),
                pl.col("tags").struct.field(field_name).struct.field("confidence").alias("confidence")
            ])
    
            # Optionally include reasoning and reflection
            if include_reasoning:
                df = df.with_columns([
                    pl.col("tags").struct.field(field_name).struct.field("thinking").alias("thinking"),
                    pl.col("tags").struct.field(field_name).struct.field("reflection").alias("reflection")
                ])
    
            # Check for errors
            error_rows = df.filter(
                pl.col("tags").struct.field("_error").is_not_null()
            )
    
            if len(error_rows) > 0:
                error_details = error_rows.select([
                    text_column,
                    pl.col("tags").struct.field("_error").alias("error"),
                    pl.col("tags").struct.field("_details").alias("error_details")
                ]).to_dicts()
            else:
                error_details = None
    
            # Drop the raw tags column for cleaner output
            df = df.drop("tags")
    
            # Save to file if output path is provided
            if output_path:
                df.write_csv(output_path)
                result = {
                    "status": "success",
                    "message": f"Successfully tagged {len(df)} rows",
                    "output_path": output_path,
                    "preview": df.head(5).to_dicts(),
                    "total_rows": len(df)
                }
                if error_details:
                    result["errors"] = error_details
            else:
                result = {
                    "status": "success",
                    "message": f"Successfully tagged {len(df)} rows",
                    "data": df.to_dicts(),
                    "total_rows": len(df)
                }
                if error_details:
                    result["errors"] = error_details
    
            return result
    
        except Exception as e:
            return {
                "status": "error",
                "message": str(e)
            }
  • Helper that converts the simple list of tags into the taxonomy dictionary format required by the polar_llama library's tag_taxonomy function.
    def _create_taxonomy_from_tags(tags: List[str], field_name: str = "category") -> Dict[str, Any]:
        """Convert a simple list of tags into a taxonomy structure."""
        return {
            field_name: {
                "description": f"The most appropriate {field_name} for this text",
                "values": {tag: f"Content that belongs in the '{tag}' {field_name}" for tag in tags}
            }
        }
  • tagging.py:33-34 (registration)
    The @mcp.tool() decorator registers this function as an MCP tool named 'tag_csv' on the FastMCP server instance.
    @mcp.tool()
    def tag_csv(
  • Mapping from provider string names to polar_llama Provider enum values, used by tag_csv to resolve the provider parameter.
    PROVIDER_MAP = {
        "claude": Provider.ANTHROPIC,
        "anthropic": Provider.ANTHROPIC,
        "openai": Provider.OPENAI,
        "gemini": Provider.GEMINI,
        "groq": Provider.GROQ,
        "bedrock": Provider.BEDROCK
    }
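
To make the taxonomy format concrete, the sketch below runs a standalone copy of the helper shown above on a two-tag list and prints the resulting dictionary. The function is renamed without the leading underscore and the `topic` field name is chosen only for this example; the body matches the helper in the listing.

```python
import json
from typing import Any, Dict, List


def create_taxonomy_from_tags(tags: List[str], field_name: str = "category") -> Dict[str, Any]:
    """Standalone copy of the helper above: one field whose values map tag -> description."""
    return {
        field_name: {
            "description": f"The most appropriate {field_name} for this text",
            "values": {tag: f"Content that belongs in the '{tag}' {field_name}" for tag in tags},
        }
    }


taxonomy = create_taxonomy_from_tags(["technology", "business"], field_name="topic")
print(json.dumps(taxonomy, indent=2))
```

Because "values" is a dictionary keyed by tag, duplicate tags in the input list collapse into a single entry.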
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions parallel inference and output options but omits details such as error behavior, rate limits, and prerequisites. Reasonable but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
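
One side effect visible in the implementation reference is that a supplied api_key is written into the process environment (a supplied output_path likewise causes a file write). The sketch below mirrors only the naming rule the handler uses for that environment variable; the function name `api_key_env_var` is hypothetical, and whether each provider's client actually reads a variable of that name is not confirmed by this page.

```python
def api_key_env_var(provider: str) -> str:
    """Mirror the handler's rule for which environment variable receives api_key."""
    if provider.lower() in ("claude", "anthropic"):
        return "ANTHROPIC_API_KEY"
    # All other providers: upper-cased provider name plus "_API_KEY".
    return f"{provider.upper()}_API_KEY"


for name in ("claude", "openai", "gemini", "groq", "bedrock"):
    print(name, "->", api_key_env_var(name))
```

Since the variable is set on the server process, the key stays in the environment after the call returns.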

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no filler. All words contribute to the purpose and method. Efficiently structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters, no annotations, and no account of what the tool returns, the description leaves gaps (e.g., return format, error scenarios). The presence of an output schema partially mitigates this. The description is minimally complete but not rich.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
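
The return format and error scenarios can be read off the handler in the implementation reference: the result is a dictionary with "status" and "message", plus "total_rows" and either "preview" (when output_path is given) or "data", and an optional "errors" list. The sketch below shows one way a caller might consume it. The function name `summarize_result` and the sample rows are hypothetical, and the confidence values are invented for illustration.

```python
from typing import Any, Dict, List


def summarize_result(result: Dict[str, Any]) -> str:
    """Hypothetical consumer for the dictionary returned by tag_csv."""
    if result.get("status") != "success":
        return f"failed: {result.get('message', 'unknown error')}"
    # With output_path the handler returns a 5-row "preview"; otherwise all rows under "data".
    rows: List[Dict[str, Any]] = result.get("preview") or result.get("data") or []
    failed = len(result.get("errors", []))
    return f"{result['total_rows']} rows tagged, {len(rows)} returned inline, {failed} row-level errors"


# Hand-written sample shaped like the handler's return value (not real model output).
sample = {
    "status": "success",
    "message": "Successfully tagged 2 rows",
    "data": [
        {"text": "New GPU released", "category": "technology", "confidence": 0.9},
        {"text": "Quarterly earnings up", "category": "business", "confidence": 0.8},
    ],
    "total_rows": 2,
}
print(summarize_result(sample))
print(summarize_result({"status": "error", "message": "Column 'text' not found in CSV."}))
```

Note that "status" can be "success" even when "errors" is present, so per-row failures need a separate check.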

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds little beyond the schema: it rephrases the parameters but does not clarify usage context or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the action ('Tag'), the resource ('rows in a CSV file'), and the method ('using parallel LLM inference'). It distinguishes from siblings by specifying the operation on CSV data with taxonomy-based tagging.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives no explicit guidance on when to use this tool versus siblings such as 'tag_csv_advanced' or 'preview_csv'. It implies use for tagging but does not say when not to use the tool or name alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/daviddrummond95/tagging_mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.