Enables parallel tagging and classification of CSV data using OpenAI models (GPT-4, GPT-4-turbo, GPT-3.5-turbo) with structured output and confidence scoring.
Tagging MCP
MCP server for tagging CSV rows using polar_llama with parallel LLM inference.
Overview
This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages polar_llama to process rows concurrently, making it ideal for batch classification and tagging tasks.
Features
Parallel Processing: Tag hundreds or thousands of CSV rows concurrently
Multiple LLM Providers: Support for Claude (Anthropic), OpenAI, Gemini, Groq, and AWS Bedrock
Structured Output: Uses Pydantic models for consistent, type-safe results
Flexible Taxonomy: Define custom tag lists for your use case
Optional Reasoning: Include confidence levels and explanations for tags
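The parallel-processing idea can be illustrated with a minimal sketch. It does not use polar_llama and is not the server's implementation: classify_row is a hypothetical stand-in for a real LLM call, and max_concurrency is an assumed parameter.

```python
import asyncio


async def classify_row(text: str, taxonomy: list[str]) -> dict:
    """Hypothetical stand-in for an LLM call: picks the first tag found in the text."""
    await asyncio.sleep(0)  # yield control, as a real network request would
    lowered = text.lower()
    for tag in taxonomy:
        if tag in lowered:
            return {"category": tag, "confidence": 1.0}
    # No tag matched: fall back to the first tag with zero confidence.
    return {"category": taxonomy[0], "confidence": 0.0}


async def tag_rows(rows: list[str], taxonomy: list[str], max_concurrency: int = 8) -> list[dict]:
    """Classify all rows concurrently, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(text: str) -> dict:
        async with semaphore:
            return await classify_row(text, taxonomy)

    # gather returns results in the same order as the input rows.
    return await asyncio.gather(*(bounded(row) for row in rows))


results = asyncio.run(
    tag_rows(["New technology released", "Business is booming"], ["technology", "business"])
)
```

Bounding concurrency with a semaphore is one common way to stay under provider rate limits while still issuing requests in parallel.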
Installation
Prerequisites
Python 3.12+
UV package manager
API key for at least one LLM provider
Environment Setup
Clone this repository
Create a .env file with your API keys:
    ANTHROPIC_API_KEY=your_key_here
    OPENAI_API_KEY=your_key_here
    GEMINI_API_KEY=your_key_here
    GROQ_API_KEY=your_key_here
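Before starting the server, it can help to check which provider keys are actually set. The following is a small sketch using only the standard library. The mapping from provider name to environment variable is inferred from the key names above and the provider identifiers used by tag_csv; configured_providers is a hypothetical helper, not part of the server.

```python
import os

# Assumed mapping: provider identifier -> environment variable from the .env example.
PROVIDER_KEYS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def configured_providers(env=None) -> list[str]:
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


print(configured_providers())
```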
Claude Desktop Configuration
Option 1: Local Development (Recommended)
Run directly without containers:
Option 2: Container Deployment
Build the container:
    container build -t tagging_mcp .
Configure Claude Desktop:
    {
      "mcpServers": {
        "tagging-mcp": {
          "command": "container",
          "args": ["run", "--interactive", "tagging_mcp"]
        }
      }
    }
Available Tools
tag_csv
Simple tagging with a list of categories. Perfect for basic classification tasks.
Parameters:
csv_path (str): Path to the CSV file to tag
taxonomy (List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"])
text_column (str, optional): Column containing text to analyze (default: "text")
provider (str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
api_key (str, optional): API key if not set via environment variable
output_path (str, optional): Path to save tagged CSV
include_reasoning (bool, optional): Include detailed reasoning and reflection (default: false)
field_name (str, optional): Name for the classification field (default: "category")
Returns: Dictionary with status, tagged data preview, confidence scores, and optional errors
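As an illustration of how the documented defaults combine, here is a minimal sketch that assembles a tag_csv argument dictionary. build_tag_csv_args is a hypothetical helper for illustration only; the default values are the ones listed in the parameters above.

```python
def build_tag_csv_args(csv_path: str, taxonomy: list[str], **overrides) -> dict:
    """Assemble tag_csv arguments, filling in the documented defaults."""
    args = {
        "csv_path": csv_path,
        "taxonomy": taxonomy,
        "text_column": "text",
        "provider": "groq",
        "model": "llama-3.3-70b-versatile",
        "include_reasoning": False,
        "field_name": "category",
    }
    # api_key and output_path are optional and have no documented default.
    allowed = set(args) | {"api_key", "output_path"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"Unknown tag_csv parameters: {sorted(unknown)}")
    args.update(overrides)
    return args


args = build_tag_csv_args(
    "/path/to/data.csv",
    ["urgent", "normal", "low_priority"],
    field_name="priority",
    include_reasoning=True,
)
```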
tag_csv_advanced
Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields.
Parameters:
csv_path (str): Path to the CSV file to tag
taxonomy (Dict): Full taxonomy dictionary with field definitions and value descriptions
text_column (str, optional): Column containing text to analyze (default: "text")
provider (str, optional): LLM provider (default: "groq")
model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
api_key (str, optional): API key if not set via environment variable
output_path (str, optional): Path to save tagged CSV
include_reasoning (bool, optional): Include detailed reasoning (default: false)
Example Taxonomy:
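A minimal sketch of what such a taxonomy might look like, assuming each field maps to a description plus a dictionary of allowed values and their descriptions. The field names (sentiment, urgency) and the "description" and "values" keys are illustrative assumptions, not a confirmed schema; validate_taxonomy is a hypothetical helper that checks only this assumed shape.

```python
# Hypothetical taxonomy: two classification fields, each with described values.
taxonomy = {
    "sentiment": {
        "description": "Overall emotional tone of the text",
        "values": {
            "positive": "Expresses satisfaction or approval",
            "negative": "Expresses dissatisfaction or criticism",
            "neutral": "Factual or without clear emotion",
        },
    },
    "urgency": {
        "description": "How quickly the item needs attention",
        "values": {
            "high": "Requires immediate action",
            "low": "Can be handled later",
        },
    },
}


def validate_taxonomy(taxonomy: dict) -> list[str]:
    """Return a list of problems found under the assumed schema (empty if none)."""
    problems = []
    for field, spec in taxonomy.items():
        if not isinstance(spec.get("description"), str):
            problems.append(f"{field}: missing description")
        values = spec.get("values")
        if not isinstance(values, dict) or not values:
            problems.append(f"{field}: needs at least one value")
    return problems


print(validate_taxonomy(taxonomy))
```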
Returns: Dictionary with status, all field values, confidence scores per field, and optional reasoning
preview_csv
Preview the first few rows of a CSV file to understand its structure.
Parameters:
csv_path (str): Path to the CSV file
rows (int, optional): Number of rows to preview (default: 5)
Returns: Dictionary with columns, row count, and preview data
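For intuition, a comparable preview can be produced with Python's standard csv module. This is a sketch, not the server's implementation: preview_csv_sketch and the result keys ("columns", "row_count", "preview") are assumed names chosen to mirror the description above.

```python
import csv
import os
import tempfile


def preview_csv_sketch(csv_path: str, rows: int = 5) -> dict:
    """Read a CSV and return its columns, total row count, and the first rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        all_rows = list(reader)
        columns = list(reader.fieldnames or [])
    return {"columns": columns, "row_count": len(all_rows), "preview": all_rows[:rows]}


# Demonstration on a small temporary file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("id,text\n1,hello\n2,world\n3,again\n")
    result = preview_csv_sketch(path, rows=2)
```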
get_tagging_info
Get information about the tagging MCP server and supported providers.
Returns: Server metadata, supported providers, features, and available tools
Example Usage
Basic Tagging
Preview your CSV:
Use preview_csv with csv_path="/path/to/data.csv"
Simple category tagging:
Use tag_csv with:
- csv_path="/path/to/data.csv"
- taxonomy=["technology", "business", "science", "politics"]
- text_column="description"
- output_path="/path/to/tagged_output.csv"
Include reasoning for transparency:
Use tag_csv with:
- csv_path="/path/to/data.csv"
- taxonomy=["urgent", "normal", "low_priority"]
- field_name="priority"
- include_reasoning=true
Advanced Multi-Field Tagging
For complex classification with multiple dimensions:
Output Structure
Basic Tagging Output
Original CSV columns
{field_name}: The selected tag
confidence: Confidence score (0.0 to 1.0)
thinking: Reasoning for each possible value (if include_reasoning=true)
reflection: Overall analysis (if include_reasoning=true)
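Once a tagged CSV has been written, the confidence column makes it easy to flag rows for manual review. The sketch below assumes the default field name "category" and a column literally named "confidence"; the 0.7 threshold and the sample rows are arbitrary illustrations.

```python
import csv
import io


def low_confidence_rows(csv_text: str, threshold: float = 0.7,
                        confidence_column: str = "confidence") -> list[dict]:
    """Return rows whose confidence score falls below the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row[confidence_column]) < threshold]


# Made-up sample data shaped like the basic tagging output.
sample = (
    "text,category,confidence\n"
    "GPU prices fall,technology,0.95\n"
    "Markets and microchips,business,0.55\n"
)
flagged = low_confidence_rows(sample)
```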
Advanced Tagging Output
Original CSV columns
For each taxonomy field:
{field_name}: Selected value
{field_name}_confidence: Confidence score
{field_name}_thinking: Reasoning dict (if enabled)
{field_name}_reflection: Analysis (if enabled)
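The naming pattern above can be sketched as a small function that lists the columns to expect for a given set of taxonomy fields. expected_columns is a hypothetical helper, and the column order it produces is an assumption; only the column names follow the pattern described.

```python
def expected_columns(original: list[str], fields: list[str],
                     include_reasoning: bool = False) -> list[str]:
    """List output columns: original columns plus per-field result columns."""
    columns = list(original)
    for field in fields:
        columns += [field, f"{field}_confidence"]
        if include_reasoning:
            # Reasoning columns appear only when reasoning is enabled.
            columns += [f"{field}_thinking", f"{field}_reflection"]
    return columns


print(expected_columns(["id", "text"], ["sentiment", "urgency"], include_reasoning=True))
```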
Supported LLM Providers
Groq (Recommended): llama-3.3-70b-versatile, llama-3.1-70b-versatile, mixtral-8x7b-32768
Claude (Anthropic): claude-3-5-sonnet-20241022, claude-3-opus-20240229
OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
Gemini: gemini-1.5-pro, gemini-1.5-flash
AWS Bedrock: anthropic.claude-3-sonnet, anthropic.claude-3-haiku
Key Features
Detailed Reasoning: For each tag, see why the model chose it
Reflection: Model reflects on its analysis
Confidence Scores: Know how confident each classification is (0.0-1.0)
Parallel Processing: All rows processed concurrently
Error Detection: Automatic error tracking and reporting
Flexible: Simple list or complex multi-field taxonomies
License
MIT