Enables parallel tagging and classification of CSV data using multiple LLM providers (Groq, Claude, OpenAI, Gemini, AWS Bedrock) with structured output, confidence scores, and optional reasoning.
Tagging MCP
MCP server for tagging CSV rows using polar_llama with parallel LLM inference.
Overview
This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages polar_llama to process rows concurrently, making it ideal for batch classification and tagging tasks.
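To illustrate why concurrent processing helps with batch tagging, here is a minimal sketch using only Python's standard library. It does not show how polar_llama works internally; `classify_row` is a hypothetical stand-in for a single LLM request, and the thread pool only demonstrates the general idea of issuing many row-level requests at once.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def classify_row(text: str) -> str:
    """Hypothetical stand-in for one LLM request (not a polar_llama call)."""
    time.sleep(0.05)  # simulate network latency
    return "technology" if "software" in text.lower() else "business"


def tag_rows(texts: list[str], max_workers: int = 8) -> list[str]:
    """Tag all rows concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_row, texts))


rows = ["New software release", "Quarterly earnings call", "Open-source software audit"]
tags = tag_rows(rows)
print(tags)
```

Because each request spends most of its time waiting on the network, total wall-clock time grows far more slowly than the number of rows.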
Features
Parallel Processing: Tag hundreds or thousands of CSV rows concurrently
Multiple LLM Providers: Support for Claude (Anthropic), OpenAI, Gemini, Groq, and AWS Bedrock
Structured Output: Uses Pydantic models for consistent, type-safe results
Flexible Taxonomy: Define custom tag lists for your use case
Optional Reasoning: Include confidence levels and explanations for tags
Installation
Prerequisites
Python 3.12+
UV package manager
API key for at least one LLM provider
Environment Setup
Clone this repository
Create a .env file with your API keys:
ANTHROPIC_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
GEMINI_API_KEY=your_key_here
GROQ_API_KEY=your_key_here
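Before running the server, it can be useful to check which providers actually have a key configured. The sketch below parses `.env`-style text by hand and reports the usable providers. The mapping from provider name to variable name is an assumption inferred from the variable names above, and both helper functions are hypothetical, not part of this server.

```python
# Assumed mapping from provider name to the variable names in the .env example.
PROVIDER_ENV_VARS = {
    "claude": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def configured_providers(env: dict[str, str]) -> list[str]:
    """Return providers whose key is present and is not the placeholder."""
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var) and env[var] != "your_key_here"
    ]


sample = "GROQ_API_KEY=gsk_example\nOPENAI_API_KEY=your_key_here\n"
available = configured_providers(parse_env_text(sample))
print(available)
```

In practice you would read the text from the `.env` file in the repository root instead of the inline `sample` string.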
Claude Desktop Configuration
Option 1: Local Development (Recommended)
Run directly without containers:
Option 2: Container Deployment
Build the container:
container build -t tagging_mcp .
Configure Claude Desktop:
{
  "mcpServers": {
    "tagging-mcp": {
      "command": "container",
      "args": ["run", "--interactive", "tagging_mcp"]
    }
  }
}
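If your Claude Desktop configuration already lists other MCP servers, the entry above has to be merged in rather than pasted over the existing file. The following sketch shows one way to do that on an in-memory dictionary. The `other-server` entry is purely hypothetical, and `add_tagging_server` is an illustrative helper, not something this project ships.

```python
import json


def add_tagging_server(config: dict) -> dict:
    """Return a copy of a Claude Desktop config with the container entry added."""
    merged = json.loads(json.dumps(config))  # deep copy via JSON round-trip
    servers = merged.setdefault("mcpServers", {})
    servers["tagging-mcp"] = {
        "command": "container",
        "args": ["run", "--interactive", "tagging_mcp"],
    }
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "uvx", "args": ["other"]}}}
updated = add_tagging_server(existing)
print(json.dumps(updated, indent=2))
```

The helper leaves the input untouched, so you can inspect the result before writing it back to disk.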
Available Tools
tag_csv
Simple tagging with a list of categories. Perfect for basic classification tasks.
Parameters:
csv_path (str): Path to the CSV file to tag
taxonomy (List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"])
text_column (str, optional): Column containing text to analyze (default: "text")
provider (str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq")
model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
api_key (str, optional): API key if not set via environment variable
output_path (str, optional): Path to save tagged CSV
include_reasoning (bool, optional): Include detailed reasoning and reflection (default: false)
field_name (str, optional): Name for the classification field (default: "category")
Returns: Dictionary with status, tagged data preview, confidence scores, and optional errors
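The parameters and defaults above can be summarized as a small argument builder. This is only a sketch for assembling and sanity-checking a `tag_csv` call on the client side; `build_tag_csv_args` is a hypothetical helper, and using `None` for `api_key` and `output_path` is an assumption, since their defaults are not documented.

```python
from typing import Any

# Defaults as listed in the parameter documentation; None values are assumed.
TAG_CSV_DEFAULTS: dict[str, Any] = {
    "text_column": "text",
    "provider": "groq",
    "model": "llama-3.3-70b-versatile",
    "api_key": None,
    "output_path": None,
    "include_reasoning": False,
    "field_name": "category",
}
KNOWN_PROVIDERS = {"claude", "openai", "gemini", "groq", "bedrock"}


def build_tag_csv_args(csv_path: str, taxonomy: list[str], **overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto the documented defaults."""
    unknown = set(overrides) - set(TAG_CSV_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    if not taxonomy:
        raise ValueError("taxonomy must contain at least one tag")
    args = {"csv_path": csv_path, "taxonomy": list(taxonomy), **TAG_CSV_DEFAULTS, **overrides}
    if args["provider"] not in KNOWN_PROVIDERS:
        raise ValueError(f"Unsupported provider: {args['provider']}")
    return args


args = build_tag_csv_args(
    "/path/to/data.csv",
    ["technology", "business", "science"],
    text_column="description",
)
print(args)
```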
tag_csv_advanced
Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields.
Parameters:
csv_path (str): Path to the CSV file to tag
taxonomy (Dict): Full taxonomy dictionary with field definitions and value descriptions
text_column (str, optional): Column containing text to analyze (default: "text")
provider (str, optional): LLM provider (default: "groq")
model (str, optional): Model identifier (default: "llama-3.3-70b-versatile")
api_key (str, optional): API key if not set via environment variable
output_path (str, optional): Path to save tagged CSV
include_reasoning (bool, optional): Include detailed reasoning (default: false)
Example Taxonomy:
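A taxonomy for this tool might look like the sketch below. The key names `description` and `values` are assumptions based on the parameter description ("field definitions and value descriptions"), so check the server's own schema before relying on them. The `validate_taxonomy` helper is hypothetical and only checks that assumed shape.

```python
# Assumed shape: field name -> description plus allowed values with descriptions.
taxonomy = {
    "sentiment": {
        "description": "Overall emotional tone of the text",
        "values": {
            "positive": "Expresses approval or satisfaction",
            "negative": "Expresses criticism or dissatisfaction",
            "neutral": "Factual, with no clear tone",
        },
    },
    "urgency": {
        "description": "How quickly the item needs attention",
        "values": {
            "high": "Needs action right away",
            "low": "Can wait without consequences",
        },
    },
}


def validate_taxonomy(candidate: dict) -> list[str]:
    """Return a list of problems; an empty list means the assumed shape holds."""
    problems = []
    for field, spec in candidate.items():
        if not isinstance(spec, dict):
            problems.append(f"{field}: definition must be a dict")
            continue
        if not isinstance(spec.get("description"), str):
            problems.append(f"{field}: missing description")
        values = spec.get("values")
        if not isinstance(values, dict) or not values:
            problems.append(f"{field}: needs at least one value")
    return problems


print(validate_taxonomy(taxonomy))
```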
Returns: Dictionary with status, all field values, confidence scores per field, and optional reasoning
preview_csv
Preview the first few rows of a CSV file to understand its structure.
Parameters:
csv_path (str): Path to the CSV file
rows (int, optional): Number of rows to preview (default: 5)
Returns: Dictionary with columns, row count, and preview data
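As a rough picture of what a preview involves, the sketch below computes the same three pieces of information with Python's standard `csv` module. It is not the server's implementation, and the dictionary keys (`columns`, `row_count`, `preview`) are assumed names chosen to mirror the description above.

```python
import csv
import io


def preview_rows(csv_text: str, rows: int = 5) -> dict:
    """Sketch of a preview: columns, total row count, and the first `rows` records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = list(reader)
    return {
        "columns": reader.fieldnames or [],
        "row_count": len(records),
        "preview": records[:rows],
    }


sample = "id,text\n1,New software release\n2,Quarterly earnings call\n3,Budget vote delayed\n"
result = preview_rows(sample, rows=2)
print(result)
```

Checking the column names this way is a quick means of confirming the right value for `text_column` before tagging.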
get_tagging_info
Get information about the tagging MCP server and supported providers.
Returns: Server metadata, supported providers, features, and available tools
Example Usage
Basic Tagging
Preview your CSV:
Use preview_csv with csv_path="/path/to/data.csv"
Simple category tagging:
Use tag_csv with:
- csv_path="/path/to/data.csv"
- taxonomy=["technology", "business", "science", "politics"]
- text_column="description"
- output_path="/path/to/tagged_output.csv"
Include reasoning for transparency:
Use tag_csv with:
- csv_path="/path/to/data.csv"
- taxonomy=["urgent", "normal", "low_priority"]
- field_name="priority"
- include_reasoning=true
Advanced Multi-Field Tagging
For complex classification with multiple dimensions:
Output Structure
Basic Tagging Output
Original CSV columns
{field_name}: The selected tag
confidence: Confidence score (0.0 to 1.0)
thinking: Reasoning for each possible value (if include_reasoning=true)
reflection: Overall analysis (if include_reasoning=true)
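One common use of the `confidence` column is to route uncertain rows to manual review. The sketch below assumes a tagged CSV saved with the default `field_name` of "category"; the 0.7 threshold and the sample rows are illustrative, not values produced by the server.

```python
import csv
import io


def low_confidence_rows(csv_text: str, threshold: float = 0.7) -> list[dict]:
    """Return tagged rows whose confidence falls below the threshold."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if float(row["confidence"]) < threshold]


# Illustrative tagged output: original column plus the tag and its confidence.
tagged = (
    "text,category,confidence\n"
    "New software release,technology,0.95\n"
    "Budget vote delayed,politics,0.55\n"
)
review = low_confidence_rows(tagged)
print(review)
```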
Advanced Tagging Output
Original CSV columns
For each taxonomy field:
{field_name}: Selected value
{field_name}_confidence: Confidence score
{field_name}_thinking: Reasoning dict (if enabled)
{field_name}_reflection: Analysis (if enabled)
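The naming pattern above makes it possible to predict the columns of an advanced tagging result from the taxonomy's field names. The helper below is a hypothetical sketch of that pattern; the column order it produces is an assumption, since only the names are documented.

```python
def expected_columns(
    original: list[str], fields: list[str], include_reasoning: bool = False
) -> list[str]:
    """List the columns an advanced tagging result should contain (order assumed)."""
    columns = list(original)
    for field in fields:
        columns += [field, f"{field}_confidence"]
        if include_reasoning:
            columns += [f"{field}_thinking", f"{field}_reflection"]
    return columns


cols = expected_columns(["id", "text"], ["sentiment", "urgency"])
print(cols)
```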
Supported LLM Providers
Groq (Recommended): llama-3.3-70b-versatile, llama-3.1-70b-versatile, mixtral-8x7b-32768
Claude (Anthropic): claude-3-5-sonnet-20241022, claude-3-opus-20240229
OpenAI: gpt-4, gpt-4-turbo, gpt-3.5-turbo
Gemini: gemini-1.5-pro, gemini-1.5-flash
AWS Bedrock: anthropic.claude-3-sonnet, anthropic.claude-3-haiku
Key Features
Detailed Reasoning: For each tag, see why the model chose it
Reflection: Model reflects on its analysis
Confidence Scores: Know how confident each classification is (0.0-1.0)
Parallel Processing: All rows processed concurrently
Error Detection: Automatic error tracking and reporting
Flexible: Simple list or complex multi-field taxonomies
License
MIT