Tagging MCP

README.md•7.11 KiB

# Tagging MCP MCP server for tagging CSV rows using polar_llama with parallel LLM inference. ## Overview This MCP server enables fast, parallel tagging of CSV data using multiple LLM providers. It leverages [polar_llama](https://github.com/daviddrummond95/polar_llama) to process rows concurrently, making it ideal for batch classification and tagging tasks. ## Features - **Parallel Processing**: Tag hundreds or thousands of CSV rows concurrently - **Multiple LLM Providers**: Support for Claude (Anthropic), OpenAI, Gemini, and Groq - **Structured Output**: Uses Pydantic models for consistent, type-safe results - **Flexible Taxonomy**: Define custom tag lists for your use case - **Optional Reasoning**: Include confidence levels and explanations for tags ## Installation ### Prerequisites - Python 3.12+ - UV package manager - API key for at least one LLM provider ### Environment Setup 1. Clone this repository 2. Create a `.env` file with your API keys: ```bash ANTHROPIC_API_KEY=your_key_here OPENAI_API_KEY=your_key_here GEMINI_API_KEY=your_key_here GROQ_API_KEY=your_key_here ``` ## Claude Desktop Configuration ### Option 1: Local Development (Recommended) Run directly without containers: ```json { "mcpServers": { "tagging-mcp": { "command": "uv", "args": ["run", "fastmcp", "run", "/path/to/tagging_mcp/tagging.py"] } } } ``` ### Option 2: Container Deployment 1. Build the container: ```bash container build -t tagging_mcp . ``` 2. Configure Claude Desktop: ```json { "mcpServers": { "tagging-mcp": { "command": "container", "args": ["run", "--interactive", "tagging_mcp"] } } } ``` ## Available Tools ### `tag_csv` Simple tagging with a list of categories. Perfect for basic classification tasks. **Parameters:** - `csv_path` (str): Path to the CSV file to tag - `taxonomy` (List[str]): List of possible tags/categories (e.g., ["technology", "business", "science"]) - `text_column` (str, optional): Column containing text to analyze (default: "text") - `provider` (str, optional): LLM provider - "claude", "openai", "gemini", "groq", or "bedrock" (default: "groq") - `model` (str, optional): Model identifier (default: "llama-3.3-70b-versatile") - `api_key` (str, optional): API key if not set via environment variable - `output_path` (str, optional): Path to save tagged CSV - `include_reasoning` (bool, optional): Include detailed reasoning and reflection (default: false) - `field_name` (str, optional): Name for the classification field (default: "category") **Returns:** Dictionary with status, tagged data preview, confidence scores, and optional errors ### `tag_csv_advanced` Advanced multi-dimensional classification with custom taxonomy definitions. Use this for complex tagging with multiple fields. **Parameters:** - `csv_path` (str): Path to the CSV file to tag - `taxonomy` (Dict): Full taxonomy dictionary with field definitions and value descriptions - `text_column` (str, optional): Column containing text to analyze (default: "text") - `provider` (str, optional): LLM provider (default: "groq") - `model` (str, optional): Model identifier (default: "llama-3.3-70b-versatile") - `api_key` (str, optional): API key if not set via environment variable - `output_path` (str, optional): Path to save tagged CSV - `include_reasoning` (bool, optional): Include detailed reasoning (default: false) **Example Taxonomy:** ```json { "sentiment": { "description": "The emotional tone of the text", "values": { "positive": "Text expresses positive emotions or favorable opinions", "negative": "Text expresses negative emotions or unfavorable opinions", "neutral": "Text is factual and objective" } }, "urgency": { "description": "How urgent the content is", "values": { "high": "Requires immediate attention", "medium": "Should be addressed soon", "low": "Can be addressed at any time" } } } ``` **Returns:** Dictionary with status, all field values, confidence scores per field, and optional reasoning ### `preview_csv` Preview the first few rows of a CSV file to understand its structure. **Parameters:** - `csv_path` (str): Path to the CSV file - `rows` (int, optional): Number of rows to preview (default: 5) **Returns:** Dictionary with columns, row count, and preview data ### `get_tagging_info` Get information about the tagging MCP server and supported providers. **Returns:** Server metadata, supported providers, features, and available tools ## Example Usage ### Basic Tagging 1. **Preview your CSV:** ``` Use preview_csv with csv_path="/path/to/data.csv" ``` 2. **Simple category tagging:** ``` Use tag_csv with: - csv_path="/path/to/data.csv" - taxonomy=["technology", "business", "science", "politics"] - text_column="description" - output_path="/path/to/tagged_output.csv" ``` 3. **Include reasoning for transparency:** ``` Use tag_csv with: - csv_path="/path/to/data.csv" - taxonomy=["urgent", "normal", "low_priority"] - field_name="priority" - include_reasoning=true ``` ### Advanced Multi-Field Tagging For complex classification with multiple dimensions: ``` Use tag_csv_advanced with: - csv_path="/path/to/support_tickets.csv" - taxonomy={ "department": { "description": "Which department should handle this", "values": { "sales": "Product inquiries and purchases", "support": "Technical issues and bugs", "billing": "Payment and account questions" } }, "priority": { "description": "How urgent this is", "values": { "urgent": "Service down or critical issue", "high": "Significant problem", "normal": "Standard request" } } } - text_column="ticket_description" - output_path="/path/to/classified_tickets.csv" ``` ## Output Structure ### Basic Tagging Output - Original CSV columns - `{field_name}`: The selected tag - `confidence`: Confidence score (0.0 to 1.0) - `thinking`: Reasoning for each possible value (if `include_reasoning=true`) - `reflection`: Overall analysis (if `include_reasoning=true`) ### Advanced Tagging Output - Original CSV columns - For each taxonomy field: - `{field_name}`: Selected value - `{field_name}_confidence`: Confidence score - `{field_name}_thinking`: Reasoning dict (if enabled) - `{field_name}_reflection`: Analysis (if enabled) ## Supported LLM Providers - **Groq** (Recommended): llama-3.3-70b-versatile, llama-3.1-70b-versatile, mixtral-8x7b-32768 - **Claude (Anthropic)**: claude-3-5-sonnet-20241022, claude-3-opus-20240229 - **OpenAI**: gpt-4, gpt-4-turbo, gpt-3.5-turbo - **Gemini**: gemini-1.5-pro, gemini-1.5-flash - **AWS Bedrock**: anthropic.claude-3-sonnet, anthropic.claude-3-haiku ## Key Features ✨ **Detailed Reasoning**: For each tag, see why the model chose it 🔍 **Reflection**: Model reflects on its analysis 📊 **Confidence Scores**: Know how confident each classification is (0.0-1.0) ⚡ **Parallel Processing**: All rows processed concurrently 🎯 **Error Detection**: Automatic error tracking and reporting 🔧 **Flexible**: Simple list or complex multi-field taxonomies ## License MIT

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/daviddrummond95/tagging_mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•7.11 KiB