# Generic Data Analytics MCP Server
An MCP (Model Context Protocol) server that transforms any structured dataset (JSON/CSV) into intelligent, AI-guided analytics workflows. The server demonstrates a modular, **dataset-agnostic design**: it adapts automatically to ANY data without hardcoded schemas.
## 🚀 Quick Setup
1. **Configure for your MCP client**:
```bash
cp .mcp.json.sample .mcp.json
# Edit .mcp.json and update paths to your system
```
2. **Find your UV path and update configuration**:
```bash
which uv
# Example output: /Users/yourusername/.local/bin/uv
pwd
# Example output: /Users/yourusername/path/to/quick-data-mcp
```
3. **Test the server**:
```bash
uv run python main.py
```
## 🏁 Getting Started in Claude Code
Once your MCP server is configured and running, **start with this slash command in Claude Code to get oriented**:
```
/quick-data:list_mcp_assets_prompt
```
This will show you all available tools, resources, and prompts with descriptions - your complete toolkit for data analytics!
## 🌟 What Makes This Special
### Universal Data Analytics
- **Works with ANY JSON/CSV dataset** - no schema definition required
- **Automatic column type detection** - numerical, categorical, temporal, identifier
- **AI-powered analysis suggestions** - recommends analyses based on your data characteristics
- **Adaptive conversation prompts** - guides users through analytics workflows using actual column names
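Column-type detection requires no configuration: the server inspects each column and assigns it a role. Below is a minimal sketch of how such classification can work with pandas (illustrative only; the server's actual heuristics in `models/schemas.py` are richer):

```python
import pandas as pd

def classify_column(series: pd.Series) -> str:
    """Rough heuristic for illustration, not the server's exact logic."""
    if pd.api.types.is_datetime64_any_dtype(series):
        return "temporal"
    if series.dtype == object and series.is_unique:
        return "identifier"  # e.g. order IDs, emails
    if pd.api.types.is_numeric_dtype(series):
        return "numerical"
    return "categorical"

# Try it on one of the bundled sample datasets
df = pd.read_csv("data/employee_survey.csv")
print({col: classify_column(df[col]) for col in df.columns})
```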
### Tested Architecture
- **32 Analytics Tools** (20 analytics + 12 resource mirrors) for comprehensive data analysis
- **12 Dynamic Resources** providing real-time data context
- **7 Adaptive Prompts** for AI-guided exploration
- **100% Test Coverage** (130 tests passing)
- **Universal MCP Client Compatibility** (supports tool-only clients)
- **Memory optimization** with usage monitoring
## 📊 Complete Capabilities
### 🔧 Analytics Tools (32 total)
#### **Data Loading & Management**
- `load_dataset(file_path, dataset_name, sample_size?)` - Load any JSON/CSV with automatic schema discovery
- `list_loaded_datasets()` - Show all datasets currently in memory with statistics
- `clear_dataset(dataset_name)` - Remove specific dataset from memory
- `clear_all_datasets()` - Clear all datasets from memory
- `get_dataset_info(dataset_name)` - Get comprehensive dataset information
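A typical loading session, as a hedged sketch (the `sample_size` keyword usage and value are illustrative):

```python
# Load a file; sample_size caps rows to bound memory on large files
await load_dataset("data/ecommerce_orders.json", "sales", sample_size=1000)

# Inspect what was loaded
await get_dataset_info("sales")
await list_loaded_datasets()

# Free memory when finished
await clear_dataset("sales")
```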
#### **Core Analytics**
- `segment_by_column(dataset_name, column_name, method?, top_n?)` - Generic segmentation on any categorical column
- `find_correlations(dataset_name, columns?, threshold?)` - Correlation analysis with configurable thresholds
- `analyze_distributions(dataset_name, column_name)` - Statistical distribution analysis for any column
- `detect_outliers(dataset_name, columns?, method?)` - Outlier detection (IQR, Z-score methods)
- `time_series_analysis(dataset_name, date_column, value_column, frequency?)` - Temporal analysis with trend detection
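For example, outlier and trend checks might be combined like this (column names are placeholders from a hypothetical sales schema, and the default `method` is assumed to be IQR):

```python
# Flag unusual values with the IQR method; pass "zscore" for Z-scores
await detect_outliers("sales", columns=["order_value"], method="iqr")

# Monthly trend detection on an assumed date/value column pair
await time_series_analysis("sales", "order_date", "order_value", frequency="M")
```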
#### **Advanced Analytics**
- `validate_data_quality(dataset_name)` - Comprehensive data quality assessment (0-100 scoring)
- `compare_datasets(dataset_a, dataset_b, common_columns?)` - Multi-dataset comparison analysis
- `merge_datasets(dataset_configs, join_strategy?)` - Join datasets with flexible strategies
- `calculate_feature_importance(dataset_name, target_column, feature_columns?)` - ML feature importance
- `memory_optimization_report(dataset_name)` - Performance analysis and optimization suggestions
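Two loaded datasets might be joined like this (the exact shape of `dataset_configs` is an assumption; check the tool's docstring for the authoritative format):

```python
await merge_datasets(
    dataset_configs=[
        {"dataset_name": "sales", "join_column": "product_id"},
        {"dataset_name": "products", "join_column": "product_id"},
    ],
    join_strategy="inner",  # assumed options: inner/left/outer
)
```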
#### **Visualization & Export**
- `create_chart(dataset_name, chart_type, x_column, y_column?, groupby_column?, title?, save_path?)` - Generate charts (bar, scatter, histogram, line, box)
- `generate_dashboard(dataset_name, chart_configs)` - Multi-chart interactive dashboards
- `export_insights(dataset_name, format?, include_charts?)` - Export in JSON, CSV, HTML formats
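A sketch of a dashboard-plus-export flow (the `chart_configs` entries are assumed to mirror `create_chart`'s arguments):

```python
await generate_dashboard("sales", chart_configs=[
    {"chart_type": "bar", "x_column": "region", "y_column": "order_value"},
    {"chart_type": "histogram", "x_column": "order_value"},
])

# Export accumulated insights as a shareable HTML report
await export_insights("sales", format="html", include_charts=True)
```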
#### **AI-Powered Assistance**
- `suggest_analysis(dataset_name)` - AI recommendations based on data characteristics
- `execute_custom_analytics_code(dataset_name, python_code)` - Execute custom Python code against datasets with full pandas/numpy/plotly support
#### **🔄 Resource Mirror Tools** (Tool-Only Client Support)
*For MCP clients that don't support resources, all resource functionality is available through mirror tools:*
**Dataset Context Tools (4)**
- `resource_datasets_loaded()` - List all loaded datasets (mirrors `datasets://loaded`)
- `resource_datasets_schema(dataset_name)` - Get dataset schema (mirrors `datasets://{name}/schema`)
- `resource_datasets_summary(dataset_name)` - Statistical summary (mirrors `datasets://{name}/summary`)
- `resource_datasets_sample(dataset_name)` - Sample data rows (mirrors `datasets://{name}/sample`)
**Analytics Intelligence Tools (5)**
- `resource_analytics_current_dataset()` - Currently active dataset (mirrors `analytics://current_dataset`)
- `resource_analytics_available_analyses()` - Applicable analysis types (mirrors `analytics://available_analyses`)
- `resource_analytics_column_types()` - Column classifications (mirrors `analytics://column_types`)
- `resource_analytics_suggested_insights()` - AI recommendations (mirrors `analytics://suggested_insights`)
- `resource_analytics_memory_usage()` - Memory monitoring (mirrors `analytics://memory_usage`)
**System Tools (3)**
- `resource_config_server()` - Server configuration (mirrors `config://server`)
- `resource_users_profile(user_id)` - User profile access (mirrors `users://{user_id}/profile`)
- `resource_system_status()` - System health info (mirrors `system://status`)
### 📈 Dynamic Resources (12 total)
#### **Dataset Context Resources**
- `datasets://loaded` - Real-time inventory of all loaded datasets
- `datasets://{dataset_name}/schema` - Dynamic schema with column classification
- `datasets://{dataset_name}/summary` - Statistical summary (pandas.describe() equivalent)
- `datasets://{dataset_name}/sample` - Sample rows for data preview
#### **Analytics Intelligence Resources**
- `analytics://current_dataset` - Currently active dataset context
- `analytics://available_analyses` - Applicable analysis types for current data
- `analytics://column_types` - Column role classification (numerical, categorical, temporal, identifier)
- `analytics://suggested_insights` - AI-generated analysis recommendations
- `analytics://memory_usage` - Real-time memory monitoring
#### **System Resources** (Legacy Compatibility)
- `config://server` - Server configuration information
- `users://{user_id}/profile` - User profile access by ID
- `system://status` - System health and status information
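Resource-capable clients read these URIs through the standard MCP resource API. Here is a minimal sketch using the official Python SDK's stdio client (paths are placeholders; adjust calls to your SDK version):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main():
    params = StdioServerParameters(
        command="/path/to/uv",
        args=["--directory", "/absolute/path/to/quick-data-mcp",
              "run", "python", "main.py"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            loaded = await session.read_resource("datasets://loaded")
            # Templated URI, valid once load_dataset has created "sales"
            schema = await session.read_resource("datasets://sales/schema")
            print(loaded, schema)

asyncio.run(main())
```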
### 💬 Adaptive Prompts (7 total)
#### **Data Exploration Prompts**
- `dataset_first_look(dataset_name)` - Personalized initial exploration guide based on actual data structure
- `segmentation_workshop(dataset_name)` - Interactive segmentation strategy using real column names
- `data_quality_assessment(dataset_name)` - Systematic quality review with specific recommendations
#### **Analysis Workflow Prompts**
- `correlation_investigation(dataset_name)` - Guided correlation analysis workflow
- `pattern_discovery_session(dataset_name)` - Open-ended pattern mining conversation
#### **Business Intelligence Prompts**
- `insight_generation_workshop(dataset_name, business_context?)` - Business insight generation with domain context
- `dashboard_design_consultation(dataset_name, audience?)` - Audience-specific dashboard planning
## 🏗️ Project Structure
```
quick-data-mcp/
├── .mcp.json                         # Ready-to-use MCP client configuration
├── data/                             # Sample datasets
│   ├── ecommerce_orders.json         # E-commerce transaction data
│   ├── employee_survey.csv           # HR analytics dataset
│   ├── product_performance.csv       # Product metrics dataset
│   └── README.md                     # Data documentation
├── src/mcp_server/                   # Core server implementation
│   ├── server.py                     # Main server with 32 tools, 12 resources, 7 prompts
│   ├── tools/                        # Tool implementations
│   │   ├── pandas_tools.py           # Pandas-based tools grouped module
│   │   ├── __init__.py               # All tools (32 total)
│   │   └── [individual_tool_files.py]  # Individual tool implementations
│   ├── resources/                    # Resource handlers
│   │   └── data_resources.py         # Dynamic data access (12 resources)
│   ├── prompts/                      # Conversation starters
│   │   ├── __init__.py               # All prompts (7 total)
│   │   └── [individual_prompt_files.py]  # Individual prompt implementations
│   ├── models/                       # Data models and schemas
│   │   └── schemas.py                # DatasetManager, ColumnInfo, DatasetSchema
│   └── config/                       # Configuration
│       └── settings.py               # Server settings
├── tests/                            # Comprehensive test suite (130 tests)
│   ├── test_pandas_tools.py          # Pandas tools tests
│   ├── test_analytics_tools.py       # Advanced tools tests
│   ├── test_analytics_prompts.py     # Prompts functionality tests
│   ├── test_data_resources.py        # Resource access tests
│   ├── test_resource_mirror_tools.py # Resource mirror tool tests
│   └── test_custom_analytics_code.py # Custom code execution tests
├── outputs/                          # Generated files (excluded from git)
│   ├── charts/                       # Generated HTML charts and dashboards
│   └── reports/                      # Exported insights and reports
└── main.py                           # Entry point
```
## 📦 Dependencies
### Core Analytics Stack
- `mcp[cli]>=1.9.2` - Official MCP Python SDK
- `pandas>=2.2.3` - Data manipulation and analysis
- `plotly>=6.1.2` - Interactive visualizations
### Testing & Development
- `pytest>=8.3.5` - Testing framework
- `pytest-asyncio>=1.0.0` - Async testing support
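To recreate this environment in a fresh checkout, the same set can be added with uv (versions as listed above):

```bash
uv add "mcp[cli]>=1.9.2" "pandas>=2.2.3" "plotly>=6.1.2"
uv add --dev "pytest>=8.3.5" "pytest-asyncio>=1.0.0"
```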
## 📖 Usage
### MCP Client Integration
Once configured, your MCP client can access all **32 tools**, **12 resources**, and **7 prompts** for comprehensive data analytics.
### Example Analytics Workflow
```python
# 1. Load any dataset
await load_dataset("data/ecommerce_orders.json", "sales")

# 2. Get AI-powered first look guidance
await dataset_first_look("sales")
# → Returns a personalized exploration guide with actual column names

# 3. Automatic analysis suggestions
await suggest_analysis("sales")
# → AI recommends correlation_analysis, segmentation_analysis based on detected columns

# 4. Perform suggested analyses
await find_correlations("sales")
# → Finds relationships between numerical columns

await segment_by_column("sales", "customer_segment")
# → Groups data and calculates statistics automatically

# 5. Create adaptive visualizations
await create_chart("sales", "bar", "region", "order_value")
# → Generates interactive plotly charts

# 6. Comprehensive data quality assessment
await validate_data_quality("sales")
# → Returns a 0-100 quality score with detailed recommendations
```
### Advanced Multi-Dataset Analysis
```python
# Load multiple datasets
await load_dataset("data/employee_survey.csv", "hr")
await load_dataset("data/product_performance.csv", "products")
# Compare datasets
await compare_datasets("sales", "products", ["category"])
# Generate business insights
await insight_generation_workshop("sales", "e-commerce")
# Create executive dashboard
await dashboard_design_consultation("hr", "executive")
```
### 🔥 Custom Analytics Code Execution
Execute any Python code against your datasets with full pandas/numpy/plotly support:
```python
# Custom analysis that goes beyond predefined tools
output = await execute_custom_analytics_code("sales", """
print("=== Custom Customer Segmentation ===")

# Advanced customer scoring algorithm
customer_scores = df.groupby('customer_id').agg({
    'order_value': ['sum', 'mean', 'count'],
    'date': ['min', 'max']
}).round(2)

# Flatten column names
customer_scores.columns = ['total_spent', 'avg_order', 'order_count', 'first_order', 'last_order']

# Calculate customer lifetime (days)
customer_scores['lifetime_days'] = (
    pd.to_datetime(customer_scores['last_order']) -
    pd.to_datetime(customer_scores['first_order'])
).dt.days

# Custom scoring formula
customer_scores['loyalty_score'] = (
    customer_scores['total_spent'] * 0.4 +
    customer_scores['order_count'] * 50 +
    customer_scores['lifetime_days'] * 0.1
).round(1)

# Segment customers
def segment_customer(score):
    if score >= 1000: return 'VIP'
    elif score >= 500: return 'Gold'
    elif score >= 200: return 'Silver'
    else: return 'Bronze'

customer_scores['segment'] = customer_scores['loyalty_score'].apply(segment_customer)

print("Customer Segments:")
print(customer_scores['segment'].value_counts())

print("\\nTop 5 Customers:")
top_customers = customer_scores.sort_values('loyalty_score', ascending=False).head()
for idx, (customer_id, data) in enumerate(top_customers.iterrows(), 1):
    print(f"{idx}. {customer_id}: {data['segment']} (Score: {data['loyalty_score']})")
""")

# Agents can iterate on code based on output
if "ERROR:" in output:
    # Fix the code and try again
    pass
else:
    print("Analysis completed successfully!")
```
### 🔄 Resource Mirror Tools Usage (Tool-Only Clients)
For MCP clients that don't support resources, use the resource mirror tools for identical functionality:
```python
# Instead of accessing resource: datasets://loaded
datasets = await resource_datasets_loaded()
# → Returns: {"datasets": [...], "total_datasets": 2, "status": "loaded"}

# Instead of accessing resource: datasets://sales/schema
schema = await resource_datasets_schema("sales")
# → Returns: {"dataset_name": "sales", "columns_by_type": {...}}

# Instead of accessing resource: analytics://memory_usage
memory = await resource_analytics_memory_usage()
# → Returns: {"datasets": [...], "total_memory_mb": 15.2}

# Instead of accessing resource: config://server
config = await resource_config_server()
# → Returns: {"name": "Generic Data Analytics MCP", "features": [...]}

# All 12 resource mirror tools return data identical to their resource counterparts,
# ideal for tool-only MCP clients or when resource support is unavailable.
```
## 🧪 Testing
```bash
# Run all 130 tests
uv run python -m pytest tests/ -v
# Test specific functionality
uv run python -m pytest tests/test_pandas_tools.py -v # Pandas tools
uv run python -m pytest tests/test_analytics_tools.py -v # Advanced tools
uv run python -m pytest tests/test_analytics_prompts.py -v # Prompts functionality
uv run python -m pytest tests/test_resource_mirror_tools.py -v # Resource mirror tools
uv run python -m pytest tests/test_custom_analytics_code.py -v # Custom code execution
# Quick test run
uv run python -m pytest tests/ -q
# Expected: 130 passed
```
## 🔧 MCP Client Configuration
### Option 1: Quick Setup (Recommended)
This project includes a sample configuration that you can customize:
1. **Copy the sample configuration**:
```bash
cp .mcp.json.sample .mcp.json
```
2. **Update paths in `.mcp.json`** to match your system:
```json
{
  "mcpServers": {
    "quick-data": {
      "command": "/path/to/uv",
      "args": [
        "--directory",
        "/path/to/your/quick-data-mcp",
        "run",
        "python",
        "main.py"
      ],
      "env": {
        "LOG_LEVEL": "INFO"
      }
    }
  }
}
```
3. **Find your UV path**:
```bash
which uv
# Example output: /Users/yourusername/.local/bin/uv
```
4. **Get absolute path to this directory**:
```bash
pwd
# Example output: /Users/yourusername/path/to/quick-data-mcp
```
5. **Update `.mcp.json`** with your actual paths:
- Replace `/path/to/uv` with your UV path
- Replace `/path/to/your/quick-data-mcp` with your absolute directory path
6. **Copy to your MCP client** or reference directly if supported
### Option 2: Manual Configuration
If you prefer to configure manually, add to your MCP client configuration:
```json
{
  "mcpServers": {
    "quick-data": {
      "command": "/path/to/uv",
      "args": [
        "--directory",
        "/absolute/path/to/quick-data-mcp",
        "run",
        "python",
        "main.py"
      ],
      "env": {
        "LOG_LEVEL": "INFO"
      }
    }
  }
}
```
**Important**: Replace the placeholder paths with your actual system paths.
### Configuration Notes
- **Use absolute paths** for reliability across different working directories
- **`--directory` flag** ensures UV operates in the correct project directory
- **`.mcp.json` is gitignored** - each user needs their own copy with local paths
- **Use `.mcp.json.sample`** as a template to avoid path conflicts
- **Environment variables** can be customized per deployment
### Environment Variables
- `LOG_LEVEL` - Logging level (default: INFO)
- `SERVER_NAME` - Server name (default: "Generic Data Analytics MCP")
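Both can be set in the `env` block of `.mcp.json` (as shown above) or inline for one-off runs:

```bash
LOG_LEVEL=DEBUG SERVER_NAME="Quick Data (dev)" uv run python main.py
```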
## 💡 Sample Datasets Included
### E-commerce Orders (`data/ecommerce_orders.json`)
- **15 orders** with customer segments, regions, product categories
- **Use cases**: Revenue analysis, customer segmentation, regional performance
### Employee Survey (`data/employee_survey.csv`)
- **25 employees** with satisfaction scores, departments, tenure
- **Use cases**: HR analytics, satisfaction analysis, department comparisons
### Product Performance (`data/product_performance.csv`)
- **20 products** with sales, suppliers, ratings, launch dates
- **Use cases**: Product analysis, supplier performance, market trends
## 🎯 Architecture Benefits
### Dataset Agnosticism
- **Works with ANY structured data** - no hardcoded schemas required
- **Intelligent column detection** - automatically classifies data types
- **Zero configuration** - drop in data files and start analyzing immediately
### Modular Excellence
- **Clean separation** - tools, resources, prompts, and models organized logically
- **Independent testing** - each component tested in isolation
- **Easy extension** - add new analytics without affecting existing functionality
### Production Ready
- **Comprehensive error handling** - graceful failures with actionable messages
- **Memory optimization** - efficient pandas operations with usage monitoring
- **Performance monitoring** - built-in analytics for large datasets
### AI Integration
- **Smart recommendations** - analysis suggestions based on data characteristics
- **Context-aware prompts** - conversations that reference real column names
- **Adaptive workflows** - tools that adjust behavior based on data types
## 🔮 Extension Examples
### Adding Custom Analytics
```python
# Add to tools/__init__.py (or an individual tool module)
from ..models.schemas import DatasetManager  # adjust the import to your layout

async def custom_analysis(dataset_name: str, parameters: dict) -> dict:
    """Your custom analysis function."""
    df = DatasetManager.get_dataset(dataset_name)
    # Your analysis logic here
    return {"analysis": "results"}

# Register in server.py
@mcp.tool()
async def custom_analysis(dataset_name: str, parameters: dict) -> dict:
    return await tools.custom_analysis(dataset_name, parameters)
```
### Adding Domain-Specific Prompts
```python
# Add to prompts/__init__.py
async def financial_analysis_workshop(dataset_name: str) -> str:
    """Guide financial analysis workflows."""
    # Build your custom financial analysis guidance here
    return f"Let's analyze the financial patterns in '{dataset_name}'..."

# Register in server.py
@mcp.prompt()
async def financial_analysis_workshop(dataset_name: str) -> str:
    return await prompts.financial_analysis_workshop(dataset_name)
```
## 🏆 Success Metrics
- ✅ **Comprehensive Test Coverage** - 130 tests passing
- ✅ **Universal Data Compatibility** - Works with any JSON/CSV structure
- ✅ **Universal MCP Client Compatibility** - Supports both resource-enabled and tool-only clients
- ✅ **Custom Code Execution** - Full Python analytics capabilities with pandas/numpy/plotly
- ✅ **AI Integration** - Smart recommendations and adaptive conversations
- ✅ **Performance Optimized** - Memory-efficient operations with monitoring
This MCP server transforms the concept of data analytics from rigid, schema-dependent tools into a flexible, AI-guided platform that adapts to any dataset while providing expert-level guidance through conversational interfaces.