# Data Analytics MCP Toolkit
An MCP (Model Context Protocol) server that exposes data visualization and simple machine learning tools. An external LLM can call the high-level **run_analytics** tool with a description of its intent and the data; the server selects and runs the appropriate pipeline (visualization or ML) and returns charts or metrics.
## Features
- **Data**: `load_data` (CSV/JSON string or URL), `clean_data` (drops NA values, optionally normalizes)
- **Visualization**: `plot_bar`, `plot_line`, `plot_scatter`, `plot_histogram`, `plot_box`, `plot_heatmap` (each returns a base64-encoded PNG)
- **ML**: `train_test_split`, `train_linear_regression`, `train_logistic_regression`, `train_kmeans`, plus `evaluate_regression`, `evaluate_classification`, `evaluate_clustering`
- **Pipeline**: `run_analytics(intent, data_source)` — intent-based routing to the right pipeline
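
To illustrate what intent-based routing could look like, here is a minimal keyword-matching sketch. It is not the code in `pipeline.py`: the function name `route_intent`, the keyword table, and the pipeline names are all assumptions made for this example, and the real server may route differently.

```python
import re

# Hypothetical keyword prefixes per pipeline; the real rules in pipeline.py may differ.
# ML pipelines are checked first; visualization doubles as the fallback.
_PIPELINE_KEYWORDS = {
    "regression": ("predict", "regress", "forecast"),
    "classification": ("classif",),
    "clustering": ("cluster", "group", "segment"),
    "visualization": ("plot", "chart", "show", "distribution", "histogram"),
}


def route_intent(intent: str) -> str:
    """Return the first pipeline with a keyword prefix matching a word in the intent."""
    words = re.findall(r"[a-z_]+", intent.lower())
    for pipeline, prefixes in _PIPELINE_KEYWORDS.items():
        # str.startswith accepts a tuple of prefixes.
        if any(word.startswith(prefixes) for word in words):
            return pipeline
    return "visualization"  # assumed default when nothing matches


if __name__ == "__main__":
    for text in ("show distribution of sales",
                 "predict price from square_feet",
                 "cluster into 4 groups"):
        print(f"{text!r} -> {route_intent(text)}")
```

A keyword table like this is easy to read and extend, but it is order-sensitive: an intent that mentions both "show" and "cluster" resolves to whichever pipeline is listed first.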
## Install
```bash
cd /path/to/trying_IBM_MCP
pip install -e .
# or
pip install -r requirements.txt
```
If you install from `requirements.txt` rather than in editable mode, make sure `src` is on `PYTHONPATH` when you run the server from the project root.
## Run the MCP server
**stdio (for Cursor / IDE):**
```bash
# From project root, with src on path
PYTHONPATH=src python -m data_analytics_mcp.server
```
Or with uv:
```bash
uv run --project . python -m data_analytics_mcp.server
```
If your `pyproject.toml` declares the package under `src`, install first with `pip install -e .`; you can then run `python -m data_analytics_mcp.server` from the repo root without setting `PYTHONPATH`.
## Cursor MCP configuration
Add the server to Cursor (e.g. in Cursor Settings → MCP, or project `.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "data-analytics": {
      "command": "python",
      "args": ["-m", "data_analytics_mcp.server"],
      "cwd": "/path/to/trying_IBM_MCP",
      "env": { "PYTHONPATH": "src" }
    }
  }
}
```
Use an absolute path for `cwd`. If you installed the package (`pip install -e .`), the `env` entry is unnecessary:
```json
{
  "mcpServers": {
    "data-analytics": {
      "command": "python",
      "args": ["-m", "data_analytics_mcp.server"],
      "cwd": "/path/to/trying_IBM_MCP"
    }
  }
}
```
## Usage
- **One-shot**: Call `run_analytics` with a natural-language intent (e.g. "show distribution of sales", "predict price from square_feet", "cluster into 4 groups") and the data as CSV/JSON string or URL. The server returns either a chart (base64 image) or ML metrics and a short model summary.
- **Step-by-step**: Call `load_data` to get a `data_id`, then call `clean_data`, `plot_*`, or `train_test_split` → `train_*` → `evaluate_*` as needed. Read the resources `analytics://pipelines` and `analytics://pipelines/visualization` (and the equivalents for the other pipelines) to see pipeline descriptions.
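
Charts come back as base64-encoded PNG data, so a client typically needs to decode them before display. The helper below is a minimal sketch of that step; the name `save_chart` is hypothetical, and the assumption that the payload may carry a `data:` URI prefix is only a precaution, not something the server is known to emit.

```python
import base64
from pathlib import Path

# Every valid PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def save_chart(image_b64: str, path: str) -> int:
    """Decode a base64 PNG string, write it to `path`, and return the byte count."""
    # Tolerate an optional data-URI prefix such as "data:image/png;base64,".
    if image_b64.startswith("data:"):
        image_b64 = image_b64.split(",", 1)[1]
    raw = base64.b64decode(image_b64)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG image")
    Path(path).write_bytes(raw)
    return len(raw)
```

Checking the signature before writing catches the case where a tool returned an error message or metrics text instead of an image.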
## Project layout
```
src/data_analytics_mcp/
  server.py    # MCP app, tools, resources
  pipeline.py  # Intent → pipeline; execute_pipeline
  data.py      # load_data, clean_data
  viz.py       # Plot functions → base64 PNG
  ml.py        # Train/evaluate regression, classification, clustering
  store.py     # In-memory session store
```
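For readers wondering how a `data_id` can tie the step-by-step tools together, the following is a rough sketch of an in-memory session store. It is an illustration only: the class name `SessionStore` and its `put`/`get` methods are assumptions, not the actual contents of `store.py`.

```python
import uuid
from typing import Any, Dict


class SessionStore:
    """Hypothetical in-memory store that maps a generated data_id to a dataset."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put(self, data: Any) -> str:
        """Store `data` and return a fresh, opaque data_id."""
        data_id = uuid.uuid4().hex
        self._items[data_id] = data
        return data_id

    def get(self, data_id: str) -> Any:
        """Return the stored object, or raise KeyError with a clear message."""
        try:
            return self._items[data_id]
        except KeyError:
            raise KeyError(f"unknown data_id: {data_id}") from None


if __name__ == "__main__":
    store = SessionStore()
    data_id = store.put({"sales": [10, 12, 9]})
    print(data_id, store.get(data_id))
```

Because a store like this lives in process memory, its contents would disappear when the server restarts, so clients should not treat a `data_id` as durable.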