# Technical Overview
Developer reference for extending the CanvasXpress MCP Server.
---
## Architecture
```
┌──────────────────┐     ┌─────────────────────┐     ┌─────────────────┐
│    MCP Client    │────▶│    mcp_server.py    │────▶│    Generator    │
│  (Claude, CLI)   │     │    (FastMCP 2.0)    │     │  (RAG Pipeline) │
└──────────────────┘     └─────────────────────┘     └────────┬────────┘
                                                              │
                  ┌───────────────────────────────────────────┼──────────────────────┐
                  │                                           │                      │
                  ▼                                           ▼                      ▼
           ┌─────────────┐                            ┌──────────────┐        ┌──────────────┐
           │  Vector DB  │                            │  Embeddings  │        │ LLM Provider │
           │  (Milvus)   │                            │  (Pluggable) │        │  (Pluggable) │
           │ 132 examples│                            │              │        │              │
           └─────────────┘                            └──────────────┘        └──────────────┘
                                                       BGE-M3 (local)          Azure OpenAI
                                                       OpenAI API              Google Gemini
                                                       Gemini API
```
### Provider Support
The server supports multiple LLM and embedding providers via environment variables:
| Component | Options | Default | Env Variable |
|-----------|---------|---------|--------------|
| **LLM** | Azure OpenAI, Google Gemini | `openai` | `LLM_PROVIDER` |
| **Embeddings** | BGE-M3 (local), Azure OpenAI, Google Gemini | `local` | `EMBEDDING_PROVIDER` |
See `.env.example` for full configuration options.
---
## Directory Structure
```
├── src/
│ ├── mcp_server.py # MCP server entry point (FastMCP)
│ └── canvasxpress_generator.py # Core RAG logic + provider classes
├── scripts/
│ └── init_vector_db.py # Local venv DB initialization script
├── data/
│ ├── few_shot_examples.json # 132 examples (input to vector DB)
│ ├── prompt_template.md # LLM prompt template
│ └── schema.md # CanvasXpress config schema
├── vector_db/ # Created by `make init` or `make init-local`
│ └── canvasxpress_mcp.db # Milvus database
├── venv/ # Created by `make venv` (local dev only)
├── mcp_cli.py # CLI client for testing
├── mcp_http_client.py # HTTP client example
├── test_vector_db.py # Vector DB test utility
└── examples_usage.py # Python API usage examples
```
---
## Core Classes & Methods
### `EmbeddingProvider` (src/canvasxpress_generator.py)
Abstract embedding provider supporting multiple backends.
| Method | Purpose |
|--------|---------|
| `__init__(provider)` | Initialize with "local", "openai", or "gemini" |
| `encode(texts)` | Batch encode texts to embeddings |
| `encode_query(text)` | Encode single query text |
**Dimensions by provider:**
- `local` (BGE-M3): 1024
- `openai` (text-embedding-3-small): 1536
- `gemini` (text-embedding-004): 768
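A minimal usage sketch, assuming the class is importable from `src/canvasxpress_generator.py` and the methods behave as tabled above:
```python
from canvasxpress_generator import EmbeddingProvider  # assumes src/ is on PYTHONPATH

# Local BGE-M3 backend; pass "openai" or "gemini" for cloud embeddings
embedder = EmbeddingProvider(provider="local")

# Batch-encode texts (1024-dim vectors for "local")
vectors = embedder.encode(["bar chart of sales", "scatter plot of mpg vs hp"])

# Encode a single search query
query_vec = embedder.encode_query("bar chart with blue bars")
```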
### `LLMProvider` (src/canvasxpress_generator.py)
Abstract LLM provider supporting multiple backends.
| Method | Purpose |
|--------|---------|
| `__init__(provider, **kwargs)` | Initialize with "openai" or "gemini" |
| `generate(prompt, temperature, max_retries)` | Generate text from prompt |
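A hedged sketch of the call pattern (the provider names and `generate` signature come from the table; everything else is illustrative):
```python
from canvasxpress_generator import LLMProvider  # assumes src/ is on PYTHONPATH

llm = LLMProvider(provider="openai")  # or "gemini"
text = llm.generate(
    prompt="Return a CanvasXpress JSON config for a bar chart",
    temperature=0.0,
    max_retries=3,
)
```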
### `CanvasXpressGenerator` (src/canvasxpress_generator.py)
Main class that implements the RAG pipeline.
| Method | Purpose |
|--------|---------|
| `__init__(data_dir, vector_db_path, llm_model, llm_environment)` | Initialize generator, load examples, setup vector DB |
| `get_similar_examples(description, num_examples=25)` | Semantic search for similar few-shot examples |
| `build_prompt(description, headers, similar_examples)` | Construct LLM prompt with retrieved examples |
| `generate(description, headers, temperature)` | **Main entry point** - returns CanvasXpress JSON config |
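Putting these together, a minimal end-to-end sketch (the constructor and `generate` signatures come from the table; the paths and argument values are assumptions):
```python
from canvasxpress_generator import CanvasXpressGenerator  # assumes src/ is on PYTHONPATH

gen = CanvasXpressGenerator(
    data_dir="./data",
    vector_db_path="./vector_db/canvasxpress_mcp.db",
    llm_model="gpt-4o-global",
    llm_environment="nonprod",
)

# Main entry point: returns a CanvasXpress JSON config
config = gen.generate(
    description="Create a bar chart with blue bars",
    headers="Id,class,cty",
    temperature=0.0,
)
```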
### `mcp_server.py`
FastMCP server that exposes one tool:
```python
@mcp.tool()
def generate_canvasxpress_config(
description: str, # Natural language chart description
headers: str = None, # Optional column names
temperature: float = 0.0
) -> str: # JSON response (see below)
```
**Response Format:**
```json
{
"success": true,
"description": "original user description",
"headers": "original headers or null",
"config": {"graphType": "Bar", ...},
"error": null
}
```
On error:
```json
{
"success": false,
"description": "...",
"headers": "...",
"config": null,
"error": "Error message describing what went wrong"
}
```
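A small client-side helper for unwrapping this envelope (the field names come from the formats above; the helper itself is hypothetical):
```python
import json

def extract_config(tool_response: str) -> dict:
    """Return the CanvasXpress config from a generate_canvasxpress_config response."""
    payload = json.loads(tool_response)
    if not payload["success"]:
        raise RuntimeError(f"Generation failed: {payload['error']}")
    return payload["config"]  # e.g. {"graphType": "Bar", ...}
```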
---
## Data Flow
1. **User Request** → `"Create a bar chart with blue bars"`
2. **Embed Query** → the embedding provider (BGE-M3 by default) converts it to a 1024-dim vector
3. **Vector Search** → Milvus returns the top 25 most similar examples
4. **Build Prompt** → template + schema + retrieved few-shot examples + user query
5. **LLM Call** → the LLM provider (Azure OpenAI by default) generates the JSON config
6. **Return** → `{"graphType": "Bar", "colors": ["blue"]}`
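The same flow traced through the generator's public methods (a sketch; `generate()` is documented as the main entry point and presumably runs these steps internally):
```python
# `gen` is a CanvasXpressGenerator instance (see the sketch under Core Classes & Methods)
desc = "Create a bar chart with blue bars"

examples = gen.get_similar_examples(desc, num_examples=25)                # steps 2-3
prompt = gen.build_prompt(desc, headers=None, similar_examples=examples)  # step 4
config = gen.generate(desc, headers=None, temperature=0.0)                # steps 5-6
```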
---
## Key Files Detail
### `data/few_shot_examples.json`
```json
[
{
"id": 0,
"type": "Area",
"description": "Area graph of hwy with title...",
"config": {"graphType": "Area", "xAxis": ["hwy"], ...},
"headers": "Id,class,cty,cyl,...",
"source": "human" // or "gpt4"
},
...
]
```
- **Default**: 66 examples (original JOSS publication set)
- **Expanded**: `few_shot_examples_full.json` - 3,366 examples with ~13K descriptions (see README for switching)
- `source`: "human" = human-written, "gpt4" = GPT-4 generated description
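A quick way to inspect the example set, using only the fields documented above:
```python
import json
from collections import Counter

with open("data/few_shot_examples.json") as f:
    examples = json.load(f)

print(len(examples), "examples")
print(Counter(ex["source"] for ex in examples))               # human vs gpt4
print(Counter(ex["type"] for ex in examples).most_common(5))  # top chart types
```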
### `data/prompt_template.md`
Template with placeholders:
- `{canvasxpress_config_english}` - User's description
- `{headers_column_names}` - Optional headers
- `{schema_info}` - CanvasXpress schema
- `{few_shot_examples}` - Retrieved examples
### `data/schema.md`
CanvasXpress configuration schema documentation (properties, types, valid values).
---
## Environment Variables
| Variable | Purpose | Example |
|----------|---------|---------|
| `LLM_PROVIDER` | LLM backend | `openai` or `gemini` |
| `EMBEDDING_PROVIDER` | Embedding backend | `local`, `openai`, or `gemini` |
| `AZURE_OPENAI_KEY` | BMS Azure API key | `aaa15abe...` |
| `GOOGLE_API_KEY` | Google Gemini API key | `your_key_here` |
| `LLM_MODEL` | LLM model name | `gpt-4o-global` |
| `GEMINI_MODEL` | Gemini model name | `gemini-2.0-flash-exp` |
| `GEMINI_EMBEDDING_MODEL` | Gemini embedding model | `text-embedding-004` |
| `LLM_ENVIRONMENT` | BMS environment | `nonprod` or `prod` |
| `MCP_TRANSPORT` | Server mode | `http` or `stdio` |
| `MCP_PORT` | HTTP port | `8000` |
---
## Deployment Options
The server supports three deployment options:
### Option 1: Docker (Recommended for Production)
```bash
make build # Build Docker image (~8GB due to PyTorch)
make init # Initialize vector DB
make run-http # Start server (daemon mode)
```
- **Pros**: Isolated, reproducible, no Python version conflicts
- **Cons**: Large image size, requires Docker
### Option 2: Local Virtual Environment - Full (Development)
```bash
make venv # Create venv with Python 3.12 (~8GB with PyTorch)
make init-local # Initialize local vector DB
make run-local # Start server (HTTP mode)
```
- **Pros**: Faster iteration, no Docker required
- **Cons**: Requires Python 3.10+, large disk footprint (~8GB)
- **Use when**: You want local BGE-M3 embeddings (highest accuracy)
### Option 3: Local Virtual Environment - Lightweight ⭐
```bash
make venv-light # Create venv with cloud deps only (~500MB)
# Edit .env: set EMBEDDING_PROVIDER=gemini (or openai)
make init-local # Initialize local vector DB
make run-local # Start server (HTTP mode)
```
- **Pros**: Small footprint (~500MB), fast install, no PyTorch
- **Cons**: Requires cloud API for embeddings (Gemini or OpenAI)
- **Use when**: Lightweight servers, or you prefer cloud embeddings
**Path Auto-Detection**: The server (`mcp_server.py`) automatically detects which environment it's running in:
- **Docker**: Uses `/app/data` and `/root/.cache/canvasxpress_mcp.db`
- **Local**: Uses `./data` and `./vector_db/canvasxpress_mcp.db`
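One common way to implement this kind of detection; this sketch is an assumption, and the actual check in `mcp_server.py` may differ:
```python
import os

# Assumption: the Docker image ships its data under /app/data
IN_DOCKER = os.path.isdir("/app/data")

DATA_DIR = "/app/data" if IN_DOCKER else "./data"
DB_PATH = ("/root/.cache/canvasxpress_mcp.db" if IN_DOCKER
           else "./vector_db/canvasxpress_mcp.db")
```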
---
## Extending the Server
### Add More Few-Shot Examples
1. Edit `data/few_shot_examples.json` - add new entries:
```json
{
"id": 133,
"type": "NewChartType",
"description": "Natural language description...",
"config": {"graphType": "NewChartType", ...},
"headers": "Col1,Col2,Col3",
"source": "human"
}
```
2. Delete existing vector DB: `sudo rm -rf vector_db/canvasxpress_mcp.db`
3. Rebuild: `make init`
### Update Prompt Template
Edit `data/prompt_template.md` to modify how the LLM prompt is constructed.
The template uses Python format strings: `{canvasxpress_config_english}`, `{headers_column_names}`, `{schema_info}`, `{few_shot_examples}`.
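Since the placeholders are plain Python format fields, filling the template is a single `str.format` call. A minimal sketch (the real call site in `build_prompt` may differ; `formatted_examples` is a stand-in for the serialized retrieval results):
```python
from pathlib import Path

template = Path("data/prompt_template.md").read_text()
formatted_examples = "..."  # serialized output of get_similar_examples()

prompt = template.format(
    canvasxpress_config_english="Create a bar chart with blue bars",
    headers_column_names="Id,class,cty",
    schema_info=Path("data/schema.md").read_text(),
    few_shot_examples=formatted_examples,
)
```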
### Add New Chart Types
1. Add examples to `data/few_shot_examples.json` (see above)
2. Update `data/schema.md` with new config properties/values
3. Rebuild vector DB: `make init`
### Configure Providers
The server supports multiple LLM and embedding providers. Configure via `.env`:
**LLM Providers (currently supported):**
- `openai` - Azure OpenAI via BMS Proxy (default)
- `gemini` - Google Gemini
**Embedding Providers (currently supported):**
- `local` - BGE-M3 local model (default, 1024 dimensions)
- `openai` - Azure OpenAI text-embedding-3-small (1536 dimensions)
- `gemini` - Google Gemini text-embedding-004 (768 dimensions)
Example `.env` for full Gemini setup:
```bash
LLM_PROVIDER=gemini
EMBEDDING_PROVIDER=gemini
GOOGLE_API_KEY=your_key_here
GEMINI_MODEL=gemini-2.0-flash-exp
GEMINI_EMBEDDING_MODEL=text-embedding-004
```
> **Note:** If you change `EMBEDDING_PROVIDER`, you must reinitialize the vector database since different providers have different embedding dimensions.
### Add a New LLM Provider
To add a new provider (e.g., Anthropic Claude):
1. **Add to `LLMProvider` class in `canvasxpress_generator.py`:**
```python
def _init_anthropic(self, **kwargs):
    import anthropic
    self.client = anthropic.Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

def _generate_anthropic(self, prompt, temperature, max_retries):
    response = self.client.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model name
        max_tokens=4096,
        temperature=temperature,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```
2. **Add conditional import at module level:**
```python
if LLM_PROVIDER == "anthropic":
import anthropic
```
3. **Update `requirements.txt` (or `requirements-light.txt`):**
```
anthropic>=0.20.0
```
### Add a New MCP Tool
```python
# In src/mcp_server.py
@mcp.tool()
def my_new_tool(param1: str, param2: int = 10) -> str:
    """Tool description shown to AI assistants."""
    result = f"{param1} (repeated {param2}x)"  # replace with real logic
    return result
```
### Change Embedding Model
Edit `canvasxpress_generator.py`:
```python
self.bge_m3_ef = BGEM3FlagModel('BAAI/bge-m3', ...) # Change model here
```
Note: Update `dimension=1024` in `_setup_vector_db()` if dimensions change.
### Using Cloud Embeddings (Implemented)
Cloud embeddings are now supported as an alternative to local BGE-M3. This reduces Docker image size from ~8GB to ~1-2GB since PyTorch is not required.
**Available Providers:**
| Provider | `EMBEDDING_PROVIDER` | Dimension | Notes |
|----------|---------------------|-----------|-------|
| BGE-M3 (local) | `local` | 1024 | Default, proven accuracy, requires PyTorch |
| Azure OpenAI | `openai` | 1536 | Uses `text-embedding-3-small` |
| Google Gemini | `gemini` | 768 | Free tier: 1,500 req/min |
**To use cloud embeddings:**
1. **Set provider in `.env`:**
```bash
EMBEDDING_PROVIDER=gemini # or "openai"
GOOGLE_API_KEY=your_key # for gemini
# or AZURE_OPENAI_KEY for openai
```
2. **Delete and reinitialize vector DB:**
```bash
rm -rf vector_db
make init-local # or make init for Docker
```
3. **(Optional) Reduce Docker image size:**
If using cloud embeddings exclusively, you can comment out PyTorch dependencies in `requirements.txt`:
```
# torch>=2.0.0 # Not needed for cloud embeddings
# FlagEmbedding>=1.2.10
# sentence-transformers>=3.0.0
```
**Note:** You cannot mix embedding providers - all examples must be embedded with the same provider. Switching requires re-embedding all 132 examples.
---
## Testing
```bash
make test-db # Verify vector DB (132 examples, search test)
make run-http # Start server (daemon)
python3 mcp_cli.py -q "bar chart" # Pretty formatted output
python3 mcp_cli.py -q "bar chart" --json # Full JSON response
python3 mcp_cli.py -q "bar chart" --config-only # Just the config
make stop # Stop server
```
---
## Dependencies
Key packages (see `requirements.txt`):
- `fastmcp>=2.0.0` - MCP server framework
- `pymilvus[milvus-lite]` - Vector database
- `FlagEmbedding` - BGE-M3 embeddings
- `openai` - Azure OpenAI client