# VayuChat MCP

Natural-language analysis of air quality data using the Model Context Protocol (MCP).

## Features

### Pre-loaded Datasets

- `air_quality`: Hourly PM2.5, PM10, NO2, SO2, CO, and O3 readings for Delhi and Bangalore
- `funding`: Government air quality funding by city and year (2020-2024)
- `city_info`: City metadata: population, vehicles, industries, green cover
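
For orientation, loading these datasets with pandas might look like the sketch below; the CSV file names under `data/` are assumptions, not confirmed paths:

```python
import pandas as pd

# Hypothetical file names; adjust to what actually ships in data/
air_quality = pd.read_csv("data/air_quality.csv")
funding = pd.read_csv("data/funding.csv")
city_info = pd.read_csv("data/city_info.csv")

print(air_quality.head())  # hourly pollutant readings
print(city_info.columns)   # population, vehicles, industries, green cover
```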

### Analysis Tools (No Code Required!)

| Function | Description |
| --- | --- |
| `list_tables` | Show available tables |
| `show_table` | Display table data |
| `describe_table` | Detailed statistics |
| `query_table` | Filter with pandas query |
| `compare_weekday_weekend` | Weekday vs. weekend analysis |
| `compare_cities` | Compare metrics across cities |
| `analyze_correlation` | Correlation analysis |
| `analyze_funding` | Funding breakdown |
| `get_city_profile` | Comprehensive city profile |
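
As an illustration of how such a tool can be exposed, here is a minimal sketch using the official Python MCP SDK's `FastMCP` server. The `TABLES` registry and file paths are assumptions, not the project's actual code:

```python
import pandas as pd
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("vayuchat")

# Hypothetical in-memory registry of the pre-loaded datasets
TABLES = {
    "air_quality": pd.read_csv("data/air_quality.csv"),
    "funding": pd.read_csv("data/funding.csv"),
    "city_info": pd.read_csv("data/city_info.csv"),
}

@mcp.tool()
def query_table(table: str, query: str) -> str:
    """Filter a table with a pandas query expression and return Markdown."""
    if table not in TABLES:
        return f"Unknown table: {table}. Try one of: {', '.join(TABLES)}"
    # pandas' to_markdown requires the tabulate package
    return TABLES[table].query(query).to_markdown(index=False)

if __name__ == "__main__":
    mcp.run()
```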

### Visualization Tools

| Function | Description |
| --- | --- |
| `plot_comparison` | Bar/box charts |
| `plot_time_series` | Time series charts |
| `plot_weekday_weekend` | Weekday vs. weekend bars |
| `plot_funding_trend` | Funding over years |
| `plot_hourly_pattern` | Hourly patterns |
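
These plotting tools return charts as base64-encoded images (see the Architecture section below). A minimal sketch of that pattern with matplotlib, assuming headless rendering and PNG data URIs:

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, no display required
import matplotlib.pyplot as plt

def fig_to_base64(fig) -> str:
    """Serialize a matplotlib figure to a base64 PNG data URI."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return "data:image/png;base64," + base64.b64encode(buf.getvalue()).decode("ascii")

# Illustrative values, not real measurements
fig, ax = plt.subplots()
ax.bar(["Weekday", "Weekend"], [95.0, 82.0])
ax.set_ylabel("PM2.5 (µg/m³)")
print(fig_to_base64(fig)[:50], "...")
```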

## Installation

```bash
# Using uv
uv pip install -e .

# Or with pip
pip install -e .
```

## Usage

### As MCP Server (with Claude Code)

Add to your Claude Code MCP configuration:

{ "mcpServers": { "vayuchat": { "command": "uv", "args": ["run", "--directory", "/path/to/vayuchat-mcp", "vayuchat-mcp"] } } }

### As Gradio App (HF Spaces)

```bash
# Run locally
python app.py

# Or with gradio
gradio app.py
```

Then open http://localhost:7860
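
For reference, a minimal `app.py` could wire a `gr.ChatInterface` to the query router. `route_query` below is a hypothetical stand-in for the project's actual routing logic:

```python
import gradio as gr

def route_query(message: str, history) -> str:
    # Hypothetical stand-in for the keyword/LLM query router
    if "table" in message.lower():
        return "Available tables: air_quality, funding, city_info"
    return 'Try: "Compare weekday vs weekend PM2.5"'

demo = gr.ChatInterface(fn=route_query, title="VayuChat")

if __name__ == "__main__":
    demo.launch()  # serves on http://localhost:7860 by default
```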

### Deploy to Hugging Face Spaces

1. Create a new Space on HF (Gradio SDK)
2. Upload these files:
   - `app.py`
   - `requirements.txt`
   - `src/` folder
   - `data/` folder

Or connect your GitHub repo directly to HF Spaces.

## Example Queries

```
# Data exploration
"What tables are available?"
"Show me the funding table"
"Describe the air quality data"

# Analysis
"Compare weekday vs weekend PM2.5"
"Compare cities by PM10 levels"
"Get Delhi city profile"
"Show correlation with PM2.5"

# Funding
"Show funding for Delhi"
"What's the funding trend?"

# Visualizations
"Plot weekday vs weekend PM2.5"
"Show hourly pattern for NO2"
"Plot funding trend chart"
```

## Architecture

```
NLQ (User Question)
        ↓
Gradio Chat UI
        ↓
Query Router (keyword-based / LLM)
        ↓
MCP Tool Call
        ↓
Response (Markdown + Base64 Plot)
        ↓
Rendered in UI
```
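
The keyword-based router can be a simple first-match lookup from query keywords to tool names. A sketch under that assumption (the keyword patterns here are illustrative):

```python
# Illustrative keyword → tool routes; first full match wins
ROUTES = [
    (("weekday", "weekend"), "compare_weekday_weekend"),
    (("funding", "trend"), "plot_funding_trend"),
    (("funding",), "analyze_funding"),
    (("correlation",), "analyze_correlation"),
    (("hourly",), "plot_hourly_pattern"),
    (("compare",), "compare_cities"),
    (("table",), "list_tables"),
]

def route(query: str) -> str:
    q = query.lower()
    for keywords, tool in ROUTES:
        if all(k in q for k in keywords):
            return tool
    return "list_tables"  # safe fallback

print(route("What's the funding trend?"))  # -> plot_funding_trend
print(route("Compare cities by PM10"))     # -> compare_cities
```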

## Why Predefined Functions vs. LLM-Generated Code?

This project uses predefined MCP functions instead of letting the LLM generate arbitrary pandas/matplotlib code. Here's why:

### Comparison Table

| Aspect | Predefined Functions (This Approach) | LLM-Generated Code | Function-Calling LLM |
| --- | --- | --- | --- |
| Reliability | ✅ Deterministic, always works | ❌ May hallucinate syntax | ⚠️ Better, but can miss params |
| Speed | ✅ Instant (no code generation) | ❌ Slow (generate → parse → execute) | ⚠️ Moderate |
| Cost | ✅ Minimal tokens | ❌ Long prompts with schema | ⚠️ Moderate |
| Security | ✅ No arbitrary code execution | ❌ Code injection risk | ✅ Safe |
| Consistency | ✅ Same visualization style | ❌ Random styling each time | ✅ Consistent |
| Model Size | ✅ Works with small/cheap models | ❌ Needs a capable coder model | ⚠️ Needs fine-tuned model |
| Flexibility | ❌ Limited to predefined queries | ✅ Infinite flexibility | ⚠️ Limited to defined functions |
| Error Handling | ✅ Graceful, predictable | ❌ May crash, retry loops | ✅ Structured errors |

### When to Use Each Approach

**Use Predefined Functions (this approach) when:**

- You have a known, bounded set of analysis patterns
- Users are non-technical and need a consistent UX
- Cost/latency matters (production deployment)
- You want guaranteed correct outputs
- You're using smaller/cheaper models (Haiku, GPT-3.5)

**Use LLM-Generated Code when:**

- You're doing exploratory data analysis with unknown patterns
- Your users are power users who can debug code
- Analyses are one-off
- You're in the prototype/research phase

**Use a Function-Calling LLM when:**

- You have predefined functions but need better intent parsing
- You're using OpenAI/Claude with native function calling
- Queries are ambiguous and need sophisticated NLU

### The Hybrid Approach (Best of Both)

```
User Query
     ↓
┌─────────────────────────────────────┐
│ LLM with Function Calling           │  ← Parses intent, extracts params
│ (Claude, GPT-4, etc.)               │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ MCP Predefined Functions            │  ← Executes reliably
│ (compare_cities, plot_trend, etc.)  │
└─────────────────────────────────────┘
     ↓
Structured Response + Plot
```

This gives you:

- The LLM's NLU capabilities for parsing complex queries
- The predefined functions' reliability for execution
- No code-hallucination risk
- Consistent outputs every time
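
A concrete sketch of that loop with Anthropic's native tool use; the tool schema here is an assumption, and only one tool is registered:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Hypothetical schema mirroring the MCP tool of the same name
tools = [{
    "name": "compare_weekday_weekend",
    "description": "Compare a pollutant metric on weekdays vs weekends.",
    "input_schema": {
        "type": "object",
        "properties": {
            "value_column": {"type": "string"},
            "group_by": {"type": "string"},
        },
        "required": ["value_column"],
    },
}]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{
        "role": "user",
        "content": "Compare PM2.5 on weekdays vs weekends for Delhi and Bangalore",
    }],
)

# The LLM parses intent into a structured tool call;
# the predefined MCP function then executes it deterministically.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```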

### Example: Same Query, Different Approaches

Query: "Compare PM2.5 on weekdays vs weekends for Delhi and Bangalore"

LLM-Generated Code (risky):

# LLM might generate: df['is_weekend'] = df['day'].isin(['Sat', 'Sun']) # Wrong column name! df.groupby(['city', 'is_weekend'])['pm25'].mean() # Wrong column name! # ... errors, retries, inconsistent output

Predefined Function (reliable):

# MCP calls: compare_weekday_weekend(value_column="PM2.5", group_by="city") # Always works, consistent format, proper column names

### Cost Comparison (Approximate)

| Approach | Tokens per Query | Cost (GPT-4) | Latency |
| --- | --- | --- | --- |
| Predefined + Keyword Router | ~100 | $0.001 | <100 ms |
| Predefined + LLM Router | ~500 | $0.005 | ~500 ms |
| LLM-Generated Code | ~2000+ | $0.02+ | 2-5 s |

For 1000 queries/day:

- Predefined: ~$1-5/day
- LLM code generation: ~$20+/day

## Data Sources

- **Air quality data**: Simulated, based on real patterns from Indian cities
- **Funding data**: Mock data representing typical government allocations
- **City info**: Approximate real-world statistics

## License

MIT
