ai-analyst
Integrates with dbt semantic models to define business metrics and enable consistent, queryable analytics across the data pipeline.
Provides natural language querying and advanced analytics over DuckDB databases using a semantic layer, enabling users to ask questions in plain English and get statistically rigorous results.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type
@followed by the MCP server name and your instructions, e.g., "@ai-analystHow many users do we have?"
That's it! The server will respond to your query, and you can continue using it as needed.
Here is a step-by-step guide with screenshots.
Local-AI-Analyst
📦 Public Archive - November 2025
This project is being archived as a complete reference implementation of a local-first AI-powered analytics assistant. It demonstrates best practices for MCP integration, dbt semantic bridging, and extending data pipeline infrastructure. It was built to showcase duckDB, dlt, dbt, and Ibis (Boring Semantic Layer).
AI-powered data analyst with semantic layer, statistical rigor, and natural language insights
Status: v1.0 Production Ready | Works with Claude Desktop & ChatGPT Desktop | Quick Start →
What You Get
Natural Language Queries - "What's our conversion rate by plan type?" No SQL required.
Statistical Rigor - Automatic significance testing, confidence intervals, sample size validation
23+ Analytical Tools - Multi-query workflows, intelligent caching, conversation memory
Production Ready - Built on semantic layer principles with real data execution
# Ask Claude Desktop or ChatGPT Desktop:
"How many users do we have?"
"Compare engagement by plan type"
"Run a comprehensive conversion analysis"How it works: Claude/ChatGPT Desktop → MCP Protocol → AI Analyst → Semantic Layer → Your Data
Quick Start
Prerequisites
Claude Desktop OR ChatGPT Desktop (or both!)
Python 3.10+ (download)
5 minutes for setup
Quick Install
# 1. Install UV package manager
curl -LsSf https://astral.sh/uv/install.sh | sh
# 2. Clone and setup
git clone https://github.com/sbdk-dev/claude-analyst.git
cd claude-analyst
./scripts/setup.sh
# 3. Test the system
cd semantic-layer
uv run python test_all_functionality.py
# Expected: Tests Passed: 7 | Success Rate: 100.0%Option A: Claude Desktop Setup
# 1. Add to ~/Library/Application Support/Claude/claude_desktop_config.json:
{
"mcpServers": {
"ai-analyst": {
"command": "/opt/homebrew/bin/uv",
"args": ["run", "python", "run_mcp_server.py"],
"cwd": "/FULL/PATH/TO/claude-analyst/semantic-layer"
}
}
}
# 2. Find your uv path
which uv # Use this full path in config above
# 3. Restart Claude Desktop
# 4. Ask: "List available data models"Option B: ChatGPT Desktop Setup
# 1. Set your OpenAI API key
export OPENAI_API_KEY="sk-your-key-here"
# 2. Start the OpenAI API server
cd semantic-layer
uv run python run_openai_server.py
# Server starts on http://localhost:8000
# 3. Configure ChatGPT Desktop
# Settings → Beta Features → Actions → Add Custom Action
# URL: http://localhost:8000
# Auth: Bearer token (optional)
# 4. Start chatting!Detailed guides: QUICK_START.md | Claude Desktop Setup
Core Features
1. Execution-First (Prevents AI Fabrication)
Pattern: Build → Execute → Annotate
Every answer is based on REAL query results, not AI guesses.
Query: "What's our user breakdown by plan?"
→ Generates SQL from semantic model
→ Executes against real database
→ Interprets actual results
→ "700 Free (70%), 250 Pro (25%), 50 Enterprise (5%)"2. Statistical Testing by Default
Automatic significance testing when comparing groups:
Query: "Is the difference in engagement statistically significant?"
→ Auto-runs appropriate test (chi-square, t-test)
→ Reports p-values, effect sizes, confidence intervals
→ "Pro users 2.3x higher DAU (p<0.001, n=250 vs n=700)"3. Multi-Query Workflows
Built-in analytical workflows for comprehensive analysis:
Conversion Analysis - Funnel metrics, drop-off identification, cohort comparison
Feature Usage - Adoption rates, user segmentation, engagement patterns
Revenue Optimization - LTV analysis, churn prediction, growth opportunities
4. Intelligent Query Optimization
95% cache hit rate for repeated queries
Automatic performance learning
Parallel execution for complex workflows
Sub-100ms response times
Available Data Models
The system includes sample product analytics data:
Users (1,000 users)
Dimensions: plan_type, industry, company_size, signup_date
Metrics: total_users, conversion_rate, churn_rate
Events (34,000+ events)
Dimensions: event_type, feature_name, event_timestamp
Metrics: total_events, events_per_user, feature_adoption
Engagement
Dimensions: metric_date, cohort_month
Metrics: DAU, MAU, stickiness, retention (D1/D7/D30)
Example Queries
Basic Analytics
"How many users do we have?"
"What's our conversion rate from free to paid?"
"Show me the top 5 features by usage"
"What's our DAU trend this month?"Comparative Analysis
"Compare engagement between Pro and Free users"
"Is the difference in conversion rate statistically significant?"
"How does feature adoption vary by industry?"Workflows
"Run a comprehensive conversion analysis"
"Analyze feature usage patterns across user segments"
"What optimization opportunities do you see in our data?"Why This Matters
The Problem: AI Analysts Can Fabricate Numbers
Traditional AI analysis can make up results when writing observations before executing queries.
The Solution: Semantic Layer + Execution-First
Semantic Layer - Business metrics defined once, queried many ways
Execution-First - Always run the query before interpreting results
Statistical Rigor - Automatic validation and significance testing
Built on Research
Semantic Layer Design: Rasmus Engelbrecht's practical guide
Fabrication Prevention: Build → Execute → Annotate pattern
Production Stack: Boring Semantic Layer + Ibis + FastMCP + DuckDB
Architecture
┌─────────────────────────────────┐
│ Claude Desktop / ChatGPT │
└────────────┬────────────────────┘
│ MCP Protocol
┌────────────▼────────────────────┐
│ AI Analyst MCP Server │
│ ┌──────────────────────────┐ │
│ │ Multi-Query Workflows │ │
│ │ Query Optimization │ │
│ │ Conversation Memory │ │
│ │ Statistical Testing │ │
│ └───────────┬──────────────┘ │
│ │ │
│ ┌───────────▼──────────────┐ │
│ │ Semantic Layer │ │
│ │ (Business Metrics) │ │
│ └───────────┬──────────────┘ │
└──────────────┼──────────────────┘
│
┌─────▼──────┐
│ DuckDB │
└────────────┘Key Components:
23 MCP Tools - Complete analytical toolkit
Semantic Layer - Users, events, engagement models
Intelligence Layer - Statistical testing, natural language generation
Optimization Engine - Caching, performance learning, parallel execution
Conversation Memory - 24-hour context window, preference learning
Tech Stack
MCP Server: FastMCP - Production-grade MCP framework Semantic Layer: Boring Semantic Layer - Business metrics abstraction Query Engine: Ibis - Portable dataframe abstraction Database: DuckDB - Analytical database (prototype) Statistical Testing: scipy - Significance testing and effect sizes
Verify Installation
After setup, confirm everything works:
cd semantic-layer
# Test semantic layer
uv run python -c "
from mcp_server.semantic_layer_integration import SemanticLayerManager
import asyncio
async def test():
manager = SemanticLayerManager()
await manager.initialize()
models = await manager.get_available_models()
print(f'SUCCESS: {len(models)} models loaded')
print(f'Available: {[m[\"name\"] for m in models]}')
asyncio.run(test())
"Expected output:
SUCCESS: 3 models loaded
Available: ['users', 'events', 'engagement']If you see this, restart Claude Desktop and try: "List available data models"
Project Status
Current Version: 1.0 Production Ready Release Date: November 2025 Test Pass Rate: 100% (7/7 tests) Completion: All v1.0 features implemented and validated Performance: 95% cache hit rate, <100ms query response
Support & Contributing
Issues: Found a bug or have a feature request? Open an issue
Contributing: Pull requests welcome! Please follow the existing code style.
License
MIT License - See LICENSE for details
Acknowledgments
Inspiration & Research:
Rasmus Engelbrecht - Semantic layer patterns
Boring Semantic Layer - Foundation framework
Technology:
FastMCP by Jeremiah Lowin
Ibis Project by the Ibis team
DuckDB by DuckDB Labs
Last Updated: November 2025 Version: 1.0 Author: Matt Strautmann Status: v1.0 Production Ready | 100% Test Pass Rate
This server cannot be installed
Resources
Unclaimed servers have limited discoverability.
Looking for Admin?
If you are the server author, to access and configure the admin panel.
Latest Blog Posts
MCP directory API
We provide all the information about MCP servers via our MCP API.
curl -X GET 'https://glama.ai/api/mcp/v1/servers/sbdk-dev/local-ai-analyst'
If you have feedback or need assistance with the MCP directory API, please join our Discord server