StatFlow

🎯 About This Project

StatFlow is a personal learning project built to understand and explore the Model Context Protocol (MCP). This project demonstrates how to build an MCP server that provides AI assistants with tools to interact with databases and generate reports.

Project Purpose: Learn MCP architecture, implement MCP servers, and understand how to expose functionality to AI assistants through standardized protocols.

🔌 What is MCP?

Model Context Protocol (MCP) is an open protocol that enables AI assistants to securely access external tools and data sources. It provides a standardized way for AI applications to:

Call Tools: Execute functions or operations (like database queries, file operations)
Access Resources: Read-only access to data (like database tables, file contents)
Interact Securely: Controlled access to external systems without exposing credentials
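
On the wire, MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method. The following is a minimal sketch of what such a request might look like for one of StatFlow's tools. The `config_path` argument and the helper name `build_tool_call` are assumptions for illustration, not StatFlow's actual schema.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# `config_path` is a hypothetical argument name used only for this example.
request = build_tool_call(1, "run_complete_analysis", {"config_path": "config.json"})
print(json.dumps(request, indent=2))
```

In practice the MCP client builds this message for you; the sketch only shows the shape of what travels between client and server.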

Why MCP?

Standardized interface for AI-tool integration
Secure and controlled access to resources
Works with Claude Desktop, Cursor, and other MCP-compatible clients
Enables AI assistants to perform complex workflows autonomously

📊 Project Overview

StatFlow - MCP Server for Statistical Analysis & Report Generation

This MCP server demonstrates how to expose database analysis capabilities through MCP tools. It provides AI assistants with the ability to:

Extract data from multiple MySQL databases
Generate statistical analysis tables (t-tests, effect sizes, p-values)
Create formatted Excel reports with visual organization
Generate thesis-quality Word documents with AI-powered insights
Support unlimited databases through dynamic configuration
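
To make the statistics concrete, here is a rough sketch of a two-sample t statistic and an effect size using only the standard library. This README does not say which t-test variant StatFlow uses, so Welch's unequal-variance form and Cohen's d with a pooled standard deviation are assumptions, and the sample values are made up.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def cohens_d(a, b):
    """Return Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# Made-up task times (seconds) for two hypothetical conditions.
condition_a = [12.0, 15.0, 11.0, 14.0, 13.0]
condition_b = [10.0, 9.0, 11.0, 8.0, 12.0]

t_stat, dof = welch_t(condition_a, condition_b)
effect = cohens_d(condition_a, condition_b)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, d = {effect:.3f}")
```

Turning `t` and `df` into a p-value requires the Student t distribution's CDF, which the standard library does not provide, so that step is left out of this sketch.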

✨ MCP Server Features

MCP Tools (3 Tools)

StatFlow exposes three MCP tools that AI assistants can call:

run_complete_analysis 🎯
- Complete workflow (DB → Excel → Report)
- Handles entire analysis pipeline
- Returns success status and file paths
generate_analysis_excel 📊
- Database → Excel only
- Fetches data and creates analysis tables
- Returns Excel file path
generate_thesis_report 📚
- Excel → Thesis-quality report
- Generates AI-powered Word document
- Uses OpenAI for content generation

MCP Resources (1 Resource)

experimental_data 📦
- Read-only access to database participant data
- Returns JSON data without modifying database
- Demonstrates MCP resource pattern
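
As an illustration of the read-only idea, the sketch below turns already-fetched rows into a JSON string without writing anything back. The function name `rows_to_json` and the column names are hypothetical; the `Decimal` and `date` handling is included because database drivers commonly return those types.

```python
import json
from datetime import date
from decimal import Decimal


def rows_to_json(rows):
    """Serialize query rows to a JSON string; the rows are only read, never changed."""

    def encode(value):
        # json.dumps calls this only for types it cannot serialize itself.
        if isinstance(value, Decimal):
            return float(value)
        if isinstance(value, date):  # also covers datetime, a subclass of date
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    return json.dumps(rows, default=encode)


# Hypothetical participant rows, as a cursor returning dictionaries might yield them.
sample_rows = [
    {"participant_id": 1, "score": Decimal("4.50"), "tested_on": date(2025, 1, 15)},
]
payload = rows_to_json(sample_rows)
print(payload)
```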

Key MCP Concepts Demonstrated

Tool Implementation: How to create MCP tools with parameters and return values
Resource Pattern: Read-only data access without side effects
Server Setup: Standard I/O communication with MCP clients
Error Handling: Proper error responses in MCP format
Dynamic Configuration: Loading database configs at runtime

Additional Features

AI-Powered Report Generation: Customizable writing style and terminology
Comprehensive Analysis: Statistical tables with t-tests, p-values, effect sizes
Flexible Architecture: Support for unlimited databases without code changes
Modular Design: Reusable query and analysis modules

🚀 Quick Start

Installation

    # Clone the repository
    git clone <repository-url>
    cd statflow

    # Install dependencies
    pip install -r requirements.txt

MCP Server Setup

To use StatFlow as an MCP server with Cursor or Claude Desktop:

Configure MCP Client (e.g., ~/.cursor/mcp.json):

    {
      "mcpServers": {
        "statflow": {
          "command": "python",
          "args": ["-m", "statflow.server"],
          "cwd": "/path/to/statflow",
          "env": {
            "PYTHONPATH": "/path/to/statflow/src"
          }
        }
      }
    }

Restart your MCP client (Cursor/Claude Desktop)
Use with AI: Ask your AI assistant to use StatFlow tools, e.g., "Run complete analysis using StatFlow"

Configuration

Edit config.json (this file is not tracked in git - create your own):

    {
      "mysql_dump": {
        "host": "localhost",
        "port": 3306,
        "user": "root",
        "password": "",
        "database": "your_database",
        "prefix": "L1_"
      },
      "mysql_dump_2": {
        "host": "localhost",
        "port": 3306,
        "user": "root",
        "password": "",
        "database": "your_database_2",
        "prefix": "L2_"
      },
      "excel_output": {
        "default_path": "./results"
      },
      "openai": {
        "api_key": "your-api-key-here",
        "enabled": true,
        "model": "gpt-4o-mini"
      }
    }

Note: You can add unlimited databases (mysql_dump_3, mysql_dump_4, etc.) - StatFlow will automatically detect and use them.
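
One way such automatic detection could work is to scan the parsed config for keys that follow the `mysql_dump` naming pattern. This is a sketch under that assumption; the function name `find_database_configs` is hypothetical and StatFlow's real detection logic may differ.

```python
import re

# Matches "mysql_dump" and numbered variants such as "mysql_dump_2".
DB_KEY = re.compile(r"^mysql_dump(_\d+)?$")


def find_database_configs(config: dict) -> dict:
    """Return the database entries from a parsed config, ordered by their number."""
    found = {k: v for k, v in config.items() if DB_KEY.match(k) and isinstance(v, dict)}

    def order(key: str) -> int:
        suffix = key.rpartition("_")[2]
        # The unnumbered "mysql_dump" entry is treated as database 1.
        return int(suffix) if suffix.isdigit() else 1

    return dict(sorted(found.items(), key=lambda item: order(item[0])))


example_config = {
    "mysql_dump_10": {"database": "study_10", "prefix": "L10_"},
    "mysql_dump": {"database": "your_database", "prefix": "L1_"},
    "excel_output": {"default_path": "./results"},
    "mysql_dump_2": {"database": "your_database_2", "prefix": "L2_"},
}
databases = find_database_configs(example_config)
print(list(databases))
```

Sorting numerically rather than alphabetically keeps `mysql_dump_10` after `mysql_dump_2`, which matters once more than nine databases are configured.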

📊 Usage

Option 1: Via MCP Server (Recommended)

Once configured, use StatFlow through your MCP-compatible AI assistant:

"Use StatFlow to run complete analysis" "Generate analysis Excel using StatFlow" "Create thesis report from Excel using StatFlow"

The AI assistant will call the appropriate MCP tools automatically.

Option 2: Direct Script Execution

Generate Excel file directly:

python run_analysis.py

Output:

✅ Excel file with comprehensive analysis tables
✅ Statistical analysis tables (t-tests, averages, summaries)
✅ Color-coded sections for easy navigation

Option 3: Programmatic Usage

    import json

    from statflow.server import (
        run_complete_analysis_workflow,
        generate_analysis_excel_only,
        generate_thesis_report_internal,
    )

    # Run the complete workflow
    result = run_complete_analysis_workflow("config.json", generate_report=True)

    # Or run it step by step
    excel_result = generate_analysis_excel_only("config.json")

    # `config` is assumed here to be the parsed contents of config.json
    with open("config.json") as f:
        config = json.load(f)
    report_result = generate_thesis_report_internal(excel_result["output_path"], config)

🔧 MCP Server Architecture

MCP Server Implementation

The server (src/statflow/server.py) implements:

MCP Server Class: Uses mcp.server.Server from the MCP Python SDK
Tool Handlers: Async functions that implement MCP tool logic
Resource Handlers: Read-only data access patterns
Standard I/O: Communication via stdio with MCP clients
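
With the stdio transport, each JSON-RPC message is written as a single line of JSON terminated by a newline. The MCP Python SDK handles this framing for you; the sketch below only illustrates the idea, using `io.StringIO` as a stand-in for the real stdin/stdout pipes. The helper names are hypothetical.

```python
import io
import json


def write_message(stream, message: dict) -> None:
    """Write one JSON-RPC message as a single newline-terminated line."""
    # json.dumps escapes embedded newlines, so each message stays on one line.
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream):
    """Yield each JSON-RPC message on the stream, skipping blank lines."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


# io.StringIO stands in for the pipe between an MCP client and the server.
pipe = io.StringIO()
write_message(pipe, {"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
write_message(pipe, {"jsonrpc": "2.0", "id": 2, "method": "resources/list"})
pipe.seek(0)
received = list(read_messages(pipe))
print([message["method"] for message in received])
```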

MCP Tool Structure

Each tool follows the MCP pattern:

    from mcp.server import Server
    from mcp.types import TextContent, Tool

    server = Server("statflow")


    @server.list_tools()
    async def list_tools() -> list[Tool]:
        """List available MCP tools."""
        return [
            Tool(
                name="tool_name",
                description="What the tool does",
                inputSchema={
                    "type": "object",
                    "properties": {
                        "param": {"type": "string", "description": "Parameter description"}
                    },
                },
            )
        ]


    @server.call_tool()
    async def call_tool(name: str, arguments: dict) -> list[TextContent]:
        """Handle tool execution."""
        # Tool implementation goes here; `result` is a placeholder string
        result = f"Called {name}"
        return [TextContent(type="text", text=result)]

Key MCP Patterns Used

Tool Discovery: @server.list_tools() decorator
Tool Execution: @server.call_tool() decorator
Resource Access: @server.list_resources() and @server.read_resource()
Error Handling: Proper error responses in MCP format
Type Safety: Using MCP type definitions (Tool, Resource, TextContent)
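
Since a single `call_tool` handler receives every tool invocation, it usually has to route by tool name and turn failures into error results instead of letting exceptions escape. The sketch below shows one possible way to do that in plain Python, without the MCP SDK. The registry, the `dispatch` function, the `config_path` argument, and the result dictionary are all assumptions for illustration; the `isError` key loosely mirrors the flag of the same name in MCP tool results.

```python
import asyncio

HANDLERS = {}


def tool(name):
    """Register an async handler under an MCP tool name."""
    def register(func):
        HANDLERS[name] = func
        return func
    return register


@tool("generate_analysis_excel")
async def generate_analysis_excel(arguments: dict) -> str:
    # Placeholder body: a real handler would query the databases and write the file.
    return f"Excel generated from {arguments['config_path']}"


async def dispatch(name: str, arguments: dict) -> dict:
    """Route a call to its handler and convert any failure into an error result."""
    handler = HANDLERS.get(name)
    if handler is None:
        return {"isError": True, "text": f"Unknown tool: {name}"}
    try:
        return {"isError": False, "text": await handler(arguments)}
    except Exception as exc:
        return {"isError": True, "text": f"{name} failed: {exc!r}"}


print(asyncio.run(dispatch("generate_analysis_excel", {"config_path": "config.json"})))
print(asyncio.run(dispatch("generate_analysis_excel", {})))  # missing argument
print(asyncio.run(dispatch("no_such_tool", {})))
```

Returning an error result rather than raising lets the AI assistant see what went wrong and retry with corrected arguments.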

📖 Report Structure

The generated thesis report includes customizable sections. By default, it generates comprehensive analysis sections:

Time Analysis (~600-900 words)
- Calculation methodology
- Comparison across experimental conditions
- Statistical significance testing
- Overall patterns
Accuracy Analysis (~600-900 words)
- Accuracy computation method
- Performance comparisons
- T-test results and interpretations
- Key findings
Satisfaction Analysis (~600-900 words)
- Satisfaction scoring methodology
- Preference patterns
- Statistical analysis
- User experience insights
Group Comparison Analysis (~900-1200 words)
- Performance by participant groups
- Statistical differences
- Comparative insights
- Recommendations by group type
Overall Summary and Key Findings (~600-900 words)
- Research question results
- Key findings synthesized
- Practical recommendations
- Future directions

Total: ~3,000-5,500 words

Note: Section names and content are fully customizable via the prompts configuration file.

🔧 Customization

Main Configuration File

Edit: src/statflow/analysis/thesis_quality_prompts.py

This file contains all AI instructions in plain English. You can:

Adjust word counts
Change writing style
Add custom instructions
Modify section structure
Update statistical reporting format

Example Customization

To change word count, edit line 32:

    LENGTH: 600-900 words per section (concise and focused)
    # Change to:
    LENGTH: 800-1000 words per section

📁 Project Structure

    statflow/
    ├── config.json                         # Configuration (not in git)
    ├── run_analysis.py                     # Main analysis script
    ├── requirements.txt                    # Dependencies
    │
    ├── src/statflow/
    │   ├── server.py                       # MCP server (3 tools)
    │   ├── query_builder.py                # Database queries
    │   │
    │   ├── analysis/
    │   │   ├── thesis_quality_prompts.py   # ⭐ CUSTOMIZE HERE
    │   │   ├── thesis_report_generator.py  # Report engine
    │   │   ├── ai_insights.py              # AI analysis
    │   │   ├── statistical_analysis.py     # Statistics
    │   │   └── table_generators.py         # Excel tables
    │   │
    │   └── queries/
    │       ├── time_scores.py              # Time analysis
    │       ├── accuracy_scores.py          # Accuracy analysis
    │       ├── satisfaction_scores.py      # Satisfaction analysis
    │       └── graph_questions.py          # Graph questions

📊 Output Files

Files are generated in the path specified in config.json (default: ./results)

File	Description
`experiment_analysis.xlsx`	Comprehensive analysis tables with statistics
`experiment_analysis_THESIS_QUALITY_Report.docx`	3,000-5,500 word thesis-quality report
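
The sketch below shows one way the two output paths could be resolved from the parsed config, falling back to `./results` when no path is set. The `excel_output.default_path` key and the file names come from this README; the function name `output_paths` and the idea of deriving the report name from the Excel file's stem are assumptions.

```python
from pathlib import Path


def output_paths(config: dict) -> dict:
    """Resolve the Excel and report paths from a parsed config.json."""
    base = Path(config.get("excel_output", {}).get("default_path", "./results"))
    excel = base / "experiment_analysis.xlsx"
    # Assumed naming rule: the report reuses the Excel file's stem.
    report = base / f"{excel.stem}_THESIS_QUALITY_Report.docx"
    return {"excel": excel, "report": report}


paths = output_paths({"excel_output": {"default_path": "./results"}})
print(paths["excel"])
print(paths["report"])
```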

🔍 Excel File Contents

The Excel file includes:

Main Data Sheet

Participant/experimental unit data
Color-coded sections: User characteristics, Performance metrics, Satisfaction scores
AVERAGE row with summary statistics
Organized by experimental conditions/groups

Statistical Analysis Tables

T-Test tables: Comparative analysis across conditions
Average metrics: Performance comparisons by groups/categories
Overall summaries: Statistical comparisons
P-values and significance levels

AI Insights Sheet

Automated insights from AI analysis
Pattern identification
Data-driven recommendations

🛠️ Requirements

Python: 3.8+
MySQL: Database server
OpenAI API: For thesis report generation (gpt-4o-mini)
Dependencies: Listed in requirements.txt

Key Dependencies

mysql-connector-python
openpyxl
pandas
python-docx
openai
mcp

📚 Documentation

Query Modules: See src/statflow/queries/README.md for details on creating custom analysis modules
Customization: Edit src/statflow/analysis/thesis_quality_prompts.py to customize report style and terminology
MCP Server: Use the StatFlow MCP server tools for programmatic access

🔬 Example Use Cases

StatFlow can be used for various experimental data analysis scenarios:

User Studies: Compare performance across different interfaces, conditions, or user groups
A/B Testing: Analyze results from experimental and control groups
Longitudinal Studies: Track changes over time across multiple measurement points
Comparative Analysis: Evaluate differences between multiple experimental conditions

Customization for Your Study

You can fully customize StatFlow for your specific research:

Update query modules in src/statflow/queries/ to match your data structure
Modify analysis prompts in src/statflow/analysis/thesis_quality_prompts.py to use your terminology
Adjust statistical analysis parameters to match your research design

🎉 Key Benefits

Automation: Complete workflow from database to publication-ready reports
Flexibility: Customizable analysis modules and report structure
Scalability: Support for unlimited databases without code changes
Efficiency: Automated generation in minutes instead of hours
Quality: Thesis-level academic writing with AI-powered insights
Reproducibility: Consistent analysis pipeline for all your studies

📞 Support

For questions or issues:

Review src/statflow/queries/README.md for query module documentation
Check src/statflow/analysis/thesis_quality_prompts.py for report customization
Examine config.json for configuration options

📄 License

See LICENSE file for details.

🎓 Learning Resources

MCP Documentation

MCP Specification: Model Context Protocol
MCP Python SDK: mcp Python Package
MCP Examples: Check the MCP repository for more examples

Key Learnings from This Project

This project demonstrates:

✅ How to structure an MCP server
✅ Implementing MCP tools with complex workflows
✅ Using MCP resources for read-only data access
✅ Error handling and validation in MCP servers
✅ Dynamic tool/resource discovery
✅ Integrating MCP servers with existing Python codebases

Example Use Case: CSU SSD Study

The codebase includes an example implementation customized for a research study (the "Improving the CSU Student Success Dashboard and Its Analysis" study). This demonstrates how StatFlow can be adapted for domain-specific needs while maintaining a flexible MCP architecture.

Note: This is a personal learning project, not affiliated with any institution.

📝 Project Status

Project Type: Personal Learning Project
Purpose: Learning and exploring Model Context Protocol (MCP)
Status: ✅ Active and fully functional
Last Updated: November 12, 2025
Version: 2.0 (Renamed to StatFlow)

Created by: Rucha D. Nandgirikar
Note: This is a personal project for learning MCP, not affiliated with any institution or organization.

👤 Author & Resources

Author: Rucha D. Nandgirikar

📚 Related Articles

📖 Medium Article - Coming Soon - Learn about building MCP servers with StatFlow

More articles and resources coming soon...
