# StatFlow
## About This Project
**StatFlow** is a personal learning project built to understand and explore the **Model Context Protocol (MCP)**. This project demonstrates how to build an MCP server that provides AI assistants with tools to interact with databases and generate reports.
**Project Purpose**: Learn MCP architecture, implement MCP servers, and understand how to expose functionality to AI assistants through standardized protocols.
---
## What is MCP?
**Model Context Protocol (MCP)** is an open protocol that enables AI assistants to securely access external tools and data sources. It provides a standardized way for AI applications to:
- **Call Tools**: Execute functions or operations (like database queries, file operations)
- **Access Resources**: Read-only access to data (like database tables, file contents)
- **Interact Securely**: Controlled access to external systems without exposing credentials
**Why MCP?**
- Standardized interface for AI-tool integration
- Secure and controlled access to resources
- Works with Claude Desktop, Cursor, and other MCP-compatible clients
- Enables AI assistants to perform complex workflows autonomously
---
## Project Overview
**StatFlow** - MCP Server for Statistical Analysis & Report Generation
This MCP server demonstrates how to expose database analysis capabilities through MCP tools. It provides AI assistants with the ability to:
- Extract data from multiple MySQL databases
- Generate statistical analysis tables (t-tests, effect sizes, p-values)
- Create formatted Excel reports with visual organization
- Generate thesis-quality Word documents with AI-powered insights
- Support unlimited databases through dynamic configuration
---
## MCP Server Features
### **MCP Tools** (3 Tools)
StatFlow exposes three MCP tools that AI assistants can call:
1. **`run_complete_analysis`**
   - Complete workflow (DB → Excel → Report)
   - Handles entire analysis pipeline
   - Returns success status and file paths
2. **`generate_analysis_excel`**
   - Database → Excel only
   - Fetches data and creates analysis tables
   - Returns Excel file path
3. **`generate_thesis_report`**
   - Excel → Thesis-quality report
   - Generates AI-powered Word document
   - Uses OpenAI for content generation
### **MCP Resources** (1 Resource)
1. **`experimental_data`**
   - Read-only access to database participant data
   - Returns JSON data without modifying database
   - Demonstrates MCP resource pattern
### **Key MCP Concepts Demonstrated**
- **Tool Implementation**: How to create MCP tools with parameters and return values
- **Resource Pattern**: Read-only data access without side effects
- **Server Setup**: Standard I/O communication with MCP clients
- **Error Handling**: Proper error responses in MCP format
- **Dynamic Configuration**: Loading database configs at runtime
### **Additional Features**
- **AI-Powered Report Generation**: Customizable writing style and terminology
- **Comprehensive Analysis**: Statistical tables with t-tests, p-values, effect sizes
- **Flexible Architecture**: Support for unlimited databases without code changes
- **Modular Design**: Reusable query and analysis modules
---
## Quick Start
### **Installation**
```bash
# Clone the repository
git clone <repository-url>
cd statflow
# Install dependencies
pip install -r requirements.txt
```
### **MCP Server Setup**
To use StatFlow as an MCP server with Cursor or Claude Desktop:
1. **Configure MCP Client** (e.g., `~/.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "statflow": {
      "command": "python",
      "args": ["-m", "statflow.server"],
      "cwd": "/path/to/statflow",
      "env": {
        "PYTHONPATH": "/path/to/statflow/src"
      }
    }
  }
}
```
2. **Restart your MCP client** (Cursor/Claude Desktop)
3. **Use with AI**: Ask your AI assistant to use StatFlow tools, e.g., "Run complete analysis using StatFlow"
### **Configuration**
Edit `config.json` (this file is not tracked in git - create your own):
```json
{
  "mysql_dump": {
    "host": "localhost",
    "port": 3306,
    "user": "root",
    "password": "",
    "database": "your_database",
    "prefix": "L1_"
  },
  "mysql_dump_2": {
    "host": "localhost",
    "port": 3306,
    "user": "root",
    "password": "",
    "database": "your_database_2",
    "prefix": "L2_"
  },
  "excel_output": {
    "default_path": "./results"
  },
  "openai": {
    "api_key": "your-api-key-here",
    "enabled": true,
    "model": "gpt-4o-mini"
  }
}
```
**Note**: You can add more databases (`mysql_dump_3`, `mysql_dump_4`, and so on); StatFlow automatically detects and uses them.
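The detection described above can be sketched as follows. This is a minimal illustration of the idea, not StatFlow's actual implementation; the function name is hypothetical:

```python
import json
import re

def find_database_configs(config: dict) -> dict:
    """Collect every key that looks like a mysql_dump entry.

    Matches "mysql_dump", "mysql_dump_2", "mysql_dump_3", ... so new
    databases are picked up from config.json without code changes.
    """
    pattern = re.compile(r"^mysql_dump(_\d+)?$")
    return {key: value for key, value in config.items() if pattern.match(key)}

config = json.loads("""
{
  "mysql_dump":   {"database": "your_database",   "prefix": "L1_"},
  "mysql_dump_2": {"database": "your_database_2", "prefix": "L2_"},
  "excel_output": {"default_path": "./results"}
}
""")

databases = find_database_configs(config)
print(sorted(databases))  # ['mysql_dump', 'mysql_dump_2']
```

Non-database sections such as `excel_output` and `openai` are ignored by the key pattern, which is what lets the database list grow freely.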
---
## Usage
### **Option 1: Via MCP Server** (Recommended)
Once configured, use StatFlow through your MCP-compatible AI assistant:
```
"Use StatFlow to run complete analysis"
"Generate analysis Excel using StatFlow"
"Create thesis report from Excel using StatFlow"
```
The AI assistant will call the appropriate MCP tools automatically.
### **Option 2: Direct Script Execution**
Generate Excel file directly:
```bash
python run_analysis.py
```
**Output**:
- ✅ Excel file with comprehensive analysis tables
- ✅ Statistical analysis tables (t-tests, averages, summaries)
- ✅ Color-coded sections for easy navigation
### **Option 3: Programmatic Usage**
```python
from statflow.server import (
run_complete_analysis_workflow,
generate_analysis_excel_only,
generate_thesis_report_internal
)
# Run complete workflow
result = run_complete_analysis_workflow("config.json", generate_report=True)

# Or step by step: load the config dict that generate_thesis_report_internal expects
import json
with open("config.json") as f:
    config = json.load(f)

excel_result = generate_analysis_excel_only("config.json")
report_result = generate_thesis_report_internal(excel_result["output_path"], config)
```
---
## MCP Server Architecture
### **MCP Server Implementation**
The server (`src/statflow/server.py`) implements:
- **MCP Server Class**: Uses `mcp.server.Server` from the MCP Python SDK
- **Tool Handlers**: Async functions that implement MCP tool logic
- **Resource Handlers**: Read-only data access patterns
- **Standard I/O**: Communication via stdio with MCP clients
### **MCP Tool Structure**
Each tool follows the MCP pattern:
```python
from mcp.server import Server
from mcp.types import Tool, TextContent

server = Server("statflow")

@server.list_tools()
async def list_tools() -> list[Tool]:
    """List available MCP tools."""
    return [
        Tool(
            name="tool_name",
            description="What the tool does",
            inputSchema={
                "type": "object",
                "properties": {
                    "param": {"type": "string", "description": "Parameter description"}
                },
            },
        )
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Handle tool execution."""
    result = "..."  # Tool implementation goes here
    return [TextContent(type="text", text=result)]
```
### **Key MCP Patterns Used**
- **Tool Discovery**: `@server.list_tools()` decorator
- **Tool Execution**: `@server.call_tool()` decorator
- **Resource Access**: `@server.list_resources()` and `@server.read_resource()`
- **Error Handling**: Proper error responses in MCP format
- **Type Safety**: Using MCP type definitions (`Tool`, `Resource`, `TextContent`)
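The read-only resource pattern listed above can be sketched with plain async functions. The data and function names here are hypothetical stand-ins; in the real server, bodies like these sit inside `@server.list_resources()` / `@server.read_resource()` handlers:

```python
# Sketch of the read-only resource pattern (plain functions for clarity).
# The row data and function names are illustrative, not StatFlow's actual code.
import asyncio
import json

async def fetch_participant_rows() -> list[dict]:
    """Stand-in for a read-only SELECT against the participant tables."""
    return [
        {"participant_id": 1, "group": "A", "accuracy": 0.92},
        {"participant_id": 2, "group": "B", "accuracy": 0.87},
    ]

async def read_experimental_data() -> str:
    """Serialize rows to JSON; nothing is written back to the database."""
    rows = await fetch_participant_rows()
    return json.dumps(rows, indent=2)

payload = asyncio.run(read_experimental_data())
data = json.loads(payload)
print(len(data))  # 2
```

The key property is that the handler only reads and serializes: returning JSON text keeps the resource free of side effects, which is what distinguishes MCP resources from tools.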
---
## Report Structure
The generated thesis report includes customizable sections. By default, it generates comprehensive analysis sections:
1. **Time Analysis** (~600-900 words)
- Calculation methodology
- Comparison across experimental conditions
- Statistical significance testing
- Overall patterns
2. **Accuracy Analysis** (~600-900 words)
- Accuracy computation method
- Performance comparisons
- T-test results and interpretations
- Key findings
3. **Satisfaction Analysis** (~600-900 words)
- Satisfaction scoring methodology
- Preference patterns
- Statistical analysis
- User experience insights
4. **Group Comparison Analysis** (~900-1200 words)
- Performance by participant groups
- Statistical differences
- Comparative insights
- Recommendations by group type
5. **Overall Summary and Key Findings** (~600-900 words)
- Research question results
- Key findings synthesized
- Practical recommendations
- Future directions
**Total**: ~3,000-5,500 words
*Note: Section names and content are fully customizable via the prompts configuration file.*
---
## Customization
### **Main Configuration File**
Edit: `src/statflow/analysis/thesis_quality_prompts.py`
This file contains all AI instructions in plain English. You can:
- Adjust word counts
- Change writing style
- Add custom instructions
- Modify section structure
- Update statistical reporting format
### **Example Customization**
To change word count, edit line 32:
```python
LENGTH: 600-900 words per section (concise and focused)
# Change to:
LENGTH: 800-1000 words per section
```
---
## Project Structure
```
statflow/
├── config.json          # Configuration (not in git)
├── run_analysis.py      # Main analysis script
├── requirements.txt     # Dependencies
│
└── src/statflow/
    ├── server.py          # MCP server (3 tools)
    ├── query_builder.py   # Database queries
    │
    ├── analysis/
    │   ├── thesis_quality_prompts.py   # ← CUSTOMIZE HERE
    │   ├── thesis_report_generator.py  # Report engine
    │   ├── ai_insights.py              # AI analysis
    │   ├── statistical_analysis.py     # Statistics
    │   └── table_generators.py         # Excel tables
    │
    └── queries/
        ├── time_scores.py           # Time analysis
        ├── accuracy_scores.py       # Accuracy analysis
        ├── satisfaction_scores.py   # Satisfaction analysis
        └── graph_questions.py       # Graph questions
```
---
## Output Files
Files are generated in the path specified in `config.json` (default: `./results`)
| File | Description |
|------|-------------|
| `experiment_analysis.xlsx` | Comprehensive analysis tables with statistics |
| `experiment_analysis_THESIS_QUALITY_Report.docx` | 3,000-5,500 word thesis-quality report |
---
## Excel File Contents
The Excel file includes:
### **Main Data Sheet**
- Participant/experimental unit data
- Color-coded sections: User characteristics, Performance metrics, Satisfaction scores
- AVERAGE row with summary statistics
- Organized by experimental conditions/groups
### **Statistical Analysis Tables**
- **T-Test tables**: Comparative analysis across conditions
- **Average metrics**: Performance comparisons by groups/categories
- **Overall summaries**: Statistical comparisons
- **P-values and significance levels**
### **AI Insights Sheet**
- Automated insights from AI analysis
- Pattern identification
- Data-driven recommendations
---
## Requirements
- **Python**: 3.8+
- **MySQL**: Database server
- **OpenAI API**: For thesis report generation (gpt-4o-mini)
- **Dependencies**: Listed in `requirements.txt`
### **Key Dependencies**
```
mysql-connector-python
openpyxl
pandas
python-docx
openai
mcp
```
---
## Documentation
- **Query Modules**: See `src/statflow/queries/README.md` for details on creating custom analysis modules
- **Customization**: Edit `src/statflow/analysis/thesis_quality_prompts.py` to customize report style and terminology
- **MCP Server**: Use the StatFlow MCP server tools for programmatic access
---
## Example Use Cases
StatFlow can be used for various experimental data analysis scenarios:
- **User Studies**: Compare performance across different interfaces, conditions, or user groups
- **A/B Testing**: Analyze results from experimental and control groups
- **Longitudinal Studies**: Track changes over time across multiple measurement points
- **Comparative Analysis**: Evaluate differences between multiple experimental conditions
### **Customization for Your Study**
You can fully customize StatFlow for your specific research:
- Update query modules in `src/statflow/queries/` to match your data structure
- Modify analysis prompts in `src/statflow/analysis/thesis_quality_prompts.py` to use your terminology
- Adjust statistical analysis parameters to match your research design
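A custom query module can be as small as one function that maps raw rows to a labeled score column. The sketch below is hypothetical: the module name, constant, signature, and data shape are illustrative assumptions, so match them to the existing modules in `src/statflow/queries/` when adapting StatFlow to your own study:

```python
# Hypothetical query module, e.g. src/statflow/queries/response_scores.py.
# The interface is illustrative, not StatFlow's actual module contract.

COLUMN_LABEL = "Response Time (s)"

def compute_scores(rows: list[dict]) -> dict[int, float]:
    """Map each participant id to a single score for this column."""
    scores: dict[int, float] = {}
    for row in rows:
        pid = row["participant_id"]
        # Example transformation: milliseconds stored in the DB -> seconds
        scores[pid] = row["response_ms"] / 1000.0
    return scores

rows = [
    {"participant_id": 1, "response_ms": 1500},
    {"participant_id": 2, "response_ms": 2250},
]
print(compute_scores(rows))  # {1: 1.5, 2: 2.25}
```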
---
## Key Benefits
1. **Automation**: Complete workflow from database to publication-ready reports
2. **Flexibility**: Customizable analysis modules and report structure
3. **Scalability**: Support for unlimited databases without code changes
4. **Efficiency**: Automated generation in minutes instead of hours
5. **Quality**: Thesis-level academic writing with AI-powered insights
6. **Reproducibility**: Consistent analysis pipeline for all your studies
---
## Support
For questions or issues:
- Review `src/statflow/queries/README.md` for query module documentation
- Check `src/statflow/analysis/thesis_quality_prompts.py` for report customization
- Examine `config.json` for configuration options
---
## License
See [LICENSE](LICENSE) file for details.
---
## Learning Resources
### **MCP Documentation**
- **MCP Specification**: [Model Context Protocol](https://modelcontextprotocol.io/)
- **MCP Python SDK**: [mcp Python Package](https://github.com/modelcontextprotocol/python-sdk)
- **MCP Examples**: Check the MCP repository for more examples
### **Key Learnings from This Project**
This project demonstrates:
- ✅ How to structure an MCP server
- ✅ Implementing MCP tools with complex workflows
- ✅ Using MCP resources for read-only data access
- ✅ Error handling and validation in MCP servers
- ✅ Dynamic tool/resource discovery
- ✅ Integrating MCP servers with existing Python codebases
### **Example Use Case: CSU SSD Study**
The codebase includes an example implementation customized for a research study (the "Improving the CSU Student Success Dashboard and Its Analysis" study). This demonstrates how StatFlow can be adapted for domain-specific needs while maintaining a flexible MCP architecture.
**Note**: This is a personal learning project, not affiliated with any institution.
---
## Project Status
**Project Type**: Personal Learning Project
**Purpose**: Learning and exploring Model Context Protocol (MCP)
**Status**: ✅ Active and fully functional
**Last Updated**: November 12, 2025
**Version**: 2.0 (Renamed to StatFlow)
**Created by**: Rucha D. Nandgirikar
**Note**: This is a personal project for learning MCP, not affiliated with any institution or organization.
---
## Author & Resources
**Author**: Rucha D. Nandgirikar
### Related Articles
- [Medium Article - Coming Soon](#) - Learn about building MCP servers with StatFlow
_More articles and resources coming soon..._