# StatFlow
## About This Project
**StatFlow** is a personal learning project built to understand and explore the **Model Context Protocol (MCP)**. This project demonstrates how to build an MCP server that provides AI assistants with tools to interact with databases and generate reports.
**Project Purpose**: Learn MCP architecture, implement MCP servers, and understand how to expose functionality to AI assistants through standardized protocols.
---
## What is MCP?
**Model Context Protocol (MCP)** is an open protocol that enables AI assistants to securely access external tools and data sources. It provides a standardized way for AI applications to:
- **Call Tools**: Execute functions or operations (like database queries, file operations)
- **Access Resources**: Read-only access to data (like database tables, file contents)
- **Interact Securely**: Controlled access to external systems without exposing credentials
**Why MCP?**
- Standardized interface for AI-tool integration
- Secure and controlled access to resources
- Works with Claude Desktop, Cursor, and other MCP-compatible clients
- Enables AI assistants to perform complex workflows autonomously
---
## Project Overview
**StatFlow** - MCP Server for Statistical Analysis & Report Generation
This MCP server demonstrates how to expose database analysis capabilities through MCP tools. It provides AI assistants with the ability to:
- Extract data from multiple MySQL databases
- Generate statistical analysis tables (t-tests, effect sizes, p-values)
- Create formatted Excel reports with visual organization
- Generate thesis-quality Word documents with AI-powered insights
- Support unlimited databases through dynamic configuration
---
## MCP Server Features
### **MCP Tools** (3 Tools)
StatFlow exposes three MCP tools that AI assistants can call:
1. **`run_complete_analysis`**
   - Complete workflow (DB → Excel → Report)
   - Handles entire analysis pipeline
   - Returns success status and file paths
2. **`generate_analysis_excel`**
   - Database → Excel only
   - Fetches data and creates analysis tables
   - Returns Excel file path
3. **`generate_thesis_report`**
   - Excel → Thesis-quality report
   - Generates AI-powered Word document
   - Uses OpenAI for content generation
### **MCP Resources** (1 Resource)
1. **`experimental_data`**
   - Read-only access to database participant data
   - Returns JSON data without modifying database
   - Demonstrates MCP resource pattern
### **Key MCP Concepts Demonstrated**
- **Tool Implementation**: How to create MCP tools with parameters and return values
- **Resource Pattern**: Read-only data access without side effects
- **Server Setup**: Standard I/O communication with MCP clients
- **Error Handling**: Proper error responses in MCP format
- **Dynamic Configuration**: Loading database configs at runtime
### **Additional Features**
- **AI-Powered Report Generation**: Customizable writing style and terminology
- **Comprehensive Analysis**: Statistical tables with t-tests, p-values, effect sizes
- **Flexible Architecture**: Support for unlimited databases without code changes
- **Modular Design**: Reusable query and analysis modules
---
## Quick Start
### **Installation**
```bash
# Clone the repository
git clone <repository-url>
cd statflow
# Install dependencies
pip install -r requirements.txt
```
### **MCP Server Setup**
To use StatFlow as an MCP server with Cursor or Claude Desktop:
1. **Configure MCP Client** (e.g., `~/.cursor/mcp.json`):
```json
{
  "mcpServers": {
    "statflow": {
      "command": "python",
      "args": ["-m", "statflow.server"],
      "cwd": "/path/to/statflow",
      "env": {
        "PYTHONPATH": "/path/to/statflow/src"
      }
    }
  }
}
```
2. **Restart your MCP client** (Cursor/Claude Desktop)
3. **Use with AI**: Ask your AI assistant to use StatFlow tools, e.g., "Run complete analysis using StatFlow"
### **Configuration**
Edit `config.json` (this file is not tracked in git - create your own):
```json
{
  "mysql_dump": {
    "host": "localhost",
    "port": 3306,
    "user": "root",
    "password": "",
    "database": "your_database",
    "prefix": "L1_"
  },
  "mysql_dump_2": {
    "host": "localhost",
    "port": 3306,
    "user": "root",
    "password": "",
    "database": "your_database_2",
    "prefix": "L2_"
  },
  "excel_output": {
    "default_path": "./results"
  },
  "openai": {
    "api_key": "your-api-key-here",
    "enabled": true,
    "model": "gpt-4o-mini"
  }
}
```
**Note**: You can add more databases (`mysql_dump_3`, `mysql_dump_4`, and so on); StatFlow automatically detects and uses them.
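The detection described above can be sketched as follows. This is a minimal illustration of the idea, not StatFlow's actual implementation; the function name is hypothetical:

```python
import json
import re

def find_database_configs(config: dict) -> dict:
    """Collect every key that looks like a mysql_dump entry.

    Matches "mysql_dump", "mysql_dump_2", "mysql_dump_3", ... so new
    databases are picked up from config.json without code changes.
    """
    pattern = re.compile(r"^mysql_dump(_\d+)?$")
    return {key: value for key, value in config.items() if pattern.match(key)}

config = json.loads("""
{
  "mysql_dump":   {"database": "your_database",   "prefix": "L1_"},
  "mysql_dump_2": {"database": "your_database_2", "prefix": "L2_"},
  "excel_output": {"default_path": "./results"}
}
""")

databases = find_database_configs(config)
print(sorted(databases))  # ['mysql_dump', 'mysql_dump_2']
```

Non-database sections such as `excel_output` and `openai` are ignored by the key pattern, which is what lets the database list grow freely.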
---
## Usage
### **Option 1: Via MCP Server** (Recommended)
Once configured, use StatFlow through your MCP-compatible AI assistant:
```
"Use StatFlow to run complete analysis"
"Generate analysis Excel using StatFlow"
"Create thesis report from Excel using StatFlow"
```
The AI assistant will call the appropriate MCP tools automatically.
### **Option 2: Direct Script Execution**
Generate Excel file directly:
```bash
python run_analysis.py
```
**Output**:
- ✅ Excel file with comprehensive analysis tables
- ✅ Statistical analysis tables (t-tests, averages, summaries)
- ✅ Color-coded sections for easy navigation
### **Option 3: Programmatic Usage**
```python
from statflow.server import (
run_complete_analysis_workflow,
generate_analysis_excel_only,
generate_thesis_report_internal
)
# Run complete workflow
result = run_complete_analysis_workflow("config.json", generate_report=True)

# Or step by step: load the config dict that generate_thesis_report_internal expects
import json
with open("config.json") as f:
    config = json.load(f)

excel_result = generate_analysis_excel_only("config.json")
report_result = generate_thesis_report_internal(excel_result["output_path"], config)
```
---
## MCP Server Architecture
### **MCP Server Implementation**
The server (`src/statflow/server.py`) implements:
- **MCP Server Class**: Uses `mcp.server.Server` from the MCP Python SDK
- **Tool Handlers**: Async functions that implement MCP tool logic
- **Resource Handlers**: Read-only data access patterns
- **Standard I/O**: Communication via stdio with MCP clients
### **MCP Tool Structure**
Each tool follows the MCP pattern:
```python
from mcp.server import Server
from mcp.types import Tool, TextContent

server = Server("statflow")

@server.list_tools()
async def list_tools() -> list[Tool]:
    """List available MCP tools."""
    return [
        Tool(
            name="tool_name",
            description="What the tool does",
            inputSchema={
                "type": "object",
                "properties": {
                    "param": {"type": "string", "description": "Parameter description"}
                },
            },
        )
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """Handle tool execution."""
    result = "..."  # Tool implementation goes here
    return [TextContent(type="text", text=result)]
```
### **Key MCP Patterns Used**
- **Tool Discovery**: `@server.list_tools()` decorator
- **Tool Execution**: `@server.call_tool()` decorator
- **Resource Access**: `@server.list_resources()` and `@server.read_resource()`
- **Error Handling**: Proper error responses in MCP format
- **Type Safety**: Using MCP type definitions (`Tool`, `Resource`, `TextContent`)
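The read-only resource pattern listed above can be sketched with plain async functions. The data and function names here are hypothetical stand-ins; in the real server, bodies like these sit inside `@server.list_resources()` / `@server.read_resource()` handlers:

```python
# Sketch of the read-only resource pattern (plain functions for clarity).
# The row data and function names are illustrative, not StatFlow's actual code.
import asyncio
import json

async def fetch_participant_rows() -> list[dict]:
    """Stand-in for a read-only SELECT against the participant tables."""
    return [
        {"participant_id": 1, "group": "A", "accuracy": 0.92},
        {"participant_id": 2, "group": "B", "accuracy": 0.87},
    ]

async def read_experimental_data() -> str:
    """Serialize rows to JSON; nothing is written back to the database."""
    rows = await fetch_participant_rows()
    return json.dumps(rows, indent=2)

payload = asyncio.run(read_experimental_data())
data = json.loads(payload)
print(len(data))  # 2
```

The key property is that the handler only reads and serializes: returning JSON text keeps the resource free of side effects, which is what distinguishes MCP resources from tools.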
---
## Report Structure
The generated thesis report includes customizable sections. By default, it generates comprehensive analysis sections:
1. **Time Analysis** (~600-900 words)
- Calculation methodology
- Comparison across experimental conditions
- Statistical significance testing
- Overall patterns
2. **Accuracy Analysis** (~600-900 words)
- Accuracy computation method
- Performance comparisons
- T-test results and interpretations
- Key findings
3. **Satisfaction Analysis** (~600-900 words)
- Satisfaction scoring methodology
- Preference patterns
- Statistical analysis
- User experience insights
4. **Group Comparison Analysis** (~900-1200 words)
- Performance by participant groups
- Statistical differences
- Comparative insights
- Recommendations by group type
5. **Overall Summary and Key Findings** (~600-900 words)
- Research question results
- Key findings synthesized
- Practical recommendations
- Future directions
**Total**: ~3,000-5,500 words
*Note: Section names and content are fully customizable via the prompts configuration file.*
---
## Customization
### **Main Configuration File**
Edit: `src/statflow/analysis/thesis_quality_prompts.py`
This file contains all AI instructions in plain English. You can:
- Adjust word counts
- Change writing style
- Add custom instructions
- Modify section structure
- Update statistical reporting format
### **Example Customization**
To change word count, edit line 32:
```python
LENGTH: 600-900 words per section (concise and focused)
# Change to:
LENGTH: 800-1000 words per section
```
---
## Project Structure
```
statflow/
├── config.json          # Configuration (not in git)
├── run_analysis.py      # Main analysis script
├── requirements.txt     # Dependencies
│
└── src/statflow/
    ├── server.py          # MCP server (3 tools)
    ├── query_builder.py   # Database queries
    │
    ├── analysis/
    │   ├── thesis_quality_prompts.py   # ← CUSTOMIZE HERE
    │   ├── thesis_report_generator.py  # Report engine
    │   ├── ai_insights.py              # AI analysis
    │   ├── statistical_analysis.py     # Statistics
    │   └── table_generators.py         # Excel tables
    │
    └── queries/
        ├── time_scores.py           # Time analysis
        ├── accuracy_scores.py       # Accuracy analysis
        ├── satisfaction_scores.py   # Satisfaction analysis
        └── graph_questions.py       # Graph questions
```
---
## Output Files
Files are generated in the path specified in `config.json` (default: `./results`)
| File | Description |
|------|-------------|
| `experiment_analysis.xlsx` | Comprehensive analysis tables with statistics |
| `experiment_analysis_THESIS_QUALITY_Report.docx` | 3,000-5,500 word thesis-quality report |
---
## Excel File Contents
The Excel file includes:
### **Main Data Sheet**
- Participant/experimental unit data
- Color-coded sections: User characteristics, Performance metrics, Satisfaction scores
- AVERAGE row with summary statistics
- Organized by experimental conditions/groups
### **Statistical Analysis Tables**
- **T-Test tables**: Comparative analysis across conditions
- **Average metrics**: Performance comparisons by groups/categories
- **Overall summaries**: Statistical comparisons
- **P-values and significance levels**
### **AI Insights Sheet**
- Automated insights from AI analysis
- Pattern identification
- Data-driven recommendations
---
## Requirements
- **Python**: 3.8+
- **MySQL**: Database server
- **OpenAI API**: For thesis report generation (gpt-4o-mini)
- **Dependencies**: Listed in `requirements.txt`
### **Key Dependencies**
```
mysql-connector-python
openpyxl
pandas
python-docx
openai
mcp
```
---
## Documentation
- **Query Modules**: See `src/statflow/queries/README.md` for details on creating custom analysis modules
- **Customization**: Edit `src/statflow/analysis/thesis_quality_prompts.py` to customize report style and terminology
- **MCP Server**: Use the StatFlow MCP server tools for programmatic access
---
## Example Use Cases
StatFlow can be used for various experimental data analysis scenarios:
- **User Studies**: Compare performance across different interfaces, conditions, or user groups
- **A/B Testing**: Analyze results from experimental and control groups
- **Longitudinal Studies**: Track changes over time across multiple measurement points
- **Comparative Analysis**: Evaluate differences between multiple experimental conditions
### **Customization for Your Study**
You can fully customize StatFlow for your specific research:
- Update query modules in `src/statflow/queries/` to match your data structure
- Modify analysis prompts in `src/statflow/analysis/thesis_quality_prompts.py` to use your terminology
- Adjust statistical analysis parameters to match your research design
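A custom query module can be as small as one function that maps raw rows to a labeled score column. The sketch below is hypothetical: the module name, constant, signature, and data shape are illustrative assumptions, so match them to the existing modules in `src/statflow/queries/` when adapting StatFlow to your own study:

```python
# Hypothetical query module, e.g. src/statflow/queries/response_scores.py.
# The interface is illustrative, not StatFlow's actual module contract.

COLUMN_LABEL = "Response Time (s)"

def compute_scores(rows: list[dict]) -> dict[int, float]:
    """Map each participant id to a single score for this column."""
    scores: dict[int, float] = {}
    for row in rows:
        pid = row["participant_id"]
        # Example transformation: milliseconds stored in the DB -> seconds
        scores[pid] = row["response_ms"] / 1000.0
    return scores

rows = [
    {"participant_id": 1, "response_ms": 1500},
    {"participant_id": 2, "response_ms": 2250},
]
print(compute_scores(rows))  # {1: 1.5, 2: 2.25}
```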
---
## Key Benefits
1. **Automation**: Complete workflow from database to publication-ready reports
2. **Flexibility**: Customizable analysis modules and report structure
3. **Scalability**: Support for unlimited databases without code changes
4. **Efficiency**: Automated generation in minutes instead of hours
5. **Quality**: Thesis-level academic writing with AI-powered insights
6. **Reproducibility**: Consistent analysis pipeline for all your studies
---
## Support
For questions or issues:
- Review `src/statflow/queries/README.md` for query module documentation
- Check `src/statflow/analysis/thesis_quality_prompts.py` for report customization
- Examine `config.json` for configuration options
---
## License
See [LICENSE](LICENSE) file for details.
---
## Learning Resources
### **MCP Documentation**
- **MCP Specification**: [Model Context Protocol](https://modelcontextprotocol.io/)
- **MCP Python SDK**: [mcp Python Package](https://github.com/modelcontextprotocol/python-sdk)
- **MCP Examples**: Check the MCP repository for more examples
### **Key Learnings from This Project**
This project demonstrates:
- ✅ How to structure an MCP server
- ✅ Implementing MCP tools with complex workflows
- ✅ Using MCP resources for read-only data access
- ✅ Error handling and validation in MCP servers
- ✅ Dynamic tool/resource discovery
- ✅ Integrating MCP servers with existing Python codebases
### **Example Use Case: CSU SSD Study**
The codebase includes an example implementation customized for a research study (the "Improving the CSU Student Success Dashboard and Its Analysis" study). This demonstrates how StatFlow can be adapted for domain-specific needs while maintaining a flexible MCP architecture.
**Note**: This is a personal learning project, not affiliated with any institution.
---
## Project Status
**Project Type**: Personal Learning Project
**Purpose**: Learning and exploring Model Context Protocol (MCP)
**Status**: ✅ Active and fully functional
**Last Updated**: November 12, 2025
**Version**: 2.0 (Renamed to StatFlow)
**Created by**: Rucha D. Nandgirikar
**Note**: This is a personal project for learning MCP, not affiliated with any institution or organization.
---
## Author & Resources
**Author**: Rucha D. Nandgirikar
### Related Articles
- [Medium Article - Coming Soon](#) - Learn about building MCP servers with StatFlow
_More articles and resources coming soon..._