TrainerML


---
title: AutoML - MCP Hackathon
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: updated_ML.py
pinned: false
license: mit
tags:
  - machine-learning
  - mcp
  - hackathon
  - automl
  - lazypredict
  - gradio
  - mcp-server-track
  - agent-demo-track
short_description: Automated ML model comparison with LazyPredict and MCP integration
---

# 🤖 AutoML - MCP Hackathon Submission

**Automated Machine Learning Platform with LazyPredict and Model Context Protocol Integration**

## 🏆 Hackathon Track

**Agents & MCP Hackathon - Track 1: MCP Tool / Server**

## 🌟 Key Features

### Core ML Capabilities

- **📤 Dual Data Input**: Support for both local CSV file uploads and public URL data sources
- **🎯 Auto Problem Detection**: Automatically determines regression vs classification tasks
- **🤖 Multi-Algorithm Comparison**: LazyPredict-powered comparison of 20+ ML algorithms
- **📊 Automated EDA**: Comprehensive dataset profiling with ydata-profiling
- **💾 Best Model Export**: Download top-performing model as pickle file
- **📈 Performance Visualization**: Interactive charts showing model comparison results

### 🚀 Advanced Features

- **🌐 URL Data Loading**: Direct data loading from public CSV URLs with robust error handling
- **🔄 Agent-Friendly Interface**: Designed for both human users and AI agent interactions
- **📊 Interactive Dashboards**: Real-time model performance metrics and visualizations
- **🔍 Smart Error Handling**: Comprehensive validation and user feedback system
- **💻 MCP Server Integration**: Full Model Context Protocol server implementation

## 🛠️ How It Works

The AutoML provides a streamlined pipeline for automated machine learning:

### Core Functions

1. **`load_data(file_input)`** - Universal data loader that handles:
   - Local CSV file uploads through Gradio's file component
   - Public CSV URLs with HTTP/HTTPS support
   - Robust error handling and validation
   - Automatic format detection and parsing

2. **`analyze_and_model(df, target_column)`** - Core ML pipeline that:
   - Generates comprehensive EDA reports using ydata-profiling
   - Automatically detects task type (classification vs regression) based on target variable uniqueness
   - Trains and evaluates multiple models using LazyPredict
   - Selects the best performing model based on appropriate metrics
   - Creates publication-ready visualizations comparing model performance
   - Exports the best model as a serialized pickle file

3. **`run_pipeline(data_source, target_column)`** - Main orchestration function:
   - Validates all inputs and provides clear error messages
   - Coordinates the entire ML workflow from data loading to model export
   - Generates AI-powered explanations of results
   - Returns all outputs in a format optimized for both UI and API consumption

### Agent-Friendly Design

- **Single Entry Point**: The `run_pipeline()` function serves as the primary interface for AI agents
- **Flexible Input Handling**: Automatically determines whether input is a file path or URL
- **Comprehensive Output**: Returns all generated artifacts (models, reports, visualizations)
- **Error Resilience**: Robust error handling with informative feedback

## 🚀 Quick Start

### 📋 Application File Comparison

| Feature | `updated_ML.py` | `fixed_ML_MCP_backup.py` |
|---------|----------------|---------------------------|
| **Core ML Pipeline** | ✅ Full AutoML functionality | ✅ Full AutoML functionality |
| **MCP Server** | ✅ Enabled | ✅ Enhanced configuration |
| **UI Interface** | ✅ Clean, streamlined | ✅ Identical interface |
| **Code Structure** | ✅ Primary, well-documented | ✅ Backup with additional features |
| **Recommended For** | General use, development | Advanced MCP integration |

### Running the Application

The project includes two main application files:

#### Primary Application: `updated_ML.py` (Recommended)

```bash
# Install dependencies
pip install -r requirements.txt

# Run the main application
python updated_ML.py
```

#### Backup Version: `fixed_ML_MCP_backup.py`

```bash
# Alternative version with additional MCP features
python fixed_ML_MCP_backup.py
```

### Web Interface

1. **Choose Data Source**:
   - **Local Upload**: Use the file upload component to select a CSV file from your computer
   - **URL Input**: Enter a public CSV URL (e.g., from GitHub, data repositories, or cloud storage)

2. **Specify Target**: Enter the exact name of your target column (case-sensitive)

3. **Run Analysis**: Click "Run Analysis & AutoML" to start the AutoML pipeline

4. **Review Results**:
   - View detected task type (classification/regression)
   - Examine model performance metrics in the interactive table
   - Download comprehensive EDA report (HTML format)
   - Download the best performing model (pickle format)
   - View model comparison visualization

### Installation & Setup

```bash
# Clone the repository
git clone [repository-url]
cd MCP_Project

# Install dependencies
pip install -r requirements.txt
```

### Server Configuration

The application launches with the following settings:

- **Host**: `0.0.0.0` (accessible from any network interface)
- **Port**: `7860` (default Gradio port)
- **MCP Server**: Enabled for AI agent integration
- **API Documentation**: Available at `/docs` endpoint
- **Browser Launch**: Automatic browser opening enabled

## 🎯 Current Implementation

### 1. LazyPredict Integration

- **Automated Model Training**: Trains 20+ algorithms automatically
- **Performance Comparison**: Side-by-side evaluation of all models
- **Best Model Selection**: Automatically selects top performer based on accuracy/R² score

### 2. Comprehensive EDA

- **ydata-profiling**: Generates detailed dataset analysis reports
- **Automatic Insights**: Data quality, distributions, correlations, and missing values
- **Interactive Reports**: Downloadable HTML reports with comprehensive statistics

### 3. Smart Task Detection

- **Classification**: Automatically detected when target has ≤10 unique values
- **Regression**: Automatically detected for continuous target variables
- **Adaptive Metrics**: Uses appropriate evaluation metrics for each task type

### 4. Model Persistence

- **Pickle Export**: Save trained models for future use
- **Model Reuse**: Load and apply models to new datasets
- **Production Ready**: Serialized models ready for deployment

## 📊 Supported Algorithms (via LazyPredict)

### Classification Algorithms

- Logistic Regression, Decision Tree Classifier
- Random Forest Classifier, Extra Trees Classifier
- Gradient Boosting Classifier, AdaBoost Classifier
- XGBoost Classifier, LightGBM Classifier
- SVM Classifier, K-Nearest Neighbors
- Naive Bayes, Linear Discriminant Analysis
- Quadratic Discriminant Analysis, and more...

### Regression Algorithms

- Linear Regression, Ridge Regression, Lasso Regression
- Decision Tree Regressor, Random Forest Regressor
- Extra Trees Regressor, Gradient Boosting Regressor
- XGBoost Regressor, LightGBM Regressor
- Support Vector Regression, K-Nearest Neighbors
- AdaBoost Regressor, Elastic Net, and more...
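
As a rough illustration of the rules described under Smart Task Detection and Best Model Selection, the sketch below labels a target as classification when it has at most 10 distinct values and then picks the model with the highest score. It uses only the standard library; `MAX_CLASSES`, `detect_task`, and `pick_best_model` are hypothetical names made up for this example, and the actual logic in `updated_ML.py` is not shown in this README and may differ.

```python
from typing import Hashable, Iterable, Mapping

# Threshold quoted in "Smart Task Detection"; the constant name is an assumption.
MAX_CLASSES = 10


def detect_task(target_values: Iterable[Hashable]) -> str:
    """Guess the task type from the number of distinct target values."""
    n_unique = len(set(target_values))
    return "classification" if n_unique <= MAX_CLASSES else "regression"


def pick_best_model(scores: Mapping[str, float]) -> str:
    """Return the model name with the highest score (Accuracy or R-Squared)."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    labels = ["yes", "no", "yes", "no"]                       # 2 distinct values
    prices = [float(100_000 + 1_500 * i) for i in range(50)]  # 50 distinct values
    print(detect_task(labels))
    print(detect_task(prices))
    # Illustrative scores only, not real results from this project.
    print(pick_best_model({"RandomForestClassifier": 0.91, "LogisticRegression": 0.88}))
```

One consequence of a pure uniqueness threshold is that a numeric target with few distinct values (for example, a 1 to 5 rating) would be treated as classification, so it is worth checking the detected task type in the UI before trusting the results.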
## 🏆 Demo Scenarios

### House Price Prediction (Regression)

- Upload `sample_house_prices.csv` included in the project
- Enter `price` as the target column name
- System automatically detects regression task
- Compare performance of 15+ regression algorithms
- Download the best performing model and detailed EDA report

### Loan Approval Prediction (Classification)

- Upload `sample_loan_approval.csv` included in the project
- Enter the loan approval status column name as target
- System automatically detects classification task
- Compare accuracy of 15+ classification algorithms
- Get comprehensive EDA report with approval insights

### College Placement Analysis

- Upload `collegePlace.csv` included in the project
- Analyze student placement outcomes
- Automatic feature analysis and model comparison
- Export trained model for future predictions

### URL-Based Data Analysis

- Use public dataset URLs for instant analysis
- Example: Government open data, research datasets, cloud-hosted files
- No strict file size limit with URL-based loading, though larger files take longer to process
- Seamless integration with cloud storage platforms

## 🚀 Technologies Used

- **Frontend**: Gradio 4.0+ with soft theme and MCP server integration
- **AutoML Engine**: LazyPredict for automated model comparison and evaluation
- **EDA Framework**: ydata-profiling for comprehensive dataset analysis and reporting
- **ML Libraries**: scikit-learn, XGBoost, LightGBM (via LazyPredict ecosystem)
- **Visualization**: Matplotlib and Seaborn for model comparison charts and statistical plots
- **Data Processing**: pandas and numpy for efficient data manipulation and preprocessing
- **Model Persistence**: pickle for model serialization and export (only unpickle files from sources you trust)
- **Web Requests**: requests library for robust URL-based data loading
- **MCP Integration**: Model Context Protocol server for AI agent compatibility
- **File Handling**: tempfile for secure temporary file management

## 📈 Current Features

- **🔄 Dual Input Support**: Upload local CSV files or provide public URLs for data loading
- **🤖 One-Click AutoML**: Complete ML pipeline from data upload to trained model export
- **🎯 Intelligent Task Detection**: Automatic classification vs regression detection based on target variable analysis
- **📊 Multi-Algorithm Comparison**: Simultaneous comparison of 20+ algorithms with LazyPredict
- **📋 Comprehensive EDA**: Detailed dataset profiling with statistical analysis and data quality reports
- **💾 Model Export**: Download best performing model as pickle file for production deployment
- **📈 Performance Visualization**: Clear charts showing algorithm comparison and performance metrics
- **🌐 MCP Server Integration**: Full Model Context Protocol support for seamless AI assistant integration
- **🛡️ Robust Error Handling**: Comprehensive validation with informative user feedback
- **🎨 Modern UI**: Clean, responsive interface optimized for both human and agent interactions

## 🎯 Hackathon Submission Highlights

1. **🤖 LazyPredict Integration**: Automated comparison of 20+ ML algorithms with minimal configuration
2. **🧠 Smart Automation**: Intelligent task detection, data validation, and model selection
3. **📊 Comprehensive Analysis**: ydata-profiling powered EDA reports with statistical insights
4. **👥 Dual Interface Design**: Optimized for both human users and AI agent interactions
5. **🌐 MCP Server Implementation**: Full Model Context Protocol integration for seamless agent workflows
6. **🔄 Flexible Data Loading**: Support for both local uploads and URL-based data sources
7. **📈 Production Ready**: Exportable models, comprehensive documentation, and robust error handling
8. **🎨 Modern UI/UX**: Clean Gradio interface with intuitive workflow and clear feedback systems

## 📦 Project Structure

```
MCP_Project/
├── updated_ML.py              # Primary application file (recommended)
├── fixed_ML_MCP_backup.py     # Backup version with enhanced MCP features
├── requirements.txt           # Python dependencies
├── pyproject.toml             # Project configuration
├── uv.lock                    # UV dependency lockfile
├── README.md                  # This documentation
├── sample_house_prices.csv    # Demo dataset for regression
├── sample_loan_approval.csv   # Demo dataset for classification
├── collegePlace.csv           # Demo dataset for placement analysis
├── model_plot.png             # Sample visualization output
└── __pycache__/               # Python cache files
```

### Application Files Overview

- **`updated_ML.py`**: The main application file with clean, streamlined code structure. Recommended for most users.
- **`fixed_ML_MCP_backup.py`**: Alternative version with additional MCP server configurations and enhanced features.

Both files provide the same core functionality and differ only in configuration and additional MCP features.

## 📧 Contact & Support

Built with ❤️ for the **Agents & MCP Hackathon 2025**

This project combines LazyPredict's automated model comparison with the Model Context Protocol to provide an easy-to-use ML platform that plugs into AI assistant workflows and exports trained models for reuse.
### 🔮 Features in Development

- 🧠 LLM-powered model explanations and insights
- ⚙️ Advanced feature engineering and preprocessing pipelines
- 🎯 Ensemble model creation and stacking capabilities
- 🚀 Real-time prediction API endpoints
- 🛠️ Enhanced MCP tool suite with additional ML operations
- 📊 Interactive model interpretation and SHAP value analysis

### 🎮 Usage Tips & Best Practices

#### Getting Started

- **Choose Your File**: Use `updated_ML.py` for standard usage, `fixed_ML_MCP_backup.py` for advanced MCP features
- **Target Column**: Ensure your target column name is exactly as it appears in the dataset (case-sensitive)
- **Data Sources**: Both local CSV uploads and public URLs are supported seamlessly

#### Data Loading Best Practices

- **URL Loading**: Use direct links to CSV files (GitHub raw URLs work great!)
- **File Size**: No strict limitations, but larger files may take longer to process
- **Data Quality**: The system handles missing values automatically, but clean data yields better results

#### Model Performance

- **Classification**: System uses Accuracy as the primary metric for model selection
- **Regression**: System uses R-Squared as the primary metric for model selection
- **File Formats**: Currently supports CSV format with automatic delimiter detection
- **Column Types**: Handles both numeric and categorical features automatically

#### Troubleshooting

- **Target Not Found**: Double-check column name spelling and case sensitivity
- **URL Issues**: Ensure URLs point directly to CSV files (not web pages)
- **Performance**: For large datasets, expect processing times of 2-5 minutes

---

**Ready to experience automated machine learning? Upload your dataset or provide a URL and let LazyPredict find the best algorithm for your problem!** 🚀

*Transform your data into insights with just a few clicks - no ML expertise required!*
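
The two most common problems listed under Troubleshooting (a data source that is not a direct URL, and a target column whose case does not match) can be checked before running the pipeline. The following is a minimal sketch using only the standard library; `is_url` and `check_target` are hypothetical helpers invented for this example, the URL is a placeholder, and the validation actually performed by `run_pipeline()` may work differently.

```python
import csv
import io
from typing import Sequence
from urllib.parse import urlparse


def is_url(data_source: str) -> bool:
    """Treat the input as a URL only if it has an http(s) scheme and a host."""
    parsed = urlparse(data_source)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def check_target(header: Sequence[str], target_column: str) -> None:
    """Raise ValueError if the target column is missing (the match is case-sensitive)."""
    if target_column in header:
        return
    # Offer a hint when only the letter case differs.
    near = [col for col in header if col.lower() == target_column.lower()]
    hint = f" Did you mean {near[0]!r}?" if near else ""
    raise ValueError(f"Target column {target_column!r} not found.{hint}")


if __name__ == "__main__":
    print(is_url("https://example.com/data.csv"))  # placeholder URL
    print(is_url("sample_house_prices.csv"))       # local file name

    # Read only the header row of a small in-memory CSV.
    header = next(csv.reader(io.StringIO("area,bedrooms,Price\n120,3,250000\n")))
    try:
        check_target(header, "price")
    except ValueError as err:
        print(err)
```

Reading just the header row keeps the check cheap even for large files, which matters when the full pipeline can take several minutes.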

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/daniel-was-taken/MCP_Project'
