TrainerML

README.md•14.1 KiB

--- title: AutoML - MCP Hackathon emoji: 🤖 colorFrom: blue colorTo: purple sdk: gradio sdk_version: 4.0.0 app_file: updated_ML.py pinned: false license: mit tags: - machine-learning - mcp - hackathon - automl - lazypredict - gradio - mcp-server-track - agent-demo-track short_description: Automated ML model comparison with LazyPredict and MCP integration --- # 🤖 AutoML - MCP Hackathon Submission **Automated Machine Learning Platform with LazyPredict and Model Context Protocol Integration** ## 🏆 Hackathon Track **Agents & MCP Hackathon - Track 1: MCP Tool / Server** ## 🌟 Key Features ### Core ML Capabilities - **📤 Dual Data Input**: Support for both local CSV file uploads and public URL data sources - **🎯 Auto Problem Detection**: Automatically determines regression vs classification tasks - **🤖 Multi-Algorithm Comparison**: LazyPredict-powered comparison of 20+ ML algorithms - **📊 Automated EDA**: Comprehensive dataset profiling with ydata-profiling - **💾 Best Model Export**: Download top-performing model as pickle file - **📈 Performance Visualization**: Interactive charts showing model comparison results ### 🚀 Advanced Features - **🌐 URL Data Loading**: Direct data loading from public CSV URLs with robust error handling - **🔄 Agent-Friendly Interface**: Designed for both human users and AI agent interactions - **📊 Interactive Dashboards**: Real-time model performance metrics and visualizations - **🔍 Smart Error Handling**: Comprehensive validation and user feedback system - **💻 MCP Server Integration**: Full Model Context Protocol server implementation ## 🛠️ How It Works The AutoML provides a streamlined pipeline for automated machine learning: ### Core Functions 1. **`load_data(file_input)`** - Universal data loader that handles: - Local CSV file uploads through Gradio's file component - Public CSV URLs with HTTP/HTTPS support - Robust error handling and validation - Automatic format detection and parsing 2. **`analyze_and_model(df, target_column)`** - Core ML pipeline that: - Generates comprehensive EDA reports using ydata-profiling - Automatically detects task type (classification vs regression) based on target variable uniqueness - Trains and evaluates multiple models using LazyPredict - Selects the best performing model based on appropriate metrics - Creates publication-ready visualizations comparing model performance - Exports the best model as a serialized pickle file 3. **`run_pipeline(data_source, target_column)`** - Main orchestration function: - Validates all inputs and provides clear error messages - Coordinates the entire ML workflow from data loading to model export - Generates AI-powered explanations of results - Returns all outputs in a format optimized for both UI and API consumption ### Agent-Friendly Design - **Single Entry Point**: The `run_pipeline()` function serves as the primary interface for AI agents - **Flexible Input Handling**: Automatically determines whether input is a file path or URL - **Comprehensive Output**: Returns all generated artifacts (models, reports, visualizations) - **Error Resilience**: Robust error handling with informative feedback ## 🚀 Quick Start ### 📋 Application File Comparison | Feature | `updated_ML.py` | `fixed_ML_MCP_backup.py` | |---------|----------------|---------------------------| | **Core ML Pipeline** | ✅ Full AutoML functionality | ✅ Full AutoML functionality | | **MCP Server** | ✅ Enabled | ✅ Enhanced configuration | | **UI Interface** | ✅ Clean, streamlined | ✅ Identical interface | | **Code Structure** | ✅ Primary, well-documented | ✅ Backup with additional features | | **Recommended For** | General use, development | Advanced MCP integration | ### Running the Application The project includes two main application files: #### Primary Application: `updated_ML.py` (Recommended) ```bash # Install dependencies pip install -r requirements.txt # Run the main application python updated_ML.py ``` #### Backup Version: `fixed_ML_MCP_backup.py` ```bash # Alternative version with additional MCP features python fixed_ML_MCP_backup.py ``` ### Web Interface 1. **Choose Data Source**: - **Local Upload**: Use the file upload component to select a CSV file from your computer - **URL Input**: Enter a public CSV URL (e.g., from GitHub, data repositories, or cloud storage) 2. **Specify Target**: Enter the exact name of your target column (case-sensitive) 3. **Run Analysis**: Click "Run Analysis & AutoML" to start the AutoML pipeline 4. **Review Results**: - View detected task type (classification/regression) - Examine model performance metrics in the interactive table - Download comprehensive EDA report (HTML format) - Download the best performing model (pickle format) - View model comparison visualization ### Installation & Setup ```bash # Clone the repository git clone [repository-url] cd MCP_Project # Install dependencies pip install -r requirements.txt ``` ### Server Configuration The application launches with the following settings: - **Host**: `0.0.0.0` (accessible from any network interface) - **Port**: `7860` (default Gradio port) - **MCP Server**: Enabled for AI agent integration - **API Documentation**: Available at `/docs` endpoint - **Browser Launch**: Automatic browser opening enabled ## 🎯 Current Implementation ### 1. LazyPredict Integration - **Automated Model Training**: Trains 20+ algorithms automatically - **Performance Comparison**: Side-by-side evaluation of all models - **Best Model Selection**: Automatically selects top performer based on accuracy/R² score ### 2. Comprehensive EDA - **ydata-profiling**: Generates detailed dataset analysis reports - **Automatic Insights**: Data quality, distributions, correlations, and missing values - **Interactive Reports**: Downloadable HTML reports with comprehensive statistics ### 3. Smart Task Detection - **Classification**: Automatically detected when target has ≤10 unique values - **Regression**: Automatically detected for continuous target variables - **Adaptive Metrics**: Uses appropriate evaluation metrics for each task type ### 4. Model Persistence - **Pickle Export**: Save trained models for future use - **Model Reuse**: Load and apply models to new datasets - **Production Ready**: Serialized models ready for deployment ## 📊 Supported Algorithms (via LazyPredict) ### Classification Algorithms - Logistic Regression, Decision Tree Classifier - Random Forest Classifier, Extra Trees Classifier - Gradient Boosting Classifier, AdaBoost Classifier - XGBoost Classifier, LightGBM Classifier - SVM Classifier, K-Nearest Neighbors - Naive Bayes, Linear Discriminant Analysis - Quadratic Discriminant Analysis, and more... ### Regression Algorithms - Linear Regression, Ridge Regression, Lasso Regression - Decision Tree Regressor, Random Forest Regressor - Extra Trees Regressor, Gradient Boosting Regressor - XGBoost Regressor, LightGBM Regressor - Support Vector Regression, K-Nearest Neighbors - AdaBoost Regressor, Elastic Net, and more... ## 🏆 Demo Scenarios ### House Price Prediction (Regression) - Upload `sample_house_prices.csv` included in the project - Enter `price` as the target column name - System automatically detects regression task - Compare performance of 15+ regression algorithms - Download the best performing model and detailed EDA report ### Loan Approval Prediction (Classification) - Upload `sample_loan_approval.csv` included in the project - Enter the loan approval status column name as target - System automatically detects classification task - Compare accuracy of 15+ classification algorithms - Get comprehensive EDA report with approval insights ### College Placement Analysis - Upload `collegePlace.csv` included in the project - Analyze student placement outcomes - Automatic feature analysis and model comparison - Export trained model for future predictions ### URL-Based Data Analysis - Use public dataset URLs for instant analysis - Example: Government open data, research datasets, cloud-hosted files - No file size limitations with URL-based loading - Seamless integration with cloud storage platforms ## 🚀 Technologies Used - **Frontend**: Gradio 4.0+ with soft theme and MCP server integration - **AutoML Engine**: LazyPredict for automated model comparison and evaluation - **EDA Framework**: ydata-profiling for comprehensive dataset analysis and reporting - **ML Libraries**: scikit-learn, XGBoost, LightGBM (via LazyPredict ecosystem) - **Visualization**: Matplotlib and Seaborn for model comparison charts and statistical plots - **Data Processing**: pandas and numpy for efficient data manipulation and preprocessing - **Model Persistence**: pickle for secure model serialization and export - **Web Requests**: requests library for robust URL-based data loading - **MCP Integration**: Model Context Protocol server for AI agent compatibility - **File Handling**: tempfile for secure temporary file management ## 📈 Current Features - **🔄 Dual Input Support**: Upload local CSV files or provide public URLs for data loading - **🤖 One-Click AutoML**: Complete ML pipeline from data upload to trained model export - **🎯 Intelligent Task Detection**: Automatic classification vs regression detection based on target variable analysis - **📊 Multi-Algorithm Comparison**: Simultaneous comparison of 20+ algorithms with LazyPredict - **📋 Comprehensive EDA**: Detailed dataset profiling with statistical analysis and data quality reports - **💾 Model Export**: Download best performing model as pickle file for production deployment - **📈 Performance Visualization**: Clear charts showing algorithm comparison and performance metrics - **🌐 MCP Server Integration**: Full Model Context Protocol support for seamless AI assistant integration - **🛡️ Robust Error Handling**: Comprehensive validation with informative user feedback - **🎨 Modern UI**: Clean, responsive interface optimized for both human and agent interactions ## 🎯 Hackathon Submission Highlights 1. **🤖 LazyPredict Integration**: Automated comparison of 20+ ML algorithms with minimal configuration 2. **🧠 Smart Automation**: Intelligent task detection, data validation, and model selection 3. **📊 Comprehensive Analysis**: ydata-profiling powered EDA reports with statistical insights 4. **👥 Dual Interface Design**: Optimized for both human users and AI agent interactions 5. **🌐 MCP Server Implementation**: Full Model Context Protocol integration for seamless agent workflows 6. **🔄 Flexible Data Loading**: Support for both local uploads and URL-based data sources 7. **📈 Production Ready**: Exportable models, comprehensive documentation, and robust error handling 8. **🎨 Modern UI/UX**: Clean Gradio interface with intuitive workflow and clear feedback systems ## 📦 Project Structure ``` MCP_Project/ ├── updated_ML.py # Primary application file (recommended) ├── fixed_ML_MCP_backup.py # Backup version with enhanced MCP features ├── requirements.txt # Python dependencies ├── pyproject.toml # Project configuration ├── uv.lock # UV dependency lockfile ├── README.md # This documentation ├── sample_house_prices.csv # Demo dataset for regression ├── sample_loan_approval.csv # Demo dataset for classification ├── collegePlace.csv # Demo dataset for placement analysis ├── model_plot.png # Sample visualization output └── __pycache__/ # Python cache files ``` ### Application Files Overview - **`updated_ML.py`**: The main application file with clean, streamlined code structure. Recommended for most users. - **`fixed_ML_MCP_backup.py`**: Alternative version with additional MCP server configurations and enhanced features. Both files provide identical core functionality with slight variations in configuration and additional features. ## 📧 Contact & Support Built with ❤️ for the **Agents & MCP Hackathon 2025** This project demonstrates the power of combining LazyPredict's automated machine learning capabilities with the Model Context Protocol to create an intelligent, easy-to-use ML platform that seamlessly integrates into AI assistant workflows and provides production-ready machine learning solutions. ### 🔮 Features in Development - 🧠 LLM-powered model explanations and insights - ⚙️ Advanced feature engineering and preprocessing pipelines - 🎯 Ensemble model creation and stacking capabilities - 🚀 Real-time prediction API endpoints - 🛠️ Enhanced MCP tool suite with additional ML operations - 📊 Interactive model interpretation and SHAP value analysis ### 🎮 Usage Tips & Best Practices #### Getting Started - **Choose Your File**: Use `updated_ML.py` for standard usage, `fixed_ML_MCP_backup.py` for advanced MCP features - **Target Column**: Ensure your target column name is exactly as it appears in the dataset (case-sensitive) - **Data Sources**: Both local CSV uploads and public URLs are supported seamlessly #### Data Loading Best Practices - **URL Loading**: Use direct links to CSV files (GitHub raw URLs work great!) - **File Size**: No strict limitations, but larger files may take longer to process - **Data Quality**: The system handles missing values automatically, but clean data yields better results #### Model Performance - **Classification**: System uses Accuracy as the primary metric for model selection - **Regression**: System uses R-Squared as the primary metric for model selection - **File Formats**: Currently supports CSV format with automatic delimiter detection - **Column Types**: Handles both numeric and categorical features automatically #### Troubleshooting - **Target Not Found**: Double-check column name spelling and case sensitivity - **URL Issues**: Ensure URLs point directly to CSV files (not web pages) - **Performance**: For large datasets, expect processing times of 2-5 minutes --- **Ready to experience automated machine learning? Upload your dataset or provide a URL and let LazyPredict find the best algorithm for your problem!** 🚀 *Transform your data into insights with just a few clicks - no ML expertise required!*

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/daniel-was-taken/MCP_Project'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

README.md•14.1 KiB