---
title: AutoML - MCP Hackathon
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: updated_ML.py
pinned: false
license: mit
tags:
  - machine-learning
  - mcp
  - hackathon
  - automl
  - lazypredict
  - gradio
  - mcp-server-track
  - agent-demo-track
short_description: Automated ML model comparison with LazyPredict and MCP integration
---

# 🤖 AutoML - MCP Hackathon Submission

**Automated Machine Learning Platform with LazyPredict and Model Context Protocol Integration**

## 🏆 Hackathon Track

**Agents & MCP Hackathon - Track 1: MCP Tool / Server**

## 🌟 Key Features

### Core ML Capabilities

- **📤 Dual Data Input**: Support for both local CSV file uploads and public URL data sources
- **🎯 Auto Problem Detection**: Automatically determines regression vs classification tasks
- **🤖 Multi-Algorithm Comparison**: LazyPredict-powered comparison of 20+ ML algorithms
- **📊 Automated EDA**: Comprehensive dataset profiling with ydata-profiling
- **💾 Best Model Export**: Download top-performing model as pickle file
- **📈 Performance Visualization**: Interactive charts showing model comparison results

### 🚀 Advanced Features

- **🌐 URL Data Loading**: Direct data loading from public CSV URLs with robust error handling
- **🔄 Agent-Friendly Interface**: Designed for both human users and AI agent interactions
- **📊 Interactive Dashboards**: Real-time model performance metrics and visualizations
- **🔍 Smart Error Handling**: Comprehensive validation and user feedback system
- **💻 MCP Server Integration**: Full Model Context Protocol server implementation

## 🛠️ How It Works

AutoML provides a streamlined pipeline for automated machine learning:

### Core Functions

1. **`load_data(file_input)`** - Universal data loader that handles:
   - Local CSV file uploads through Gradio's file component
   - Public CSV URLs with HTTP/HTTPS support
   - Robust error handling and validation
   - Automatic format detection and parsing

2. **`analyze_and_model(df, target_column)`** - Core ML pipeline that:
   - Generates comprehensive EDA reports using ydata-profiling
   - Automatically detects task type (classification vs regression) based on target variable uniqueness
   - Trains and evaluates multiple models using LazyPredict
   - Selects the best performing model based on appropriate metrics
   - Creates publication-ready visualizations comparing model performance
   - Exports the best model as a serialized pickle file

3. **`run_pipeline(data_source, target_column)`** - Main orchestration function:
   - Validates all inputs and provides clear error messages
   - Coordinates the entire ML workflow from data loading to model export
   - Generates AI-powered explanations of results
   - Returns all outputs in a format optimized for both UI and API consumption

### Agent-Friendly Design

- **Single Entry Point**: The `run_pipeline()` function serves as the primary interface for AI agents
- **Flexible Input Handling**: Automatically determines whether input is a file path or URL
- **Comprehensive Output**: Returns all generated artifacts (models, reports, visualizations)
- **Error Resilience**: Robust error handling with informative feedback

## 🚀 Quick Start

### 📋 Application File Comparison

| Feature | `updated_ML.py` | `fixed_ML_MCP_backup.py` |
|---------|----------------|---------------------------|
| **Core ML Pipeline** | ✅ Full AutoML functionality | ✅ Full AutoML functionality |
| **MCP Server** | ✅ Enabled | ✅ Enhanced configuration |
| **UI Interface** | ✅ Clean, streamlined | ✅ Identical interface |
| **Code Structure** | ✅ Primary, well-documented | ✅ Backup with additional features |
| **Recommended For** | General use, development | Advanced MCP integration |

### Running the Application

The project includes two main application files:

#### Primary Application: `updated_ML.py` (Recommended)

```bash
# Install dependencies
pip install -r requirements.txt

# Run the main application
python updated_ML.py
```

#### Backup Version: `fixed_ML_MCP_backup.py`

```bash
# Alternative version with additional MCP features
python fixed_ML_MCP_backup.py
```

### Web Interface

1. **Choose Data Source**:
   - **Local Upload**: Use the file upload component to select a CSV file from your computer
   - **URL Input**: Enter a public CSV URL (e.g., from GitHub, data repositories, or cloud storage)
2. **Specify Target**: Enter the exact name of your target column (case-sensitive)
3. **Run Analysis**: Click "Run Analysis & AutoML" to start the AutoML pipeline
4. **Review Results**:
   - View detected task type (classification/regression)
   - Examine model performance metrics in the interactive table
   - Download comprehensive EDA report (HTML format)
   - Download the best performing model (pickle format)
   - View model comparison visualization

### Installation & Setup

```bash
# Clone the repository
git clone [repository-url]
cd MCP_Project

# Install dependencies
pip install -r requirements.txt
```

### Server Configuration

The application launches with the following settings:

- **Host**: `0.0.0.0` (accessible from any network interface)
- **Port**: `7860` (default Gradio port)
- **MCP Server**: Enabled for AI agent integration
- **API Documentation**: Available at `/docs` endpoint
- **Browser Launch**: Automatic browser opening enabled

## 🎯 Current Implementation

### 1. LazyPredict Integration

- **Automated Model Training**: Trains 20+ algorithms automatically
- **Performance Comparison**: Side-by-side evaluation of all models
- **Best Model Selection**: Automatically selects top performer based on accuracy/R² score

### 2. Comprehensive EDA

- **ydata-profiling**: Generates detailed dataset analysis reports
- **Automatic Insights**: Data quality, distributions, correlations, and missing values
- **Interactive Reports**: Downloadable HTML reports with comprehensive statistics

### 3. Smart Task Detection

- **Classification**: Automatically detected when target has ≤10 unique values
- **Regression**: Automatically detected for continuous target variables
- **Adaptive Metrics**: Uses appropriate evaluation metrics for each task type

### 4. Model Persistence

- **Pickle Export**: Save trained models for future use
- **Model Reuse**: Load and apply models to new datasets
- **Production Ready**: Serialized models ready for deployment

## 📊 Supported Algorithms (via LazyPredict)

### Classification Algorithms

- Logistic Regression, Decision Tree Classifier
- Random Forest Classifier, Extra Trees Classifier
- Gradient Boosting Classifier, AdaBoost Classifier
- XGBoost Classifier, LightGBM Classifier
- SVM Classifier, K-Nearest Neighbors
- Naive Bayes, Linear Discriminant Analysis
- Quadratic Discriminant Analysis, and more...

### Regression Algorithms

- Linear Regression, Ridge Regression, Lasso Regression
- Decision Tree Regressor, Random Forest Regressor
- Extra Trees Regressor, Gradient Boosting Regressor
- XGBoost Regressor, LightGBM Regressor
- Support Vector Regression, K-Nearest Neighbors
- AdaBoost Regressor, Elastic Net, and more...

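
The task-detection rule and metric choice described under Current Implementation can be sketched as follows. This is a minimal illustration, not the code in `updated_ML.py`: the helper names `detect_task` and `primary_metric` are hypothetical, and the only facts taken from this README are the 10-unique-value threshold and the Accuracy / R-Squared metric names.

```python
from typing import Hashable, Iterable

# Threshold stated in this README: a target with at most 10 unique
# values is treated as a classification problem.
CLASSIFICATION_MAX_UNIQUE = 10


def detect_task(target_values: Iterable[Hashable]) -> str:
    """Return 'classification' or 'regression' from the target's unique-value count.

    Hypothetical helper: accepts any iterable of hashable values,
    for example a plain list or a pandas Series.
    """
    n_unique = len(set(target_values))
    if n_unique <= CLASSIFICATION_MAX_UNIQUE:
        return "classification"
    return "regression"


def primary_metric(task: str) -> str:
    """Return the metric name this README says is used to rank models."""
    return "Accuracy" if task == "classification" else "R-Squared"


if __name__ == "__main__":
    approvals = ["approved", "rejected", "approved", "approved"]
    prices = [100_000 + 1_500 * i for i in range(50)]
    for name, column in (("approvals", approvals), ("prices", prices)):
        task = detect_task(column)
        print(name, task, primary_metric(task))
```

A rule based only on the unique-value count is simple but coarse: a numeric target with few distinct values (for example, a 1 to 5 rating) would be treated as classification under this sketch.
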
## 🏆 Demo Scenarios

### House Price Prediction (Regression)

- Upload `sample_house_prices.csv` included in the project
- Enter `price` as the target column name
- System automatically detects regression task
- Compare performance of 15+ regression algorithms
- Download the best performing model and detailed EDA report

### Loan Approval Prediction (Classification)

- Upload `sample_loan_approval.csv` included in the project
- Enter the loan approval status column name as target
- System automatically detects classification task
- Compare accuracy of 15+ classification algorithms
- Get comprehensive EDA report with approval insights

### College Placement Analysis

- Upload `collegePlace.csv` included in the project
- Analyze student placement outcomes
- Automatic feature analysis and model comparison
- Export trained model for future predictions

### URL-Based Data Analysis

- Use public dataset URLs for instant analysis
- Example: Government open data, research datasets, cloud-hosted files
- No file size limitations with URL-based loading
- Seamless integration with cloud storage platforms

## 🚀 Technologies Used

- **Frontend**: Gradio 4.0+ with soft theme and MCP server integration
- **AutoML Engine**: LazyPredict for automated model comparison and evaluation
- **EDA Framework**: ydata-profiling for comprehensive dataset analysis and reporting
- **ML Libraries**: scikit-learn, XGBoost, LightGBM (via LazyPredict ecosystem)
- **Visualization**: Matplotlib and Seaborn for model comparison charts and statistical plots
- **Data Processing**: pandas and numpy for efficient data manipulation and preprocessing
- **Model Persistence**: pickle for model serialization and export
- **Web Requests**: requests library for robust URL-based data loading
- **MCP Integration**: Model Context Protocol server for AI agent compatibility
- **File Handling**: tempfile for secure temporary file management

## 📈 Current Features

- **🔄 Dual Input Support**: Upload local CSV files or provide public URLs for data loading
- **🤖 One-Click AutoML**: Complete ML pipeline from data upload to trained model export
- **🎯 Intelligent Task Detection**: Automatic classification vs regression detection based on target variable analysis
- **📊 Multi-Algorithm Comparison**: Simultaneous comparison of 20+ algorithms with LazyPredict
- **📋 Comprehensive EDA**: Detailed dataset profiling with statistical analysis and data quality reports
- **💾 Model Export**: Download best performing model as pickle file for production deployment
- **📈 Performance Visualization**: Clear charts showing algorithm comparison and performance metrics
- **🌐 MCP Server Integration**: Full Model Context Protocol support for seamless AI assistant integration
- **🛡️ Robust Error Handling**: Comprehensive validation with informative user feedback
- **🎨 Modern UI**: Clean, responsive interface optimized for both human and agent interactions

## 🎯 Hackathon Submission Highlights

1. **🤖 LazyPredict Integration**: Automated comparison of 20+ ML algorithms with minimal configuration
2. **🧠 Smart Automation**: Intelligent task detection, data validation, and model selection
3. **📊 Comprehensive Analysis**: ydata-profiling powered EDA reports with statistical insights
4. **👥 Dual Interface Design**: Optimized for both human users and AI agent interactions
5. **🌐 MCP Server Implementation**: Full Model Context Protocol integration for seamless agent workflows
6. **🔄 Flexible Data Loading**: Support for both local uploads and URL-based data sources
7. **📈 Production Ready**: Exportable models, comprehensive documentation, and robust error handling
8. **🎨 Modern UI/UX**: Clean Gradio interface with intuitive workflow and clear feedback systems

## 📦 Project Structure

```
MCP_Project/
├── updated_ML.py              # Primary application file (recommended)
├── fixed_ML_MCP_backup.py     # Backup version with enhanced MCP features
├── requirements.txt           # Python dependencies
├── pyproject.toml             # Project configuration
├── uv.lock                    # UV dependency lockfile
├── README.md                  # This documentation
├── sample_house_prices.csv    # Demo dataset for regression
├── sample_loan_approval.csv   # Demo dataset for classification
├── collegePlace.csv           # Demo dataset for placement analysis
├── model_plot.png             # Sample visualization output
└── __pycache__/               # Python cache files
```

### Application Files Overview

- **`updated_ML.py`**: The main application file with clean, streamlined code structure. Recommended for most users.
- **`fixed_ML_MCP_backup.py`**: Alternative version with additional MCP server configurations and enhanced features.

Both files provide identical core functionality with slight variations in configuration and additional features.

## 📧 Contact & Support

Built with ❤️ for the **Agents & MCP Hackathon 2025**

This project demonstrates the power of combining LazyPredict's automated machine learning capabilities with the Model Context Protocol to create an intelligent, easy-to-use ML platform that seamlessly integrates into AI assistant workflows and provides production-ready machine learning solutions.

### 🔮 Features in Development

- 🧠 LLM-powered model explanations and insights
- ⚙️ Advanced feature engineering and preprocessing pipelines
- 🎯 Ensemble model creation and stacking capabilities
- 🚀 Real-time prediction API endpoints
- 🛠️ Enhanced MCP tool suite with additional ML operations
- 📊 Interactive model interpretation and SHAP value analysis

### 🎮 Usage Tips & Best Practices

#### Getting Started

- **Choose Your File**: Use `updated_ML.py` for standard usage, `fixed_ML_MCP_backup.py` for advanced MCP features
- **Target Column**: Ensure your target column name is exactly as it appears in the dataset (case-sensitive)
- **Data Sources**: Both local CSV uploads and public URLs are supported seamlessly

#### Data Loading Best Practices

- **URL Loading**: Use direct links to CSV files (GitHub raw URLs work great!)
- **File Size**: No strict limitations, but larger files may take longer to process
- **Data Quality**: The system handles missing values automatically, but clean data yields better results

#### Model Performance

- **Classification**: System uses Accuracy as the primary metric for model selection
- **Regression**: System uses R-Squared as the primary metric for model selection
- **File Formats**: Currently supports CSV format with automatic delimiter detection
- **Column Types**: Handles both numeric and categorical features automatically

#### Troubleshooting

- **Target Not Found**: Double-check column name spelling and case sensitivity
- **URL Issues**: Ensure URLs point directly to CSV files (not web pages)
- **Performance**: For large datasets, expect processing times of 2-5 minutes

---

**Ready to experience automated machine learning? Upload your dataset or provide a URL and let LazyPredict find the best algorithm for your problem!** 🚀

*Transform your data into insights with just a few clicks - no ML expertise required!*
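
As a rough illustration of the URL guidance in the usage tips, the sketch below shows one way to tell a URL from a local file path and to check whether a URL probably points directly at a CSV file. The function names `classify_source` and `looks_like_direct_csv` are hypothetical and are not taken from `updated_ML.py`. The `.csv` suffix check is only a heuristic assumed here.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return 'url' for http(s) links and 'file' for anything else.

    Hypothetical helper: anything without an http/https scheme and a
    host (including Windows paths such as C:\\data\\x.csv) counts as a file.
    """
    parsed = urlparse(source.strip())
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file"


def looks_like_direct_csv(url: str) -> bool:
    """Heuristic: the URL path (ignoring the query string) ends in '.csv'."""
    return urlparse(url).path.lower().endswith(".csv")


if __name__ == "__main__":
    for candidate in (
        "https://example.com/data/train.csv",
        "https://example.com/datasets/overview",
        "data/train.csv",
    ):
        kind = classify_source(candidate)
        direct = looks_like_direct_csv(candidate) if kind == "url" else None
        print(candidate, kind, direct)
```

A suffix check cannot confirm what a server actually returns, so a link that passes it may still serve an HTML page. It only catches the common case of pasting a web page address instead of a raw file link.
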
