---
title: AutoML - MCP Hackathon
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: updated_ML.py
pinned: false
license: mit
tags:
  - machine-learning
  - mcp
  - hackathon
  - automl
  - lazypredict
  - gradio
  - mcp-server-track
  - agent-demo-track
short_description: Automated ML model comparison with LazyPredict and MCP integration
---
🤖 AutoML - MCP Hackathon Submission
Automated Machine Learning Platform with LazyPredict and Model Context Protocol Integration
🏆 Hackathon Track
Agents & MCP Hackathon - Track 1: MCP Tool / Server
🌟 Key Features
Core ML Capabilities
- 📤 Dual Data Input: Support for both local CSV file uploads and public URL data sources
- 🎯 Auto Problem Detection: Automatically determines regression vs classification tasks
- 🤖 Multi-Algorithm Comparison: LazyPredict-powered comparison of 20+ ML algorithms
- 📊 Automated EDA: Comprehensive dataset profiling with ydata-profiling
- 💾 Best Model Export: Download top-performing model as pickle file
- 📈 Performance Visualization: Interactive charts showing model comparison results
🚀 Advanced Features
- 🌐 URL Data Loading: Direct data loading from public CSV URLs with robust error handling
- 🔄 Agent-Friendly Interface: Designed for both human users and AI agent interactions
- 📊 Interactive Dashboards: Real-time model performance metrics and visualizations
- 🔍 Smart Error Handling: Comprehensive validation and user feedback system
- 💻 MCP Server Integration: Full Model Context Protocol server implementation
🛠️ How It Works
AutoML provides a streamlined pipeline for automated machine learning, built around three core functions:
Core Functions
load_data(file_input)
- Universal data loader that handles:
  - Local CSV file uploads through Gradio's file component
  - Public CSV URLs with HTTP/HTTPS support
  - Robust error handling and validation
  - Automatic format detection and parsing
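As a rough illustration of the dispatch described above, the sketch below classifies the input as a URL or a local path and hands either one to pandas. The helper name resolve_source is hypothetical, and this sketch lets pandas fetch URLs itself, whereas the app lists the requests library for URL loading. Passing sep=None with engine="python" is pandas' delimiter-sniffing mode, used here as one plausible way to get automatic format detection.

```python
import os
from urllib.parse import urlparse


def resolve_source(file_input):
    """Return ("url", location) or ("file", location) for a data source.

    Hypothetical helper: accepts a path/URL string, an os.PathLike, or an
    uploaded-file object that exposes its temporary path via .name.
    """
    if isinstance(file_input, os.PathLike):
        location = os.fspath(file_input)
    else:
        location = getattr(file_input, "name", file_input)
    if not isinstance(location, str) or not location.strip():
        raise ValueError("Provide a CSV upload or a public CSV URL.")
    location = location.strip()
    # urlparse lowercases the scheme, so "HTTPS://..." is also recognised
    if urlparse(location).scheme in ("http", "https"):
        return "url", location
    return "file", location


def load_data(file_input):
    """Load a CSV from a local upload or a public URL into a DataFrame."""
    kind, location = resolve_source(file_input)
    import pandas as pd  # imported lazily so resolve_source works without pandas

    try:
        # sep=None with the python engine makes pandas sniff the delimiter
        df = pd.read_csv(location, sep=None, engine="python")
    except Exception as exc:
        raise ValueError(f"Could not read CSV from {kind} {location!r}: {exc}") from exc
    if df.empty:
        raise ValueError("The CSV contains no data rows.")
    return df
```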
analyze_and_model(df, target_column)
- Core ML pipeline that:
  - Generates comprehensive EDA reports using ydata-profiling
  - Automatically detects task type (classification vs regression) based on target variable uniqueness
  - Trains and evaluates multiple models using LazyPredict
  - Selects the best performing model based on appropriate metrics
  - Creates publication-ready visualizations comparing model performance
  - Exports the best model as a serialized pickle file
run_pipeline(data_source, target_column)
- Main orchestration function:
  - Validates all inputs and provides clear error messages
  - Coordinates the entire ML workflow from data loading to model export
  - Generates AI-powered explanations of results
  - Returns all outputs in a format optimized for both UI and API consumption
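A hedged sketch of that orchestration, assuming errors are returned as data rather than raised so that both the UI and an agent receive a readable message. The step functions are passed in as keyword arguments purely so the control flow can be exercised in isolation; the real run_pipeline(data_source, target_column) presumably calls load_data and analyze_and_model directly, and the result-dict shape here is an illustrative assumption.

```python
def run_pipeline(data_source, target_column, *, load_data, analyze_and_model):
    """Validate inputs, run the pipeline steps, and return a result dict.

    Illustrative sketch: load_data and analyze_and_model are injected so the
    flow can be tested without pandas, LazyPredict, or a real CSV file.
    """
    if not data_source:
        return {"status": "error", "message": "Provide a CSV upload or a public CSV URL."}
    target = (target_column or "").strip()
    if not target:
        return {"status": "error", "message": "Enter the name of the target column."}

    try:
        df = load_data(data_source)
    except Exception as exc:  # surface loader failures as readable messages
        return {"status": "error", "message": f"Could not load data: {exc}"}

    columns = [str(c) for c in df.columns]
    if target not in columns:  # exact, case-sensitive match
        return {
            "status": "error",
            "message": f"Target column {target!r} not found. "
            f"Available columns: {', '.join(columns)}",
        }

    try:
        artifacts = analyze_and_model(df, target)
    except Exception as exc:
        return {"status": "error", "message": f"Modeling failed: {exc}"}
    return {"status": "ok", **artifacts}
```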
Agent-Friendly Design
- Single Entry Point: The run_pipeline() function serves as the primary interface for AI agents
- Flexible Input Handling: Automatically determines whether input is a file path or URL
- Comprehensive Output: Returns all generated artifacts (models, reports, visualizations)
- Error Resilience: Robust error handling with informative feedback
🚀 Quick Start
📋 Application File Comparison
Feature | updated_ML.py | fixed_ML_MCP_backup.py |
---|---|---|
Core ML Pipeline | ✅ Full AutoML functionality | ✅ Full AutoML functionality |
MCP Server | ✅ Enabled | ✅ Enhanced configuration |
UI Interface | ✅ Clean, streamlined | ✅ Identical interface |
Code Structure | ✅ Primary, well-documented | ✅ Backup with additional features |
Recommended For | General use, development | Advanced MCP integration |
Running the Application
The project includes two main application files:
- Primary Application: updated_ML.py (Recommended)
- Backup Version: fixed_ML_MCP_backup.py
Web Interface
- Choose Data Source:
  - Local Upload: Use the file upload component to select a CSV file from your computer
  - URL Input: Enter a public CSV URL (e.g., from GitHub, data repositories, or cloud storage)
- Specify Target: Enter the exact name of your target column (case-sensitive)
- Run Analysis: Click "Run Analysis & AutoML" to start the AutoML pipeline
- Review Results:
  - View detected task type (classification/regression)
  - Examine model performance metrics in the interactive table
  - Download comprehensive EDA report (HTML format)
  - Download the best performing model (pickle format)
  - View model comparison visualization
Installation & Setup
Server Configuration
The application launches with the following settings:
- Host: 0.0.0.0 (accessible from any network interface)
- Port: 7860 (default Gradio port)
- MCP Server: Enabled for AI agent integration
- API Documentation: Available at the /docs endpoint
- Browser Launch: Automatic browser opening enabled
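The settings above map onto keyword arguments of Gradio's launch() method. This is a minimal sketch, not the app's actual launch call: LAUNCH_KWARGS and launch are hypothetical names, and the mcp_server flag is only accepted by Gradio releases with MCP support (Gradio 5.x installed with the mcp extra, e.g. pip install "gradio[mcp]"), so it assumes a newer Gradio than the sdk_version listed in the front matter.

```python
# Hypothetical launch settings mirroring the configuration listed above
LAUNCH_KWARGS = {
    "server_name": "0.0.0.0",  # listen on every network interface
    "server_port": 7860,       # Gradio's default port
    "inbrowser": True,         # open a browser tab when the app starts
    "mcp_server": True,        # expose the app's functions as MCP tools
}


def launch(demo):
    """Start a Gradio Blocks/Interface app with the settings above."""
    return demo.launch(**LAUNCH_KWARGS)
```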
🎯 Current Implementation
1. LazyPredict Integration
- Automated Model Training: Trains 20+ algorithms automatically
- Performance Comparison: Side-by-side evaluation of all models
- Best Model Selection: Automatically selects top performer based on accuracy/R² score
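A sketch of the comparison and selection step, assuming lazypredict's LazyClassifier/LazyRegressor, whose fit(X_train, X_test, y_train, y_test) returns a scores DataFrame (indexed by model name) plus predictions, and whose provide_models() returns the fitted pipelines keyed by the same names. The split parameters are illustrative choices, not the app's settings, and pick_best ranks by Accuracy or R-Squared as described under Model Performance.

```python
# Primary ranking metric per task, following the Model Performance notes
PRIMARY_METRIC = {"classification": "Accuracy", "regression": "R-Squared"}


def pick_best(leaderboard, task):
    """Return the model name with the highest primary metric.

    leaderboard maps model name -> {metric name: value}, e.g. the LazyPredict
    scores DataFrame converted with models.to_dict("index").
    """
    metric = PRIMARY_METRIC[task]
    return max(leaderboard, key=lambda name: leaderboard[name][metric])


def compare_models(X, y, task, test_size=0.2, random_state=42):
    """Fit LazyPredict's model set; return (scores, best name, best pipeline)."""
    # Imported lazily so pick_best can be used without these packages
    from sklearn.model_selection import train_test_split
    from lazypredict.Supervised import LazyClassifier, LazyRegressor

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    lazy_cls = LazyClassifier if task == "classification" else LazyRegressor
    runner = lazy_cls(verbose=0, ignore_warnings=True, custom_metric=None)
    models, _predictions = runner.fit(X_train, X_test, y_train, y_test)
    best_name = pick_best(models.to_dict("index"), task)
    # provide_models reuses the pipelines fitted above, keyed by model name
    fitted = runner.provide_models(X_train, X_test, y_train, y_test)
    return models, best_name, fitted[best_name]
```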
2. Comprehensive EDA
- ydata-profiling: Generates detailed dataset analysis reports
- Automatic Insights: Data quality, distributions, correlations, and missing values
- Interactive Reports: Downloadable HTML reports with comprehensive statistics
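Generating the downloadable HTML report with ydata-profiling comes down to building a ProfileReport and writing it to a file. In this sketch the function name and the profile_cls testing hook are hypothetical; minimal=True is ydata-profiling's reduced mode, which skips the most expensive computations and may help with the large datasets mentioned under Troubleshooting.

```python
import os
import tempfile


def write_eda_report(df, title="Dataset Profiling Report", minimal=False, profile_cls=None):
    """Profile df and return the path of the generated HTML report.

    profile_cls is a hypothetical hook for testing; by default the real
    ydata_profiling.ProfileReport class is used.
    """
    if profile_cls is None:
        from ydata_profiling import ProfileReport as profile_cls
    profile = profile_cls(df, title=title, minimal=minimal)
    out_dir = tempfile.mkdtemp(prefix="eda_")  # fresh directory per report
    path = os.path.join(out_dir, "eda_report.html")
    profile.to_file(path)  # an .html path produces an HTML report
    return path
```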
3. Smart Task Detection
- Classification: Automatically detected when target has ≤10 unique values
- Regression: Automatically detected for continuous target variables
- Adaptive Metrics: Uses appropriate evaluation metrics for each task type
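The detection rule above fits in a few lines. This sketch follows the ≤10-unique-values threshold and adds two assumptions the README does not state: missing values (None/NaN) are ignored, and non-numeric targets (text labels) are always treated as classification. It accepts any iterable, so a pandas Series works as well as a list.

```python
from numbers import Real


def detect_task(target_values, max_classes=10):
    """Return "classification" or "regression" for a target column."""
    # Drop None and NaN (NaN is the only value not equal to itself)
    values = [v for v in target_values if v is not None and v == v]
    if not values:
        raise ValueError("Target column has no non-missing values.")
    if not all(isinstance(v, Real) for v in values):
        return "classification"  # assumption: text labels are always classes
    return "classification" if len(set(values)) <= max_classes else "regression"
```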
4. Model Persistence
- Pickle Export: Save trained models for future use
- Model Reuse: Load and apply models to new datasets
- Production Ready: Serialized models ready for deployment
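A minimal save/load round trip with the standard-library pickle module; save_model and load_model are hypothetical names, not functions from the app. Unpickling executes code embedded in the file, so only load models you exported yourself or otherwise trust.

```python
import os
import pickle
import tempfile


def save_model(model, directory=None, filename="best_model.pkl"):
    """Serialize model with pickle and return the file path."""
    directory = directory or tempfile.mkdtemp(prefix="automl_")
    path = os.path.join(directory, filename)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return path


def load_model(path):
    """Load a pickled model; pickle can run arbitrary code, so trust the file."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

A model exported this way would typically be used as load_model(path).predict(new_df), where new_df has the same feature columns as the training data and the installed scikit-learn version matches the one used for training.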
📊 Supported Algorithms (via LazyPredict)
Classification Algorithms
- Logistic Regression, Decision Tree Classifier
- Random Forest Classifier, Extra Trees Classifier
- Gradient Boosting Classifier, AdaBoost Classifier
- XGBoost Classifier, LightGBM Classifier
- SVM Classifier, K-Nearest Neighbors
- Naive Bayes, Linear Discriminant Analysis
- Quadratic Discriminant Analysis, and more...
Regression Algorithms
- Linear Regression, Ridge Regression, Lasso Regression
- Decision Tree Regressor, Random Forest Regressor
- Extra Trees Regressor, Gradient Boosting Regressor
- XGBoost Regressor, LightGBM Regressor
- Support Vector Regression, K-Nearest Neighbors
- AdaBoost Regressor, Elastic Net, and more...
🏆 Demo Scenarios
House Price Prediction (Regression)
- Upload sample_house_prices.csv (included in the project)
- Enter price as the target column name
- System automatically detects the regression task
- Compare performance of 15+ regression algorithms
- Download the best performing model and detailed EDA report
Loan Approval Prediction (Classification)
- Upload sample_loan_approval.csv (included in the project)
- Enter the loan approval status column name as the target
- System automatically detects the classification task
- Compare accuracy of 15+ classification algorithms
- Get a comprehensive EDA report with approval insights
College Placement Analysis
- Upload collegePlace.csv (included in the project)
- Analyze student placement outcomes
- Automatic feature analysis and model comparison
- Export trained model for future predictions
URL-Based Data Analysis
- Use public dataset URLs for instant analysis
- Example: Government open data, research datasets, cloud-hosted files
- No local file upload needed with URL-based loading (the dataset must still fit in memory)
- Seamless integration with cloud storage platforms
🚀 Technologies Used
- Frontend: Gradio 4.0+ with soft theme and MCP server integration
- AutoML Engine: LazyPredict for automated model comparison and evaluation
- EDA Framework: ydata-profiling for comprehensive dataset analysis and reporting
- ML Libraries: scikit-learn, XGBoost, LightGBM (via LazyPredict ecosystem)
- Visualization: Matplotlib and Seaborn for model comparison charts and statistical plots
- Data Processing: pandas and numpy for efficient data manipulation and preprocessing
- Model Persistence: pickle for model serialization and export
- Web Requests: requests library for robust URL-based data loading
- MCP Integration: Model Context Protocol server for AI agent compatibility
- File Handling: tempfile for secure temporary file management
📈 Current Features
- 🔄 Dual Input Support: Upload local CSV files or provide public URLs for data loading
- 🤖 One-Click AutoML: Complete ML pipeline from data upload to trained model export
- 🎯 Intelligent Task Detection: Automatic classification vs regression detection based on target variable analysis
- 📊 Multi-Algorithm Comparison: Simultaneous comparison of 20+ algorithms with LazyPredict
- 📋 Comprehensive EDA: Detailed dataset profiling with statistical analysis and data quality reports
- 💾 Model Export: Download best performing model as pickle file for production deployment
- 📈 Performance Visualization: Clear charts showing algorithm comparison and performance metrics
- 🌐 MCP Server Integration: Full Model Context Protocol support for seamless AI assistant integration
- 🛡️ Robust Error Handling: Comprehensive validation with informative user feedback
- 🎨 Modern UI: Clean, responsive interface optimized for both human and agent interactions
🎯 Hackathon Submission Highlights
- 🤖 LazyPredict Integration: Automated comparison of 20+ ML algorithms with minimal configuration
- 🧠 Smart Automation: Intelligent task detection, data validation, and model selection
- 📊 Comprehensive Analysis: ydata-profiling powered EDA reports with statistical insights
- 👥 Dual Interface Design: Optimized for both human users and AI agent interactions
- 🌐 MCP Server Implementation: Full Model Context Protocol integration for seamless agent workflows
- 🔄 Flexible Data Loading: Support for both local uploads and URL-based data sources
- 📈 Production Ready: Exportable models, comprehensive documentation, and robust error handling
- 🎨 Modern UI/UX: Clean Gradio interface with intuitive workflow and clear feedback systems
📦 Project Structure
Application Files Overview
- updated_ML.py: The main application file with clean, streamlined code structure. Recommended for most users.
- fixed_ML_MCP_backup.py: Alternative version with additional MCP server configurations and enhanced features.
Both files provide identical core functionality with slight variations in configuration and additional features.
📧 Contact & Support
Built with ❤️ for the Agents & MCP Hackathon 2025
This project demonstrates the power of combining LazyPredict's automated machine learning capabilities with the Model Context Protocol to create an intelligent, easy-to-use ML platform that seamlessly integrates into AI assistant workflows and provides production-ready machine learning solutions.
🔮 Features in Development
- 🧠 LLM-powered model explanations and insights
- ⚙️ Advanced feature engineering and preprocessing pipelines
- 🎯 Ensemble model creation and stacking capabilities
- 🚀 Real-time prediction API endpoints
- 🛠️ Enhanced MCP tool suite with additional ML operations
- 📊 Interactive model interpretation and SHAP value analysis
🎮 Usage Tips & Best Practices
Getting Started
- Choose Your File: Use updated_ML.py for standard usage and fixed_ML_MCP_backup.py for advanced MCP features
- Target Column: Ensure your target column name is exactly as it appears in the dataset (case-sensitive)
- Data Sources: Both local CSV uploads and public URLs are supported seamlessly
Data Loading Best Practices
- URL Loading: Use direct links to CSV files (GitHub raw URLs work great!)
- File Size: No strict limitations, but larger files may take longer to process
- Data Quality: The system handles missing values automatically, but clean data yields better results
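Because URL loading needs a direct link to the CSV, a GitHub page URL (github.com/.../blob/...) has to be swapped for its raw.githubusercontent.com equivalent. The helper below is a hypothetical convenience, not part of the app; it assumes branch names without slashes, since a URL alone cannot tell where such a branch name ends.

```python
from urllib.parse import urlparse


def to_raw_github_url(url):
    """Convert a github.com "blob" page URL into its raw-file URL.

    Any URL that does not match that shape is returned unchanged.
    """
    parts = urlparse(url)
    segments = parts.path.strip("/").split("/")
    # Expected shape: /<owner>/<repo>/blob/<branch>/<path...>
    if (
        parts.netloc in ("github.com", "www.github.com")
        and len(segments) >= 5
        and segments[2] == "blob"
    ):
        owner, repo, _blob, branch, *path = segments
        return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"
    return url
```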
Model Performance
- Classification: System uses Accuracy as the primary metric for model selection
- Regression: System uses R-Squared as the primary metric for model selection
- File Formats: Currently supports CSV format with automatic delimiter detection
- Column Types: Handles both numeric and categorical features automatically
Troubleshooting
- Target Not Found: Double-check column name spelling and case sensitivity
- URL Issues: Ensure URLs point directly to CSV files (not web pages)
- Performance: For large datasets, expect processing times of 2-5 minutes
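For the "Target Not Found" case, a small check can point users to the column they probably meant. This is a hypothetical helper sketched with the standard library's difflib: it keeps the exact, case-sensitive match the app requires, then suggests case-insensitive matches, then near misses.

```python
import difflib


def check_target(target, columns):
    """Return (match, suggestions) for a user-supplied target column name.

    match is the column name when it exists exactly (case-sensitive);
    otherwise match is None and suggestions lists likely intended names.
    """
    columns = [str(c) for c in columns]
    if target in columns:
        return target, []
    same_ignoring_case = [c for c in columns if c.lower() == target.lower()]
    if same_ignoring_case:
        return None, same_ignoring_case
    return None, difflib.get_close_matches(target, columns, n=3, cutoff=0.6)
```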
Ready to experience automated machine learning? Upload your dataset or provide a URL and let LazyPredict find the best algorithm for your problem! 🚀
Transform your data into insights with just a few clicks - no ML expertise required!