train_anomaly_model
Train an anomaly detection model using healthy vibration data to identify machinery faults. Supports unsupervised and semi-supervised modes with optional hyperparameter tuning.
Instructions
Train an ML-based anomaly detection model on healthy data (UNSUPERVISED or SEMI-SUPERVISED).
Complete pipeline:
1. Extract features from healthy signals (segmentation + time-domain features)
2. Standardize features (StandardScaler - fitted on training data only)
3. Dimensionality reduction (PCA with specified variance explained)
4. Train novelty detection model (OneClassSVM or LocalOutlierFactor) on HEALTHY DATA ONLY
5. Optional hyperparameter tuning using validation data (semi-supervised)
6. Save model, scaler, and PCA transformer
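The steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the tool's actual implementation: the helper names `segment_signal` and `time_domain_features`, the particular feature set, the synthetic signal, and the `nu`/`gamma` values are all assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def segment_signal(signal, sampling_rate, segment_duration=0.1, overlap_ratio=0.5):
    """Split a 1-D signal into fixed-length, overlapping segments."""
    seg_len = int(round(segment_duration * sampling_rate))
    step = max(1, int(round(seg_len * (1.0 - overlap_ratio))))
    starts = range(0, len(signal) - seg_len + 1, step)
    return np.array([signal[s:s + seg_len] for s in starts])


def time_domain_features(segments):
    """Illustrative features (assumed): RMS, peak, crest factor, std, kurtosis."""
    rms = np.sqrt(np.mean(segments ** 2, axis=1))
    peak = np.max(np.abs(segments), axis=1)
    std = np.std(segments, axis=1)
    centered = segments - segments.mean(axis=1, keepdims=True)
    kurtosis = np.mean(centered ** 4, axis=1) / std ** 4
    return np.column_stack([rms, peak, peak / rms, std, kurtosis])


# Synthetic stand-in for healthy vibration data: 2 s sampled at 10 kHz.
rng = np.random.default_rng(0)
fs = 10_000
healthy = rng.normal(0.0, 1.0, size=2 * fs)

# Step 1: segmentation + feature extraction.
X = time_domain_features(segment_signal(healthy, fs))

# Step 2: scaler fitted on training data only.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Step 3: a float n_components keeps enough components to reach that variance.
pca = PCA(n_components=0.95, svd_solver="full").fit(X_scaled)
X_pca = pca.transform(X_scaled)

# Step 4: novelty detector fitted on healthy data only.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_pca)

print(X.shape, pca.n_components_, round(float(pca.explained_variance_ratio_.sum()), 3))
```

At inference time, new data would need to pass through the same fitted `scaler` and `pca` objects, in that order, before reaching `model.predict`, which is why all three are saved together in step 6.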
**Training Mode:**
- UNSUPERVISED: Train only on healthy data with automatic hyperparameters
- SEMI-SUPERVISED: Train on healthy data, tune hyperparameters using validation set (healthy + fault)
**Note:** This is NOT supervised learning. OneClassSVM/LOF are trained ONLY on healthy data.
Fault data (if provided) is used ONLY to select hyperparameters; it is never used to fit the model.
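One way to picture the semi-supervised mode is a small grid search in which each candidate is fitted on healthy data alone and then scored on a labelled validation set. The sketch below assumes a OneClassSVM, a hypothetical `nu`/`gamma` grid, balanced accuracy as the selection score, and synthetic features; the tool's real search space and scoring may differ.

```python
import itertools

import numpy as np
from sklearn.metrics import balanced_accuracy_score
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
X_train = rng.normal(0.0, 1.0, size=(200, 3))        # healthy, used for fitting
X_val_healthy = rng.normal(0.0, 1.0, size=(50, 3))   # healthy, held out
X_val_fault = rng.normal(5.0, 1.0, size=(50, 3))     # fault, never used for fitting

X_val = np.vstack([X_val_healthy, X_val_fault])
# scikit-learn convention: +1 = inlier (healthy), -1 = outlier (fault).
y_val = np.concatenate([np.ones(50, dtype=int), -np.ones(50, dtype=int)])

best_score, best_params = -1.0, None
for nu, gamma in itertools.product([0.01, 0.05, 0.1], ["scale", 0.1, 1.0]):
    # Every candidate sees healthy data only during fit().
    candidate = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma).fit(X_train)
    score = balanced_accuracy_score(y_val, candidate.predict(X_val))
    if score > best_score:
        best_score, best_params = score, {"nu": nu, "gamma": gamma}

print(best_params, round(best_score, 3))
```

The fault samples influence which hyperparameters win, but no estimator is ever fitted on them, which is what keeps this from being supervised learning.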
**Validation Strategy:**
- If healthy_validation_files provided: Use those explicitly (no split)
- If healthy_validation_files NOT provided: Automatic 80/20 split of training data
- If fault_signal_files provided: Enable semi-supervised mode (hyperparameter tuning)
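The three rules above amount to a small piece of branching logic. The following sketch uses a hypothetical helper, `resolve_validation`, and assumes the automatic split is a seeded random shuffle over training items; whether the tool splits by file or by extracted segment is not stated here.

```python
import random


def resolve_validation(healthy_items, healthy_validation=None, fault_items=None,
                       val_fraction=0.2, seed=0):
    """Return (train, validation, mode) following the validation strategy."""
    if healthy_validation:
        # Explicit validation data: use all training items, no split.
        train, val = list(healthy_items), list(healthy_validation)
    else:
        # No validation data: hold out 20% of the training items.
        shuffled = list(healthy_items)
        random.Random(seed).shuffle(shuffled)
        n_val = max(1, int(round(len(shuffled) * val_fraction)))
        train, val = shuffled[n_val:], shuffled[:n_val]
    # Fault data switches on hyperparameter tuning.
    mode = "semi-supervised" if fault_items else "unsupervised"
    return train, val, mode


train, val, mode = resolve_validation(range(10))
print(len(train), len(val), mode)
```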
Args:
healthy_signal_files: List of CSV files with healthy machine data (for training)
sampling_rate: Sampling frequency in Hz (auto-detected from metadata if None)
segment_duration: Segment duration in seconds (default: 0.1)
overlap_ratio: Overlap ratio 0-1 (default: 0.5)
model_type: 'OneClassSVM' or 'LocalOutlierFactor' (default: 'OneClassSVM')
pca_variance: Cumulative variance to explain with PCA (default: 0.95)
fault_signal_files: Optional list of fault signals for HYPERPARAMETER TUNING (semi-supervised)
healthy_validation_files: Optional list of healthy signals for validation (specificity check). If not provided, 20% of the training data is used.
model_name: Name for saved model files (default: 'anomaly_model')
ctx: MCP context for progress/logging
Returns:
AnomalyModelResult with model paths and performance metrics
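As an illustration of how `model_type` might map onto scikit-learn estimators, the sketch below uses a hypothetical `build_model` factory with assumed hyperparameter values. One detail worth noting is that `LocalOutlierFactor` only exposes `predict` on unseen data when constructed with `novelty=True`.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.svm import OneClassSVM


def build_model(model_type="OneClassSVM"):
    """Return an unfitted novelty detector for the given model_type."""
    if model_type == "OneClassSVM":
        return OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
    if model_type == "LocalOutlierFactor":
        # novelty=True is required to call predict() on new samples.
        return LocalOutlierFactor(n_neighbors=20, novelty=True)
    raise ValueError(f"Unsupported model_type: {model_type!r}")


rng = np.random.default_rng(0)
X_healthy = rng.normal(0.0, 1.0, size=(100, 3))   # synthetic healthy features
far_point = np.array([[10.0, 10.0, 10.0]])        # clearly outside the healthy cloud

results = {}
for name in ("OneClassSVM", "LocalOutlierFactor"):
    fitted = build_model(name).fit(X_healthy)
    results[name] = int(fitted.predict(far_point)[0])  # -1 means anomaly

print(results)
```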
Input Schema
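| Name | Required | Description | Default |
|---|---|---|---|
| healthy_signal_files | Yes | List of CSV files with healthy machine data (for training) | |
| sampling_rate | No | Sampling frequency in Hz (auto-detected from metadata if None) | |
| segment_duration | No | Segment duration in seconds | 0.1 |
| overlap_ratio | No | Overlap ratio, 0-1 | 0.5 |
| model_type | No | 'OneClassSVM' or 'LocalOutlierFactor' | OneClassSVM |
| pca_variance | No | Cumulative variance to explain with PCA | 0.95 |
| fault_signal_files | No | Optional list of fault signals for hyperparameter tuning (semi-supervised) | |
| healthy_validation_files | No | Optional list of healthy signals for validation (specificity check); if not provided, 20% of the training data is used | |
| model_name | No | Name for saved model files | anomaly_model |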
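For orientation, a set of arguments matching this schema could look like the following. The CSV file names are hypothetical placeholders, and the outer `name`/`arguments` wrapper is an assumption about how an MCP client would package a tool call; only the argument names come from the schema.

```python
import json

# Hypothetical file names; replace with real CSV paths.
arguments = {
    "healthy_signal_files": ["healthy_run_01.csv", "healthy_run_02.csv"],
    "sampling_rate": None,            # None: auto-detect from metadata
    "segment_duration": 0.1,          # seconds
    "overlap_ratio": 0.5,
    "model_type": "OneClassSVM",
    "pca_variance": 0.95,
    "fault_signal_files": ["fault_run_01.csv"],  # enables semi-supervised tuning
    "model_name": "anomaly_model",
}

payload = json.dumps({"name": "train_anomaly_model", "arguments": arguments}, indent=2)
print(payload)
```

Omitting `healthy_validation_files`, as here, would trigger the automatic 80/20 split of the training data.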
Output Schema
| Name | Required | Description | Default |
|---|---|---|---|
| model_type | Yes | Type of model: 'OneClassSVM' or 'LocalOutlierFactor' | |
| num_training_samples | Yes | Number of healthy samples used for training | |
| num_features_original | Yes | Number of original features | |
| num_features_pca | Yes | Number of PCA components (features after dimensionality reduction) | |
| variance_explained | Yes | Cumulative variance explained by PCA components | |
| model_params | Yes | Best model hyperparameters | |
| model_path | Yes | Path to saved model file (.pkl) | |
| scaler_path | Yes | Path to saved scaler file (.pkl) | |
| pca_path | Yes | Path to saved PCA file (.pkl) | |
| validation_accuracy | No | Overall balanced accuracy on healthy + fault validation data | |
| validation_details | No | Validation details with healthy and fault metrics | |
| validation_metrics | No | Detailed validation metrics (healthy/fault accuracy breakdown) | |
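Balanced accuracy is the mean of the per-class accuracies, so a large healthy validation set cannot mask poor fault detection. The sketch below works through the arithmetic on made-up predictions; the function name and dictionary keys are illustrative and are not the actual fields of `validation_metrics`.

```python
import numpy as np


def summarize_validation(pred_healthy, pred_fault):
    """Per-class accuracy and balanced accuracy for +1/-1 novelty predictions."""
    healthy_acc = float(np.mean(np.asarray(pred_healthy) == 1))   # healthy kept as inliers
    fault_acc = float(np.mean(np.asarray(pred_fault) == -1))      # faults flagged as outliers
    return {
        "healthy_accuracy": healthy_acc,
        "fault_accuracy": fault_acc,
        "balanced_accuracy": 0.5 * (healthy_acc + fault_acc),
    }


# 3 of 4 healthy samples accepted, 9 of 10 fault samples flagged.
metrics = summarize_validation(
    pred_healthy=[1, 1, 1, -1],
    pred_fault=[-1, -1, 1, -1, -1, -1, -1, -1, -1, -1],
)
print(metrics)  # healthy 0.75, fault 0.9, balanced 0.825
```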