Classification of EEG signals using Transformer based deep learning and ensemble models
- Research Article
- 10.1016/j.engappai.2022.105151
- Jul 30, 2022
- Engineering Applications of Artificial Intelligence
Ensemble deep learning: A review
- Dissertation
- 10.32657/10356/182221
- Jan 1, 2025
Deep learning has become increasingly popular due to its remarkable ability to learn high-dimensional feature representations. Numerous algorithms and models have been developed to enhance the application of deep learning across various real-world tasks, including image classification, natural language processing, and autonomous driving. However, deep learning models are susceptible to backdoor threats, where an attacker manipulates the training process or data to cause incorrect predictions on malicious samples containing specific triggers, while maintaining normal performance on benign samples. With the advancement of deep learning, including evolving training schemes and the need for large-scale training data, new threats in the backdoor domain continue to emerge. Conversely, backdoors can also be leveraged to protect deep learning models, such as through watermarking techniques. In this thesis, we conduct an in-depth investigation into backdoor techniques from three novel perspectives. In the first part of this thesis, we demonstrate that emerging deep learning training schemes can introduce new backdoor risks. Specifically, pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, significantly accelerating the development of language models. However, the pre-trained model becomes a single point of failure for these downstream models. We propose a novel task-agnostic backdoor attack against pre-trained NLP models, wherein the adversary does not need prior information about the downstream tasks when implanting the backdoor into the pre-trained model. Any downstream models transferred from this malicious model will inherit the backdoor, even after extensive transfer learning, revealing the severe vulnerability of pre-trained foundation models to backdoor attacks. In the second part of this thesis, we develop novel backdoor attack methods suited to new threat scenarios. 
The rapid expansion of deep learning models necessitates large-scale training data, much of which is unlabeled and outsourced to third parties for annotation. To ensure data security, most datasets are read-only for training samples, preventing the addition of input triggers. Consequently, attackers can only achieve data poisoning by uploading malicious annotations. In this practical scenario, all existing data poisoning methods that add triggers to the input are infeasible. Therefore, we propose new backdoor attack methods that involve poisoning only the labels without modifying any input samples. In the third part of this thesis, we utilize the backdoor technique to proactively protect our deep learning models, specifically for intellectual property protection. Considering the complexity of deep learning tasks, generating a well-trained deep learning model requires substantial computational resources, training data, and expertise. Therefore, it is essential to protect these assets and prevent copyright infringement. Inspired by backdoor attacks that can induce specific behaviors in target models through carefully designed samples, several watermarking methods have been proposed to protect the intellectual property of deep learning models. Model owners can train their models to produce unique outputs for certain crafted samples and use these samples for ownership verification. While various extraction techniques have been designed for supervised deep learning models, challenges arise when applying them to deep reinforcement learning models due to differences in model features and scenarios. Therefore, we propose a novel watermarking scheme to protect deep reinforcement learning models from unauthorized distribution. 
Instead of using spatial watermarks as in conventional deep learning models, we design temporal watermarks that minimize potential impact and damage to the protected deep reinforcement learning model while achieving high-fidelity ownership verification. In summary, this thesis investigates the evolving landscape of backdoor threats during the development of deep learning techniques and the use of backdoors for beneficial purposes in intellectual property protection.
- Research Article
- 10.1038/s41598-025-98518-7
- Apr 25, 2025
- Scientific Reports
The most common causes of spine fractures, or vertebral column fractures (VCF), are traumas such as falls, sports injuries, or accidents. CT scans are affordable and effective at accurately detecting VCF types. VCF type identification in the cervical, thoracic, and lumbar (C3-L5) regions is limited and sensitive to inter-observer variability. To solve this problem, this work introduces an autonomous approach for identifying VCF type by developing a novel ensemble model of Vision Transformers (ViT) and the best-performing deep learning (DL) models. It assists orthopaedicians in easy and early identification of VCF types. The performance of numerous fine-tuned DL architectures, including VGG16, ResNet50, and DenseNet121, was investigated, and an ensemble classification model was developed to identify the best-performing combination of DL models. A ViT model was also trained to identify VCF. The best-performing DL models and the ViT were then fused by a weighted average technique for type identification. To overcome data limitations, an extended Deep Convolutional Generative Adversarial Network (DCGAN) and a Progressive Growing Generative Adversarial Network (PGGAN) were developed. The VGG16-ResNet50-ViT ensemble model outperformed all other ensemble models and achieved an accuracy of 89.98%. Extended DCGAN and PGGAN augmentation increased the accuracy of type identification to 90.28% and 93.68%, respectively. This demonstrates the efficacy of PGGANs in augmenting VCF images. The study emphasizes the distinctive contributions of the ResNet50, VGG16, and ViT models in feature extraction, generalization, and global shape-based pattern capturing in VCF type identification. CT scans collected from a tertiary care hospital were used to validate these models.
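The weighted average fusion mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not give the weights or the class set, so the probability vectors and weights below are hypothetical placeholders, and the function names are illustrative rather than taken from the paper.

```python
def weighted_average_fusion(prob_lists, weights):
    """Fuse per-model class probabilities with a normalized weighted average."""
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_lists[0])
    fused = [0.0] * n_classes
    for probs, w in zip(prob_lists, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must score the same classes")
        for k, p in enumerate(probs):
            fused[k] += (w / total) * p
    return fused


def predict(prob_lists, weights):
    """Return the index of the class with the highest fused probability."""
    fused = weighted_average_fusion(prob_lists, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Hypothetical softmax outputs for one CT image over three fracture types.
vgg16 = [0.6, 0.3, 0.1]
resnet50 = [0.2, 0.5, 0.3]
vit = [0.1, 0.7, 0.2]
weights = [1.0, 1.0, 2.0]  # assumed weights, not reported in the abstract

fused = weighted_average_fusion([vgg16, resnet50, vit], weights)
label = predict([vgg16, resnet50, vit], weights)
```

Normalizing the weights keeps the fused vector a valid probability distribution whenever each input is one. In practice the weights would typically be tuned on a validation set.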
- Research Article
- 10.1186/s13244-026-02220-9
- Mar 3, 2026
- Insights into imaging
Radiologists often face challenges in differentiating benign from malignant sacral bone lesions due to their similar imaging characteristics. This study aimed to develop an ensemble deep learning (DL) model that can preoperatively distinguish between benign and malignant sacral tumors using noncontrast computed tomography images. Preoperative sacral CT scans from 569 patients with confirmed sacral lesions were analyzed. Data from Center 1 were utilized in model development and internal test via fivefold cross-validation, and those from Centers 2 and 3 were employed in external test. Various ensemble models combining human-readable interpretation and DL were developed. The diagnostic performance of the models and radiologists was assessed using metrics such as precision, recall, accuracy, area under the curve (AUC), F1 score, and confusion matrix. Furthermore, the clinical benefits derived from radiologists' interpretations and supported by the DL model were evaluated. The ensemble model, which integrates 3D-DenseNet121 with human interpretation, exhibited the most robust performance. The ensemble model demonstrated high performance on the internal and external test sets and achieved AUCs of 0.9139 and 0.8713, F1 scores of 0.9054 and 0.8571, precision of 0.9041 and 0.8824, recall of 0.9136 and 0.8333, and accuracy of 0.8630 and 0.8182, respectively. Across the external test cohort, all radiologists experienced improvements in AUC, accuracy, sensitivity, and specificity. Notably, junior radiologists demonstrated significant improvements compared with senior radiologists. The potential clinical application of the DL model lies in its capacity to considerably enhance the diagnostic efficiency of radiologists. 
This study presents the first ensemble deep learning model integrating 3D-DenseNet121 with radiologists' interpretation for preoperative differentiation of sacral tumors on noncontrast CT that improved diagnostic performance across all experience levels, particularly for junior radiologists. First artificial intelligence-radiologist ensemble for noncontrast computed tomography (NCCT)-based sacral tumor classification. Boosts all radiologists' performance, with the greatest gains for juniors, potentially reducing referrals. Enables reliable NCCT diagnosis, overcoming contrast/magnetic resonance imaging dependency in musculoskeletal oncology.
- Research Article
- 10.1158/1538-7445.am2021-184
- Jul 1, 2021
- Cancer Research
Purpose: Although deep learning (DL) models have shown increasing ability to accurately classify diagnostic images in oncology, large amounts of well-curated data are often needed to match human-level performance. Given the relative paucity of imaging datasets for less prevalent cancer types, there is an increasing need for methods that can improve the performance of deep learning models trained using limited diagnostic images. Deep metric learning (DML) is a potential method that can improve accuracy in deep learning models trained on limited datasets. Leveraging a triplet-loss function, DML exponentially increases training data compared to a traditional DL model. In this study, we investigated the utility of DML to improve the accuracy of DL models trained to classify cancerous lesions found on screening mammograms. Methods: Using a dataset of 2620 lesions found on routine screening mammograms, we trained both a traditional DL model and a DML model to classify suspicious lesions as cancerous or benign. The VGG16 architecture was used as the basis for the DL and DML models. Model performance was compared by calculating model accuracy, sensitivity, and specificity on a blinded test set of 378 lesions. In addition to individual model performance, we also measured agreement accuracy when both the DL and DML models were combined. Sub-analyses were conducted to identify phenotypes which were best suited for each model type. Both models underwent hyperparameter optimization to identify the ideal batch size, learning rate, and regularization to prevent overfitting. Results: We found that the combination of the traditional DL model with the DML model resulted in the highest overall accuracy (78.7%), representing a 7.1% improvement compared to the traditional DL model (p<.001). Alone, the traditional DL model had an improved accuracy compared to the DML model (71.4% vs 66.4%). 
The traditional DL model had a higher sensitivity (94.8% vs 73.6%), but lower specificity (34.7% vs 55.1%) compared with the DML model. Sub-analyses suggested the traditional DL model was more accurate on higher density breasts, whereas the DML model was more accurate on lower density breasts. Additionally, the traditional DL model had the highest accuracy on oval shaped lesions, compared to the DML model, which was most accurate on irregularly shaped breast lesions. Conclusion: Our study suggests that combining DML models with traditional DL models can improve diagnostic image classification performance in cancer. Our results suggest DML models may provide increased specificity and help with classification of unique populations often misclassified by traditional DL models. Further studies investigating the utility of DML on other cancer imaging tasks are necessary to successfully build more robust DL models in cancer imaging. Citation Format: Justin Du, Sachin Umrao, Enoch Chang, Marina Joel, Aidan Gilson, Guneet Janda, Rachel Choi, Yongfeng Hui, Sanjay Aneja. The utility of deep metric learning for breast cancer identification on mammographic images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 184.
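For readers unfamiliar with the triplet-loss function this abstract refers to, a minimal sketch of the standard formulation follows. The abstract does not specify the distance metric or margin, so Euclidean distance and a margin of 1.0 are assumptions, and the two-dimensional embeddings are toy values.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is farther from the anchor
    than the positive by at least `margin` (an assumed value here).
    """
    return max(
        0.0,
        euclidean(anchor, positive) - euclidean(anchor, negative) + margin,
    )


# Toy embeddings: anchor and positive share a class, negative does not.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]
negative = [3.0, 4.0]

easy = triplet_loss(anchor, positive, negative)  # negative already far away
hard = triplet_loss(anchor, negative, positive)  # roles swapped: violated
```

Because every anchor can be paired with many positives and negatives, the number of distinct training triplets grows much faster than the number of labeled images, which is the data-amplification effect the abstract alludes to.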
- Research Article
- 10.3390/electronics13101996
- May 20, 2024
- Electronics
Electricity load forecasting is a crucial undertaking within all the deregulated markets globally. Among the research challenges on a global scale, the investigation of deep transfer learning (DTL) in the field of electricity load forecasting represents a fundamental effort that can inform artificial intelligence applications in general. In this paper, a comprehensive study is reported regarding day-ahead electricity load forecasting. For this purpose, three sequence-to-sequence (Seq2seq) deep learning (DL) models are used, namely the multilayer perceptron (MLP), the convolutional neural network (CNN) and the ensemble learning model (ELM), which consists of the weighted combination of the outputs of MLP and CNN models. Also, the study focuses on the development of different forecasting strategies based on DTL, emphasizing the way the datasets are trained and fine-tuned for higher forecasting accuracy. In order to implement the forecasting strategies using deep learning models, load datasets from three Greek islands, Rhodes, Lesvos, and Chios, are used. The main purpose is to apply DTL for day-ahead predictions (1–24 h) for each month of the year for the Chios dataset after training and fine-tuning the models using the datasets of the three islands in various combinations. Four DTL strategies are illustrated. In the first strategy (DTL Case 1), each of the three DL models is trained using only the Lesvos dataset, while fine-tuning is performed on the dataset of Chios island, in order to create day-ahead predictions for the Chios load. In the second strategy (DTL Case 2), data from both Lesvos and Rhodes concurrently are used for the DL model training period, and fine-tuning is performed on the data from Chios. The third DTL strategy (DTL Case 3) involves the training of the DL models using the Lesvos dataset, and the testing period is performed directly on the Chios dataset without fine-tuning. 
The fourth strategy is a multi-task deep learning (MTDL) approach, which has been extensively studied in recent years. In MTDL, the three DL models are trained simultaneously on all three datasets and the final predictions are made on the unknown part of the dataset of Chios. The results obtained demonstrate that DTL can be applied with high efficiency for day-ahead load forecasting. Specifically, DTL Case 1 and 2 outperformed MTDL in terms of load prediction accuracy. Regarding the DL models, all three exhibit very high prediction accuracy, especially in the two cases with fine-tuning. The ELM excels compared to the single models. More specifically, for conducting day-ahead predictions, it is concluded that the MLP model presents the best monthly forecasts with MAPE values of 6.24% and 6.01% for the first two cases, the CNN model presents the best monthly forecasts with MAPE values of 5.57% and 5.60%, respectively, and the ELM model achieves the best monthly forecasts with MAPE values of 5.29% and 5.31%, respectively, indicating the very high accuracy it can achieve.
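The two quantities at the center of this abstract, the weighted combination of MLP and CNN outputs and the MAPE metric, can be sketched as below. The load values and the equal weighting are hypothetical, since the abstract does not report how the ensemble weights were chosen.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero, as is typical for load data.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    return 100.0 * sum(
        abs((a - f) / a) for a, f in zip(actual, forecast)
    ) / len(actual)


def weighted_combination(mlp_out, cnn_out, w_mlp=0.5):
    """Blend two forecasts; `w_mlp` is an assumed weight in [0, 1]."""
    return [w_mlp * m + (1.0 - w_mlp) * c for m, c in zip(mlp_out, cnn_out)]


# Hypothetical hourly loads (MW) and two model forecasts.
actual = [100.0, 200.0, 400.0]
mlp_out = [110.0, 190.0, 400.0]
cnn_out = [90.0, 210.0, 380.0]

ensemble_out = weighted_combination(mlp_out, cnn_out, w_mlp=0.5)
mape_mlp = mape(actual, mlp_out)
mape_cnn = mape(actual, cnn_out)
mape_ens = mape(actual, ensemble_out)
```

In this toy case the two models err in opposite directions, so the blend has a lower MAPE than either model alone. That cancellation is not guaranteed in general; it depends on how correlated the individual errors are.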
- Research Article
- 10.1371/journal.pone.0282608
- Mar 9, 2023
- PLOS ONE
COVID-19 is highly infectious and causes acute respiratory disease. Machine learning (ML) and deep learning (DL) models are vital in detecting disease from computerized chest tomography (CT) scans. The DL models outperformed the ML models. For COVID-19 detection from CT scan images, DL models are used as end-to-end models. Thus, the performance of the model is evaluated for the quality of the extracted features and classification accuracy. There are four contributions included in this work. First, this research is motivated by studying the quality of the features extracted by the DL model by feeding these extracted features to an ML model. In other words, we proposed comparing the end-to-end DL model performance against the approach of using DL for feature extraction and ML for the classification of COVID-19 CT scan images. Second, we proposed studying the effect of fusing extracted features from image descriptors, e.g., Scale-Invariant Feature Transform (SIFT), with extracted features from DL models. Third, we proposed a new Convolutional Neural Network (CNN) to be trained from scratch and then compared to deep transfer learning on the same classification problem. Finally, we studied the performance gap between classic ML models and ensemble learning models. The proposed framework is evaluated using a CT dataset, where the obtained results are evaluated using five different metrics. The obtained results revealed that using the proposed CNN model is better than using the well-known DL model for the purpose of feature extraction. Moreover, using a DL model for feature extraction and an ML model for the classification task achieved better results in comparison to using an end-to-end DL model for detecting COVID-19 CT scan images. Of note, the accuracy rate of the former method improved by using ensemble learning models instead of the classic ML models. The proposed method achieved the best accuracy rate of 99.39%.
- Research Article
- 10.54254/2755-2721/101/20241055
- Nov 8, 2024
- Applied and Computational Engineering
Deep learning and Transformer models have revolutionized medical diagnostics, particularly in the advanced analysis of large datasets. Given the individual variability in cognitive decline status, accurate diagnosis of cognitive decline diseases in the elderly is crucial, since it allows treatment and interventions to begin at the earliest possible stage, potentially slowing the progression of the disease. In this paper, the author reviews and summarizes studies on the use of Transformers and deep learning in Electroencephalography (EEG) signal decoding, which is the most prevalent signal-detecting method in Brain-Computer Interface (BCI) systems for elderly cognitive decline. The review indicates that deep learning and Transformer models outperform traditional methods in classifying cognitive decline, offering effective methods and improved accuracy by extracting and analyzing in-depth features from EEG data. While deep learning models are proficient in single-modality learning, Transformer models offer a unified approach to processing and integrating information from diverse data sources, and as data volume increases, Transformers have the potential to further improve diagnostic accuracy.
- Conference Article
- 10.1109/spw.2019.00050
- May 1, 2019
Network security represents a keystone to ISPs, who need to cope with an increasing number of network attacks that put the network's integrity at risk. The high dimensionality of network data provided by current network monitoring systems opens the door to the massive application of Machine Learning (ML) approaches to improve the detection and classification of network attacks. In recent years, machine learning-based systems have gained popularity for network security applications, usually considering the application of shallow models, where a set of expert handcrafted features is needed to pre-process the data before training. Deep Learning (DL) models can alleviate the need for domain expert knowledge by relying on their ability to learn feature representations from raw or basic, unprocessed input data. Still, it is not clear today which is the best model or best model category to manage network security, as in general, only ad hoc and tailored approaches have been proposed and evaluated so far. In this paper we train and benchmark different ML models for detection of network attacks in different real network data. We consider an extensive battery of supervised ML models, including both shallow and deep models, taking as input either pre-computed domain-knowledge based input features, or raw, byte-stream inputs. Proposed models are evaluated using both real, in-the-wild network measurements coming from the WIDE backbone network (the well-known MAWILab dataset) and publicly available datasets. Results suggest that deep learning models can provide similar results to the best-performing shallow models, but without any sort of expert handcrafted inputs.
- Conference Article
- 10.1109/icai58407.2023.10136675
- Feb 22, 2023
Poet attribution focuses on determining the authorship of a piece of poetry using insights obtained from analyzing a poet's existing work. It is significant for tasks including plagiarism detection and characterization of a poet's work. Urdu, Pakistan's lingua franca with the richest poetic tradition, has been a subject of misinformation and misattribution. This paper presents a novel approach to poet attribution in Urdu Ghazals through the application of machine and deep learning models. Our aim is to establish an accurate and comprehensive characterization of ghazals that captures the unique writing style of each poet. To achieve this, we trained and tested a range of machine learning, deep learning, and transformer-based classification models on a dataset containing 17,609 couplets of 15 notable ghazal poets. We used classifiers such as SVM and logistic regression to obtain preliminary results, achieving an accuracy of 64% with SVM. In pursuit of better results, we employed deep learning models such as MLPs, CNNs, and GRUs, with LSTMs yielding the highest accuracy of 59.96%. We then used transformer-based models, including RoBERTa and BERT, which achieved an outstanding accuracy of approximately 80% in classifying 15 poets. This work represents a significant contribution to the field of computational poetry analysis, as it is the first to explore poet attribution in Urdu Ghazals using deep learning and transformer-based models. Our analytical approach enables us to examine and analyze each model's capabilities in capturing the writing style of Urdu Ghazal poets, leading to a more comprehensive and accurate characterization of these works.
- Research Article
- 10.1177/1088467x251377935
- Oct 16, 2025
- Intelligent Data Analysis: An International Journal
Brain Computer Interface (BCI) technology aims to improve the quality of life for individuals with physical impairments. It is based on different physiological sensors, among which Electroencephalography (EEG) is exploited for capturing and interpreting brain activity. In spite of its benefits, traditional EEG-based classification models suffer from high computational complexity and limited accuracy. Accurate classification of Motor Imagery (MI) EEG signals is essential for developing robust and automated BCI systems. This work presents a Deep Learning (DL) model that integrates a Convolutional Neural Network (CNN) with a Multi-Scale Attention (MSA) network to provide better EEG signal classification. Initially, Multiscale Principal Component Analysis (MSPCA) is used for pre-processing the noisy signals. Then, Beluga Whale Optimization (BWO) is used for selecting optimal features. The proposed model is an MSA-CNN, which combines parallel convolutional layers with varying kernel sizes and a Squeeze-and-Excitation (SE) based attention mechanism for extracting discriminative features. The proposed model is evaluated on the PhysioNet EEG MI dataset and on BCI Competition IV-2a, where it outperforms existing methods with accuracies of 99.1% and 99.02%, respectively. This hybrid model offers a scalable and efficient solution for real-time MI-EEG classification in BCI applications.
- Research Article
- 10.3390/jcm14207216
- Oct 13, 2025
- Journal of Clinical Medicine
Background/Objectives: The clinical management of adolescent idiopathic scoliosis (AIS) is hindered by the inability to accurately predict curve progression. Although skeletal maturity and the initial Cobb angle are established predictors of progression, their combined predictive accuracy remains limited. This study aimed to develop a robust and interpretable artificial intelligence (AI) system using deep learning (DL) models to predict the progression of scoliosis using only standing frontal radiographs. Methods: We conducted a multicenter study involving 542 patients with AIS. After excluding 52 borderline progression cases (6–9° progression in the Cobb angle), 294 and 196 patients were assigned to progression (≥10° increase) and non-progression (≤5° increase) groups, respectively, considering a 2-year follow-up. Frontal whole spinal radiographs were preprocessed using histogram equalization and divided into two regions of interest (ROIs) (ROI 1, skull base–femoral head; ROI 2, C7–iliac crest). Six pretrained DL models, including convolutional neural networks (CNNs) and transformer-based models, were trained on the radiograph images. Gradient-weighted class activation mapping (Grad-CAM) was further performed for model interpretation. Results: Ensemble models outperformed individual ones, with the average ensemble model achieving area under the curve (AUC) values of 0.769 for ROI 1 and 0.755 for ROI 2. Grad-CAM revealed that the CNNs tended to focus on the local curve apex, whereas the transformer-based models demonstrated global attention across the spine, ribs, and pelvis. Models trained on ROI 2 performed comparably with respect to those using ROI 1, supporting the feasibility of image standardization without a loss of accuracy. Conclusions: This study establishes the clinical potential of transformer-based DL models for predicting the progression of scoliosis using only plain radiographs. 
Our multicenter approach, high AUC values, and interpretable architectures support the integration of AI into clinical decision-making for the early treatment of AIS.
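The "average ensemble" and AUC figures in this abstract can be made concrete with a short sketch. It averages per-patient scores across models and computes AUC as the fraction of positive/negative pairs ranked correctly (ties count as half). The scores and labels are toy values, not data from the study.

```python
def average_ensemble(score_lists):
    """Average the per-sample scores of several models."""
    n_models = len(score_lists)
    return [sum(col) / n_models for col in zip(*score_lists)]


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = progression, 0 = non-progression.
labels = [1, 1, 0, 0]
cnn_scores = [0.9, 0.4, 0.5, 0.1]
vit_scores = [0.6, 0.7, 0.2, 0.65]

avg_scores = average_ensemble([cnn_scores, vit_scores])
auc_cnn = auc(cnn_scores, labels)
auc_vit = auc(vit_scores, labels)
auc_avg = auc(avg_scores, labels)
```

Here each model misranks a different pair, so averaging repairs both mistakes. This is an idealized illustration of why ensembles can outperform their members, not a claim that they always do.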
- Conference Article
- 10.1109/smc52423.2021.9659085
- Oct 17, 2021
Market price and yield forecasting models for Fresh Produce (FP) are crucial to protect retailers and consumers from overpriced FP. However, utilizing the data for forecasting is obstructed by the occurrence of missing values. Therefore, it is imperative to impute the encountered missing instances to enable effective forecasting. Most of the work found in literature tackles imputation of missing values when they are randomly scattered in the dataset while very little work is found tackling both: consecutive occurrence of missing data, i.e. missing data chunks, as well as those randomly missing. In this work, the data used for forecasting has missing values in chunks as well as at random points. Therefore, various comprehensive imputation models are used to impute both random as well as chunks of missing values. Since the imputed time series are incomplete, the only way to evaluate those imputation models is to analyze their effect on forecasting performance. The ensemble of two compound deep learning (DL) models, namely Attention Convolutional Neural Networks Long Short Term Memory (Att-CNN-LSTM) and SeriesNet with Gated Recurrent Unit (GRU), is used for forecasting. For imputation, three DL models are tested: The Ensemble imputation model which is a Voting Regressor of two DL submodels, Residual GRU and LSTM-Deep-GRU. Another deep learning imputation model is used which is a Transfer Learning (TL) model. Finally, a Hybrid model of both DL models is designed to take the pros of each of its integrated models by using the Ensemble model in case of random missing data and the Transfer Learning model in case of missing data chunks. It is observed that, in general, imputing the missing values improves the forecasting result as compared to eliminating the instances with missing values. 
The Hybrid model improves the overall forecasting performance by up to 60% compared to the case of using the second-best Transfer Learning model and around 64% as compared to the case of imputation using the Ensemble model.
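The routing rule behind the Hybrid model described here (Ensemble imputer for isolated missing points, Transfer Learning imputer for consecutive chunks) can be sketched as follows. The abstract does not state how long a gap must be to count as a chunk, so the threshold `chunk_min_len=2` is an assumption, and the model names are plain labels rather than real imputers.

```python
def missing_runs(series):
    """Return (start_index, length) for each run of consecutive None values."""
    runs, start = [], None
    for i, value in enumerate(series):
        if value is None and start is None:
            start = i
        elif value is not None and start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:  # series ends inside a gap
        runs.append((start, len(series) - start))
    return runs


def route_gaps(series, chunk_min_len=2):
    """Assign each gap to an imputer label based on its length.

    `chunk_min_len` is a hypothetical threshold: gaps at least this long
    are treated as chunks, shorter gaps as randomly missing points.
    """
    return [
        (
            start,
            length,
            "transfer_learning" if length >= chunk_min_len else "ensemble",
        )
        for start, length in missing_runs(series)
    ]


# Toy price series with two isolated gaps and one three-step chunk.
prices = [1.0, None, 1.2, None, None, None, 1.5, None]
routes = route_gaps(prices)
```

Each routed gap would then be filled by the corresponding model. Only the dispatch step is shown because the abstract does not describe the imputers' interfaces.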
- Research Article
- 10.1186/s12885-025-14971-7
- Oct 22, 2025
- BMC Cancer
Objective: Microvascular invasion (MVI) is of great significance for the individualized treatment of hepatocellular carcinoma (HCC), and preoperative noninvasive prediction of MVI is still an urgent clinical problem. This study aimed to explore the effects of different regions of interest (ROI) and image input dimensions on the performance of deep learning (DL) models, and to select the best result to develop and validate a DL model for preoperative prediction of MVI. Materials and methods: A total of 206 patients with pathologically confirmed HCC from three hospitals were retrospectively enrolled and divided into training, internal validation, and external test sets. Based on hepatobiliary phase (HBP) images of gadoxetic acid-enhanced MRI, 2D DL, 3D DL, and 2.5D deep multi-instance learning (MIL) models were established. The receiver operating characteristic (ROC) curve was used to evaluate the predictive efficacy of these models. Based on the optimal performance model, the T1WI-FS and T2WI-FS images were preprocessed correspondingly, and a multimodal prediction model including three sequences was constructed. The ROC and decision curves were used to visualize the predictive ability of the model. Results: Compared with the 2D DL and 3D DL models, the 2.5D DL model based on all axial images of the ROI had the highest performance, with AUC values of 0.802 (95% CI, 0.669–0.936) and 0.759 (95% CI, 0.643–0.875) in the validation and test sets. The AUCs of the multimodal MRI model were 0.954 (95% CI, 0.920–0.989) in the training set, 0.857 (95% CI, 0.736–0.978) in the validation set, and 0.788 (95% CI, 0.681–0.895) in the test set. Conclusion: The DL model that selects all axial slices of the intratumoral and peritumoral regions as input shows robust capability in predicting MVI, which is expected to help clinical decision-making on individualized treatment for HCC.
- Research Article
- 10.1186/s12944-025-02820-2
- Dec 20, 2025
- Lipids in Health and Disease
Cardiometabolic multimorbidity (CMM) has become an increasing global public health challenge. In China, the prevalence of CMM is rising rapidly among middle-aged and older adults, with estimates ranging from 11.6% to 16.9%, posing a substantial burden on both individuals and healthcare systems. However, effective tools for predicting individual risk of CMM remain limited, hindering timely prevention and intervention. This study used data from the China Health and Retirement Longitudinal Study (CHARLS) between 2011 and 2015, including 7,913 participants aged ≥ 45 years without CMM at baseline. Incident CMM events were identified during the 2015 follow-up based on self-reported diagnoses of cardiometabolic diseases. Ten lipid metabolism biomarkers and derived composite indices (TC, TG, LDL-C, HDL-C, TyG, TyG-BMI, LAP, CTI, non-HDL-C, and RC) were evaluated. Predictive models were estimated using logistic regression, random forest, gradient boosting machine, eXtreme Gradient Boosting (XGBoost), support vector machine, naïve Bayes, deep learning (DL), and an ensemble model. The dataset was randomly split into training (75%) and validation (25%) subsets. Model discrimination was assessed using ROC curves and Area Under the Curve (AUC); calibration was evaluated with calibration plots and Brier scores; classification performance was examined using confusion matrices. Decision curve analysis (DCA) and clinical impact curves (CIC) were applied to assess clinical utility across risk thresholds. Feature importance ranking and SHapley Additive exPlanations (SHAP) were used to quantify variable contributions, marginal effects, and feature interactions. In addition, regional variations in CMM incidence were illustrated using choropleth maps, and correlations between lipid markers and CMM prevalence were analyzed with Pearson coefficients and heatmaps. Over the four-year follow-up, 1,355 participants (17.1%) developed CMM. 
Compared with controls, incident cases were older, had a higher proportion of women and urban residents, and showed higher BMI. They also had significantly elevated triglycerides (126.6 vs. 101.8 mg/dL), reduced HDL-C (45.2 vs. 50.3 mg/dL, P < 0.001), and increased TyG-BMI and LAP (P < 0.001). Geographical analysis revealed markedly higher CMM incidence in northern cold regions (> 40%) than in southern regions (< 20%). The ensemble model achieved robust predictive performance (AUC = 0.715), followed closely by the DL model (AUC = 0.716) and GBM (AUC = 0.714). These non-linear models consistently outperformed GLM (AUC = 0.696), SVM (AUC = 0.696), and XGBoost (AUC = 0.683). Ensemble, DL, and RF models also demonstrated the best calibration (lowest Brier score, 0.125) and provided the greatest net benefit across risk thresholds. SHAP analysis indicated that composite indices, particularly TyG-BMI, LAP, and TyG, contributed most to risk prediction, whereas HDL-C exerted a protective effect. In contrast, traditional single lipid markers such as LDL-C and TC ranked lower in predictive importance. This study demonstrates that machine learning models incorporating lipid metabolism biomarkers and derived indices can predict the risk of CMM. Composite indicators such as TyG and LAP, which capture insulin resistance and visceral adiposity, showed superior predictive value. DL and ensemble models provided higher discrimination and clinical utility compared with traditional approaches. These models may enable early identification of high-risk individuals, underscoring the importance of lipid and metabolic management in CMM prevention, with potential implications for clinical decision-making and public health strategies.
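The Brier score used for calibration in this abstract is the mean squared difference between predicted probability and observed outcome. The sketch below computes it on toy predictions and also evaluates the reference value obtained by predicting the cohort incidence (17.1%, as reported above) for every participant. The toy probabilities are illustrative only.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def base_rate_brier(incidence):
    """Brier score of always predicting the incidence itself: p * (1 - p)."""
    return incidence * (1.0 - incidence)


# Toy predictions for four participants (1 = developed CMM).
probs = [0.9, 0.2, 0.6, 0.1]
outcomes = [1, 0, 1, 0]
toy_score = brier_score(probs, outcomes)

# Reference for a cohort with the reported 17.1% incidence.
reference = base_rate_brier(0.171)
```

Lower is better. The constant-prediction reference of about 0.142 gives context for the reported best Brier score of 0.125: a model is informative only insofar as it scores below that baseline.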