Deep learning models for image classification of lymphoma: a pilot study in canine
The aim of this study was to distinguish canine lymphoma from other diseases, particularly reactive lymphoid hyperplasia (RLH), based on fine needle aspiration (FNA) images. We developed four deep learning models based on Vision Transformer (ViT) and Inception-v3, both pre-trained image classification models. Two of the four models were ViT and Inception-v3 themselves; the remaining two were ensemble learning models combining them: one taking the mean of the class probabilities of ViT and Inception-v3 (Ensemble model A; MEAN) and one taking the maximum of their class probabilities (Ensemble model B; MAX). A total of 2,290 FNA images of canine lymphoma and 871 FNA images of RLH were analyzed. The FNA images were obtained from twenty-five slides of fourteen lymphoma cases and eight slides of seven RLH cases in two hospitals. Three types of training and test datasets were prepared from these images for fair evaluation of the models. Three deep learning-based image classification models (Inception-v3 and the two ensemble models) attained high performance, with >80% accuracy, recall, and area under the curve (AUC) values for all three datasets. ViT did not achieve high performance, except in precision (>0.85). This study illustrates the potential of deep learning models for image classification in canine lymphoma.
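The two ensemble rules described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each base model outputs a probability per class and that MAX means an element-wise maximum over those per-class probabilities. The function names and the example probabilities are hypothetical.

```python
def ensemble_mean(p_a, p_b):
    """Ensemble A (MEAN): average two class-probability vectors element-wise."""
    return [(a + b) / 2 for a, b in zip(p_a, p_b)]


def ensemble_max(p_a, p_b):
    """Ensemble B (MAX): element-wise maximum of two class-probability vectors.

    Assumption: the abstract does not spell out the exact rule; note that the
    resulting scores no longer sum to 1.
    """
    return [max(a, b) for a, b in zip(p_a, p_b)]


def predict(scores, labels=("lymphoma", "RLH")):
    """Return the label whose combined score is highest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]


# Hypothetical outputs for one FNA image: [P(lymphoma), P(RLH)]
vit_probs = [0.45, 0.55]
inception_probs = [0.90, 0.10]

mean_probs = ensemble_mean(vit_probs, inception_probs)
max_probs = ensemble_max(vit_probs, inception_probs)

print(predict(vit_probs), predict(mean_probs), predict(max_probs))
```

In this made-up case ViT alone favours RLH, while both ensembles follow the more confident Inception-v3 output.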
- Research Article
- 10.1186/s13244-026-02220-9
- Mar 3, 2026
- Insights into imaging
Radiologists often face challenges in differentiating benign from malignant sacral bone lesions due to their similar imaging characteristics. This study aimed to develop an ensemble deep learning (DL) model that can preoperatively distinguish between benign and malignant sacral tumors using noncontrast computed tomography images. Preoperative sacral CT scans from 569 patients with confirmed sacral lesions were analyzed. Data from Center 1 were utilized in model development and internal test via fivefold cross-validation, and those from Centers 2 and 3 were employed in external test. Various ensemble models combining human-readable interpretation and DL were developed. The diagnostic performance of the models and radiologists was assessed using metrics such as precision, recall, accuracy, area under the curve (AUC), F1 score, and confusion matrix. Furthermore, the clinical benefits derived from radiologists' interpretations and supported by the DL model were evaluated. The ensemble model, which integrates 3D-DenseNet121 with human interpretation, exhibited the most robust performance. The ensemble model demonstrated high performance on the internal and external test sets and achieved AUCs of 0.9139 and 0.8713, F1 scores of 0.9054 and 0.8571, precision of 0.9041 and 0.8824, recall of 0.9136 and 0.8333, and accuracy of 0.8630 and 0.8182, respectively. Across the external test cohort, all radiologists experienced improvements in AUC, accuracy, sensitivity, and specificity. Notably, junior radiologists demonstrated significant improvements compared with senior radiologists. The potential clinical application of the DL model lies in its capacity to considerably enhance the diagnostic efficiency of radiologists. 
This study presents the first ensemble deep learning model integrating 3D-DenseNet121 with radiologists' interpretation for preoperative differentiation of sacral tumors on noncontrast CT that improved diagnostic performance across all experience levels, particularly for junior radiologists. First artificial intelligence-radiologist ensemble for noncontrast computed tomography (NCCT)-based sacral tumor classification. Boosts all radiologists' performance, with the greatest gains for juniors, potentially reducing referrals. Enables reliable NCCT diagnosis, overcoming contrast/magnetic resonance imaging dependency in musculoskeletal oncology.
- Research Article
66
- 10.1016/j.bspc.2023.105130
- Jun 15, 2023
- Biomedical Signal Processing and Control
Classification of EEG signals using Transformer based deep learning and ensemble models
- Research Article
- 10.1186/s12944-025-02820-2
- Dec 20, 2025
- Lipids in Health and Disease
Cardiometabolic multimorbidity (CMM) has become an increasing global public health challenge. In China, the prevalence of CMM is rising rapidly among middle-aged and older adults, with estimates ranging from 11.6% to 16.9%, posing a substantial burden on both individuals and healthcare systems. However, effective tools for predicting individual risk of CMM remain limited, hindering timely prevention and intervention. This study used data from the China Health and Retirement Longitudinal Study (CHARLS) between 2011 and 2015, including 7,913 participants aged ≥ 45 years without CMM at baseline. Incident CMM events were identified during the 2015 follow-up based on self-reported diagnoses of cardiometabolic diseases. Ten lipid metabolism biomarkers and derived composite indices (TC, TG, LDL-C, HDL-C, TyG, TyG-BMI, LAP, CTI, non-HDL-C, and RC) were evaluated. Predictive models were estimated using logistic regression, random forest, gradient boosting machine, eXtreme Gradient Boosting (XGBoost), support vector machine, naïve Bayes, deep learning (DL), and an ensemble model. The dataset was randomly split into training (75%) and validation (25%) subsets. Model discrimination was assessed using ROC curves and Area Under the Curve (AUC); calibration was evaluated with calibration plots and Brier scores; classification performance was examined using confusion matrices. Decision curve analysis (DCA) and clinical impact curves (CIC) were applied to assess clinical utility across risk thresholds. Feature importance ranking and SHapley Additive exPlanations (SHAP) were used to quantify variable contributions, marginal effects, and feature interactions. In addition, regional variations in CMM incidence were illustrated using choropleth maps, and correlations between lipid markers and CMM prevalence were analyzed with Pearson coefficients and heatmaps. Over the four-year follow-up, 1,355 participants (17.1%) developed CMM. 
Compared with controls, incident cases were older, had a higher proportion of women and urban residents, and showed higher BMI. They also had significantly elevated triglycerides (126.6 vs. 101.8 mg/dL), reduced HDL-C (45.2 vs. 50.3 mg/dL, P < 0.001), and increased TyG-BMI and LAP (P < 0.001). Geographical analysis revealed markedly higher CMM incidence in northern cold regions (> 40%) than in southern regions (< 20%). The ensemble model achieved robust predictive performance (AUC = 0.715), followed closely by the DL model (AUC = 0.716) and GBM (AUC = 0.714). These non-linear models consistently outperformed GLM (AUC = 0.696), SVM (AUC = 0.696), and XGBoost (AUC = 0.683). Ensemble, DL, and RF models also demonstrated the best calibration (lowest Brier score, 0.125) and provided the greatest net benefit across risk thresholds. SHAP analysis indicated that composite indices, particularly TyG-BMI, LAP, and TyG, contributed most to risk prediction, whereas HDL-C exerted a protective effect. In contrast, traditional single lipid markers such as LDL-C and TC ranked lower in predictive importance. This study demonstrates that machine learning models incorporating lipid metabolism biomarkers and derived indices can predict the risk of CMM. Composite indicators such as TyG and LAP, which capture insulin resistance and visceral adiposity, showed superior predictive value. DL and ensemble models provided higher discrimination and clinical utility compared with traditional approaches. These models may enable early identification of high-risk individuals, underscoring the importance of lipid and metabolic management in CMM prevention, with potential implications for clinical decision-making and public health strategies.
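As a reading aid for the calibration figures above, the Brier score is the mean squared difference between predicted probability and observed binary outcome. The sketch below uses the standard definition; the function names and the example predictions are hypothetical. The baseline calculation assumes the validation subset has the same 17.1% incidence as the full cohort, which the abstract does not state.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def baseline_brier(prevalence):
    """Brier score of a model that predicts the overall prevalence for everyone."""
    return prevalence * (1 - prevalence)


# Hypothetical predictions for four participants (1 = developed CMM)
example = brier_score([0.1, 0.8, 0.3, 0.2], [0, 1, 0, 1])

# Uninformative reference at the reported 17.1% incidence (assumption: the
# validation subset shares this incidence)
reference = baseline_brier(0.171)
print(round(example, 3), round(reference, 3))
```

Under that assumption the uninformative reference is roughly 0.142, so the reported score of 0.125 is a modest improvement over predicting the cohort incidence for everyone.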
- Research Article
- 10.1007/s43832-026-00375-6
- Feb 27, 2026
- Discover Water
The study demonstrates a multi-model ensemble method for river discharge forecasting over flood-prone major rivers of India (Brahmaputra, Ganga, and Kosi). Deep learning-based models (RNN, GRU, LSTM, BiLSTM) are applied over these diverse river systems of India. These models are trained using rainfall, soil moisture, and model-simulated river discharge with 3-day, 5-day, and 7-day moving training windows (T-3, T-5, and T-7) to generate 1 to 7 day (F-1, …, F-7) discharge forecasts. To enhance forecast performance, an ensemble model approach is proposed in this study. A global Ridge model is used, which takes deep learning model outputs with statistical features from the data to generate multi-day river discharge forecasts. Performance evaluation is carried out for individual deep learning models and the ensemble global Ridge model. Individual models performed well for Day-1 forecasts across all sliding temporal training windows with statistical measures such as Nash–Sutcliffe efficiency performing well (NSE ≈ 0.9–0.97) for all river systems. However, performance degrades with increase in lead time in all deep learning models. The ensemble model improves overall performance compared to individual models across all training windows, especially for longer lead times. The results for the Day-1 ensemble model forecast reflect higher performance over the Brahmaputra (NSE = 0.974), Ganga (NSE = 0.966), and Kosi (NSE = 0.978) rivers. Substantial improvement is observed for Day-7 with NSE values of 0.711 (Brahmaputra), 0.845 (Ganga), and 0.830 (Kosi) for the T-7 training window. These results highlight that combining different deep learning models with varying architectures by a global Ridge ensemble model yields robust short-to-medium range discharge forecasts in data-sparse river basins, providing a promising, computationally efficient tool for the development of an operational flood early-warning system.
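For readers unfamiliar with the metric reported above, the Nash–Sutcliffe efficiency compares the squared forecast error with the variance of the observations around their mean. The sketch below uses the standard definition; the function name and discharge values are hypothetical and are not taken from the study.

```python
def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2).

    1.0 is a perfect forecast; 0.0 means the forecast is no better than
    always predicting the observed mean.
    """
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated must have the same length")
    mean_obs = sum(observed) / len(observed)
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - error / variance


# Hypothetical daily discharge values (arbitrary units)
obs = [10.0, 20.0, 30.0, 40.0]
sim = [12.0, 18.0, 33.0, 39.0]

nse_value = nash_sutcliffe(obs, sim)
print(round(nse_value, 3))
```

An NSE that drops with lead time, as in the Day-7 results, means the squared forecast error is growing relative to the natural variability of the discharge series.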
- Research Article
- 10.64615/fjes...2025.72
- Nov 10, 2025
- Fusion Journal of Engineering and Sciences
Flooding is a natural disaster that causes widespread devastation. Increasing rainfall, particularly in the urban regions of Sindh province, causes several problems because the drainage system cannot handle large volumes of water in a short period of time. Identifying floods is essential for disaster response, as it helps locate areas that need immediate help. Recently, deep learning-based models have shown the best performance on image classification tasks. In this paper, a deep learning-based ensemble model is developed in which four state-of-the-art deep learning models are combined to classify floods from images captured with a mobile camera or other image-capturing devices. The ensemble model was trained and tested on two publicly available datasets labelled with flood and non-flood images. To enhance the efficacy of the ensemble model, the hyper-parameters of the four models were fine-tuned. The results show that the deep learning-based ensemble model outperforms the individual models.
- Research Article
19
- 10.1371/journal.pone.0282608
- Mar 9, 2023
- PLOS ONE
COVID-19 is highly infectious and causes acute respiratory disease. Machine learning (ML) and deep learning (DL) models are vital in detecting disease from chest computed tomography (CT) scans. The DL models outperformed the ML models. For COVID-19 detection from CT scan images, DL models are used as end-to-end models. Thus, the performance of the model is evaluated for the quality of the extracted features and classification accuracy. There are four contributions included in this work. First, this research is motivated by studying the quality of the features extracted by the DL model by feeding these extracted features to an ML model. In other words, we proposed comparing the end-to-end DL model performance against the approach of using DL for feature extraction and ML for the classification of COVID-19 CT scan images. Second, we proposed studying the effect of fusing extracted features from image descriptors, e.g., Scale-Invariant Feature Transform (SIFT), with extracted features from DL models. Third, we proposed a new Convolutional Neural Network (CNN) to be trained from scratch and then compared to deep transfer learning on the same classification problem. Finally, we studied the performance gap between classic ML models and ensemble learning models. The proposed framework is evaluated using a CT dataset, where the obtained results are evaluated using five different metrics. The obtained results revealed that using the proposed CNN model is better than using the well-known DL model for the purpose of feature extraction. Moreover, using a DL model for feature extraction and an ML model for the classification task achieved better results in comparison to using an end-to-end DL model for detecting COVID-19 CT scan images. Of note, the accuracy rate of the former method improved by using ensemble learning models instead of the classic ML models. The proposed method achieved the best accuracy rate of 99.39%.
- Research Article
16
- 10.3389/frwa.2023.1305998
- Dec 13, 2023
- Frontiers in Water
Groundwater resource management in arid regions has a critical importance for sustaining human activities and ecological systems. Accurate mapping of groundwater potential plays a vital role in effective water resource planning. This study investigates the effectiveness of machine learning models, including Random Forest (RF), Adaboost, K-Nearest Neighbors (KNN), and Gaussian Process in groundwater potential mapping (GWPM) in the Tan-Tan arid region, Morocco. Fourteen groundwater conditional factors were considered following multicollinearity test, including topographical, hydrological, climatic, and geological factors. Additionally, point data with 174 sites indicative of groundwater occurrences were incorporated. The groundwater inventory data underwent random partitioning into training and testing datasets at three different ratios: 55/45%, 65/35%, and 75/25%. Ultimately, a comprehensive ranking of the 13 models, encompassing both individual and ensemble models, was determined using the prioritization rank technique. The results revealed that ensemble learning (EL) models, particularly RF and Adaboost (RF-Adaboost), outperformed individual models in groundwater potential mapping. Based on accuracy assessment using the validation dataset, the RF-Adaboost EL results yielded an Area Under the Receiver Operating characteristic Curve (AUROC) and Overall Accuracy (OA) of 94.02 and 94%, respectively. Ensemble models have been effectively applied to integrate 14 factors, capturing their intricate interrelationships, and thereby enhancing the accuracy and robustness of groundwater prediction in the Tan-Tan water-scarce region. Among the natural factors, the current study identified lithology, structural elements (such as faults and tectonic lineaments), and land use as significant contributors to groundwater potential. 
However, the critical characteristics of the study area showing a coastal position as well as a low background in groundwater prospectivity (low borehole points) are challenging in GWPM. The findings highlight the importance of the significant factors in assessing and managing groundwater resources in arid regions. Moreover, this study makes a contribution to the management of groundwater resources by demonstrating the effectiveness of ensemble learning algorithms in the groundwater potential mapping (GWPM) in arid regions.
- Research Article
- 10.3390/medicina61111945
- Oct 30, 2025
- Medicina
Background and Objectives: We aimed to apply an ensemble machine learning model to diagnose thyroid cartilage invasion on computed tomography (CT) images in laryngeal cancers and to evaluate the diagnostic performance of the model. Materials and Methods: A total of 313 patients were divided into two groups: the cartilage invasion group and the no cartilage invasion group. At least four CT slices were randomly selected for each patient, resulting in a total of 1251 images used in the study: 619 axial CT images from the no cartilage invasion group and 632 axial CT images from the cartilage invasion group. We reviewed the CT images and histopathological diagnoses in all cases to determine the invasion-positive or invasion-negative status as the ground truth. The ensemble model, comprising ResNet50 and MobileNet deep learning architectures, was applied to CT images. Results: The following were obtained by the ensemble model with the test dataset: area under the curve (AUC) 0.99, and accuracy 96.54%. This model demonstrates a very high level of performance in detecting thyroid cartilage invasion. Conclusions: The ensemble machine learning model is an effective method for detecting neoplastic infiltration of the thyroid cartilage. Moreover, it may be a valuable diagnostic tool for clinicians in assessing disease prognosis and determining appropriate treatment strategies in laryngeal cancers. In conclusion, this model could be integrated into future clinical practice in laryngology and head and neck surgery for the detection of cartilage neoplastic infiltration.
- Research Article
- 10.1016/j.compbiomed.2025.111078
- Oct 1, 2025
- Computers in biology and medicine
Predicting dementia through audio: Ensemble and deep learning approaches using acoustic features.
- Research Article
1937
- 10.1016/j.engappai.2022.105151
- Jul 30, 2022
- Engineering Applications of Artificial Intelligence
Ensemble deep learning: A review
- Research Article
35
- 10.1002/sam.11480
- Aug 19, 2020
- Statistical Analysis and Data Mining: The ASA Data Science Journal
We analyzed a data set containing functional brain images from 6 healthy controls and 196 individuals with Parkinson's disease (PD), who were divided into five stages according to illness severity. The goal was to predict patients' PD illness stages by using their functional brain images. We employed the following prediction approaches: multivariate statistical methods (linear discriminant analysis, support vector machine, decision tree, and multilayer perceptron [MLP]), ensemble learning models (random forest [RF] and adaptive boosting), and deep convolutional neural network (CNN). For statistical and ensemble models, various feature extraction approaches (principal component analysis [PCA], multilinear PCA, intensity summary statistics [IStat], and Laws' texture energy measure) were employed to extract features, the synthetic minority over‐sampling technique was used to address imbalanced data, and the optimal combination of hyperparameters was found using a grid search. For CNN modeling, we applied an image augmentation technique to increase and balance data sizes over different disease stages. We adopted transfer learning to incorporate pretrained VGG16 weights and architecture into the model fitting, and we also tested a state‐of‐the‐art machine learning model that could automatically generate an optimal neural architecture. We found that IStat consistently outperformed other feature extraction approaches. MLP and RF were the analytic approaches with the highest prediction accuracy rate for multivariate statistical and ensemble learning models, respectively. Overall, the deep CNN model with pretrained VGG16 weights and architecture outperformed other approaches; it captured critical features from imaging, effectively distinguished between normal controls and patients with PD, and achieved the highest classification accuracy.
- Research Article
- 10.1155/acis/5211419
- Jan 1, 2025
- Applied Computational Intelligence and Soft Computing
This study presents ensemble machine learning (ML) models for predicting residential energy consumption in South Africa. By combining the best features of individual ML models, ensemble models reduce the drawbacks of each model and improve prediction accuracy. We present four ensemble models: ensemble by averaging (EA), ensemble by stacking each estimator (ESE), ensemble by boosting (EB), and ensemble by voting estimator (EVE). These models are built on top of Random Forest (RF) and Decision Tree (DT). These base predictor models leverage historical energy consumption patterns to capture temporal intricacies, including seasonal variations and rolling averages. In addition, we employed feature engineering methodologies to further enhance their predictive abilities. The accuracy of each ensemble model was evaluated by assessing various performance indicators, including the mean squared error (MSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and coefficient of determination R2. Overall, the findings illustrate the efficiency of ensemble learning models in providing accurate predictions for residential energy consumption. This study provides valuable insights for researchers and practitioners in predicting energy consumption in residential buildings and the benefits of using ensemble learning models in the building and energy research domains.
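The "ensemble by averaging" idea and one of the listed error metrics can be illustrated as follows. This is a minimal sketch with made-up numbers, not the study's pipeline: the two prediction lists stand in for the outputs of already-trained Random Forest and Decision Tree models, and all names and values are hypothetical.

```python
def average_predictions(*model_preds):
    """Ensemble by averaging: per-sample mean of the base models' predictions."""
    return [sum(values) / len(values) for values in zip(*model_preds)]


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be non-zero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100 * total / len(y_true)


# Hypothetical consumption values (kWh) and base-model predictions
y_true = [100.0, 200.0, 400.0]
rf_pred = [110.0, 190.0, 420.0]   # stand-in for Random Forest output
dt_pred = [90.0, 230.0, 380.0]    # stand-in for Decision Tree output

ea_pred = average_predictions(rf_pred, dt_pred)
print(ea_pred, round(mape(y_true, ea_pred), 2))
```

The example errors were chosen to partly cancel, which is the favourable case; averaging helps less when the base models make similar mistakes.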
- Research Article
5
- 10.63125/edxgjg56
- Jun 1, 2022
- Review of Applied Science and Technology
This meta-analytic study investigates the effectiveness of machine learning (ML), neural networks (NN), and ensemble learning models in forecasting future investment value across diverse financial markets. Using PRISMA 2020 guidelines, 108 peer-reviewed articles published between 2012 and 2022 were systematically selected from databases including Scopus, Web of Science, and IEEE Xplore. The study synthesizes empirical findings on model performance, feature engineering, and algorithmic robustness to evaluate predictive accuracy, generalizability, and practical applicability. Results indicate that neural networks—particularly deep learning architectures such as LSTM and CNN—demonstrate superior performance in capturing nonlinear patterns and temporal dependencies in financial time series data. Ensemble models such as Random Forest, XGBoost, and hybrid frameworks (e.g., stacking, bagging, boosting) consistently outperform standalone ML models in terms of accuracy, stability, and resistance to overfitting. Approximately 34% of reviewed studies integrated macroeconomic indicators, technical indicators, and sentiment analysis to enhance feature richness, while 28% adopted multi-asset forecasting involving equities, cryptocurrencies, and derivatives. Performance metrics such as RMSE, MAPE, and R² revealed that ensemble and deep learning models achieve up to 20–30% improvement in predictive reliability compared to traditional statistical models like ARIMA and linear regression. The review also highlights a growing emphasis on model interpretability, with techniques like SHAP and LIME being applied in 18% of studies to support explainability in high-stakes investment decisions. However, challenges remain in model transparency, computational complexity, and adaptability across volatile market conditions. Compared to earlier literature, this study reflects a paradigm shift from linear forecasting models to adaptive, data-driven approaches supported by AI technologies. 
The findings underscore the transformative potential of ML, NNs, and ensemble models in investment forecasting while calling for continued research into scalable, explainable, and risk-aware deployment strategies for real-world financial environments.
- Research Article
1
- 10.23977/autml.2019.11001
- Dec 5, 2019
- Automation and Machine Learning
Ensemble learning models such as Light Gradient Boosting Machine mine the data only once, so they cannot automatically refine the granularity of data mining or uncover deeper internal correlations in the data. To address this defect, the ensemble learning model is given a deep form through a sliding window and a deepening step, and deep ensemble learning is proposed. The sliding window enables the ensemble learning model to automatically refine the granularity of data mining, so that it can dig deeper into the potential internal correlations in the data, and at the same time endows it with a certain representation learning ability. Building on the sliding window, the deepening step further improves the representation learning ability of the model. The results show that the prediction accuracy of the deep ensemble learning model is 6.16 percentage points higher than that of the original ensemble learning model.
- Research Article
14
- 10.1007/s13369-023-08672-1
- Jan 27, 2024
- Arabian Journal for Science and Engineering
In the last 50 years, with the growth of cities and the increase in the number of vehicles and mobility, traffic has become troublesome. As a result, traffic flow prediction has attracted attention as an important research area. However, despite the extensive literature, traffic flow prediction remains an open research problem, specifically for long-term prediction. Compared to the models developed for short-term traffic flow prediction, very few models have been developed for long-term traffic flow prediction. To address this shortcoming, in this study we focus on long-term traffic flow prediction and propose a novel deep ensemble model (DEM). To build this ensemble model, we first developed a convolutional neural network (CNN), a long short-term memory (LSTM) network and a gated recurrent unit (GRU) network as deep learning models, which formed the base learners. In the next step, we combine the outputs of these models according to their individual forecasting success. We use another deep learning model to determine the success of the individual models. Our proposed model is a flexible ensemble prediction model that can be updated based on traffic data. To evaluate the performance of the proposed model, we use a publicly available dataset. Experimental results show that the developed DEM achieves a mean square error of 0.06 and a mean absolute error of 0.15 for single-step prediction, and a mean square error of 0.25 and a mean absolute error of 0.32 for multi-step prediction. We compared our proposed model with many models in different categories: individual deep learning models (i.e., LSTM, CNN, GRU), selected traditional machine learning models (i.e., linear regression, decision tree regression, k-nearest-neighbors regression) and other ensemble models such as random-forest regression. These results also support the claim that ensemble learning models perform better than individual models.
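Combining base learners "according to their individual forecasting success" can be illustrated with a much simpler rule than the one in the abstract. The sketch below swaps in inverse-error weighting: each model's weight is proportional to the reciprocal of its validation error. The paper itself uses another deep learning model to determine these contributions, so this is a stand-in; all names and numbers are hypothetical.

```python
def inverse_error_weights(errors, eps=1e-12):
    """Weights proportional to 1/error, normalised to sum to 1.

    eps guards against division by zero for a model with zero error.
    """
    inverse = [1.0 / (e + eps) for e in errors]
    total = sum(inverse)
    return [w / total for w in inverse]


def weighted_forecast(predictions, weights):
    """Combine one prediction per base model into a single forecast."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical validation MSEs for three base learners (e.g. CNN, LSTM, GRU)
validation_mse = [0.1, 0.2, 0.4]
weights = inverse_error_weights(validation_mse)

# Hypothetical next-step predictions from the same three models
combined = weighted_forecast([1.0, 2.0, 3.0], weights)
print([round(w, 3) for w in weights], round(combined, 3))
```

Recomputing the weights as new validation errors arrive is one simple way to get the "updatable" behaviour the abstract describes, though the authors' learned weighting may behave differently.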