Learning dynamic weights for an ensemble of deep models applied to medical imaging classification

Abstract

An ensemble of deep models is commonly used to provide more robust and accurate performance for medical image classification. A drawback of the most common ensemble aggregation operators is that they give the same importance to all models in the ensemble. As a consequence, they cannot identify weak models that may negatively influence the ensemble performance. In this work, we propose a new method based on the Dirichlet distribution and Mahalanobis distance to learn dynamic weights for an ensemble of deep learning models. Through this method, it is possible to reduce the influence of weak models for each new sample evaluated by the ensemble and perform online ensemble pruning. We evaluate this method for an ensemble of six well-known deep models applied to four medical imaging datasets. The experiments show that our method achieves the best balanced accuracy for 2 out of 4 datasets and increases the confidence of the ensemble predictions.
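To make the idea of per-sample weighting and online pruning concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not say how the weights are derived from the Dirichlet distribution and Mahalanobis distance, so the weights are simply taken as given. The function name `dynamic_weighted_ensemble`, the `prune_below` threshold, and the example numbers are all hypothetical.

```python
from typing import List, Sequence


def dynamic_weighted_ensemble(
    probs: Sequence[Sequence[float]],
    weights: Sequence[float],
    prune_below: float = 0.05,  # hypothetical pruning threshold, not from the paper
) -> List[float]:
    """Aggregate class probabilities for ONE sample.

    probs[m]   : class-probability vector predicted by model m for this sample
    weights[m] : non-negative weight of model m for this sample (assumed to be
                 given here; the paper learns them, this sketch does not)
    """
    if len(probs) != len(weights) or not probs:
        raise ValueError("need one weight per model and at least one model")

    # Online pruning: drop models whose weight for this sample is too small.
    kept = [(p, w) for p, w in zip(probs, weights) if w >= prune_below]
    if not kept:
        # Every model was pruned; fall back to using all of them.
        kept = list(zip(probs, weights))

    total = sum(w for _, w in kept)
    if total == 0:
        # All remaining weights are zero; fall back to a plain average.
        kept = [(p, 1.0) for p, _ in kept]
        total = float(len(kept))

    n_classes = len(kept[0][0])
    # Weighted average of the kept models, renormalized so the weights sum to 1.
    return [sum(w * p[c] for p, w in kept) / total for c in range(n_classes)]


# Three models, two classes; the third model gets a very low weight for this sample.
sample_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
sample_weights = [0.5, 0.48, 0.02]
combined = dynamic_weighted_ensemble(sample_probs, sample_weights)
print(combined)
```

With equal weights and no pruning this reduces to the plain average, which is the "same importance to all models" behaviour that the abstract identifies as a drawback.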

Similar Papers
  • Research Article
  • Citations: 46
  • 10.1007/s12559-024-10257-5
Two-layer Ensemble of Deep Learning Models for Medical Image Segmentation
  • Jan 31, 2024
  • Cognitive Computation
  • Truong Dang + 4 more

One of the most important areas in medical image analysis is segmentation, in which raw image data is partitioned into structured and meaningful regions to gain further insights. By using Deep Neural Networks (DNN), AI-based automated segmentation algorithms can potentially assist physicians with more effective imaging-based diagnoses. However, since it is difficult to acquire high-quality ground truths for medical images and DNN hyperparameters require significant manual tuning, the results of DNN-based medical models might be limited. A potential solution is to combine multiple DNN models using ensemble learning. We propose a two-layer ensemble of deep learning models in which the prediction of each training image pixel made by each model in the first layer is used as the augmented data of the training image for the second layer of the ensemble. The predictions of the second layer are then combined using a weight-based scheme in which the weights are found by solving linear regression problems. To the best of our knowledge, our paper is the first work which proposes a two-layer ensemble of deep learning models with an augmented data technique in medical image segmentation. Experiments conducted on five different medical image datasets for diverse segmentation tasks show that the proposed method achieves better results in terms of several performance metrics compared to some well-known benchmark algorithms. The research can be expanded in several directions, such as image classification.

  • Research Article
  • Citations: 19
  • 10.1016/j.jocs.2024.102324
LitefusionNet: Boosting the performance for medical image classification with an intelligent and lightweight feature fusion network
  • May 25, 2024
  • Journal of Computational Science
  • Sohaib Asif + 3 more

  • Research Article
  • Citations: 28
  • 10.3390/app12188967
Enhanced Arabic Sentiment Analysis Using a Novel Stacking Ensemble of Hybrid and Deep Learning Models
  • Sep 7, 2022
  • Applied Sciences
  • Hager Saleh + 4 more

Sentiment analysis (SA) is a machine learning application that derives people’s opinions from text using natural language processing (NLP) techniques. Implementing Arabic SA is challenging for many reasons, including equivocation, numerous dialects, lack of resources, morphological diversity, lack of contextual information, and hiding of sentiment terms in the implicit text. Deep learning models such as convolutional neural networks (CNN) and long short-term memory (LSTM) have significantly improved performance in the Arabic SA domain. Hybrid models based on CNN combined with LSTM or gated recurrent unit (GRU) have further improved the performance of single DL models. In addition, the ensemble of deep learning models, especially stacking ensembles, is expected to increase the robustness and accuracy of the previous DL models. In this paper, we proposed a stacking ensemble model that combined the prediction power of CNN and hybrid deep learning models to predict Arabic sentiment accurately. The stacking ensemble algorithm has two main phases. Three DL models were optimized in the first phase, including deep CNN, hybrid CNN-LSTM, and hybrid CNN-GRU. In the second phase, these three separate pre-trained models’ outputs were integrated with a support vector machine (SVM) meta-learner. To extract features for DL models, the continuous bag of words (CBOW) and the skip-gram models with 300 dimensions of the word embedding were used. Arabic health services datasets (Main-AHS and Sub-AHS) and the Arabic sentiment tweets dataset (ASTD) were used to train and test the models. A number of well-known deep learning models, including DeepCNN, hybrid CNN-LSTM, hybrid CNN-GRU, and conventional ML algorithms, have been used to compare the performance of the proposed ensemble model. We discovered that the proposed deep stacking model achieved the best performance compared to the previous models. 
Based on the CBOW word embedding, the proposed model achieved the highest accuracy of 92.12%, 95.81%, and 81.4% for Main-AHS, Sub-AHS, and ASTD datasets, respectively.

  • Research Article
  • 10.1186/s13244-026-02220-9
Deep learning ensemble models for CT-based differentiation of malignant and benign sacral bone tumors: development and evaluation.
  • Mar 3, 2026
  • Insights into imaging
  • Ping Yin + 8 more

Radiologists often face challenges in differentiating benign from malignant sacral bone lesions due to their similar imaging characteristics. This study aimed to develop an ensemble deep learning (DL) model that can preoperatively distinguish between benign and malignant sacral tumors using noncontrast computed tomography images. Preoperative sacral CT scans from 569 patients with confirmed sacral lesions were analyzed. Data from Center 1 were utilized in model development and internal test via fivefold cross-validation, and those from Centers 2 and 3 were employed in external test. Various ensemble models combining human-readable interpretation and DL were developed. The diagnostic performance of the models and radiologists was assessed using metrics such as precision, recall, accuracy, area under the curve (AUC), F1 score, and confusion matrix. Furthermore, the clinical benefits derived from radiologists' interpretations and supported by the DL model were evaluated. The ensemble model, which integrates 3D-DenseNet121 with human interpretation, exhibited the most robust performance. The ensemble model demonstrated high performance on the internal and external test sets and achieved AUCs of 0.9139 and 0.8713, F1 scores of 0.9054 and 0.8571, precision of 0.9041 and 0.8824, recall of 0.9136 and 0.8333, and accuracy of 0.8630 and 0.8182, respectively. Across the external test cohort, all radiologists experienced improvements in AUC, accuracy, sensitivity, and specificity. Notably, junior radiologists demonstrated significant improvements compared with senior radiologists. The potential clinical application of the DL model lies in its capacity to considerably enhance the diagnostic efficiency of radiologists. 
This study presents the first ensemble deep learning model integrating 3D-DenseNet121 with radiologists' interpretation for preoperative differentiation of sacral tumors on noncontrast CT that improved diagnostic performance across all experience levels, particularly for junior radiologists. First artificial intelligence-radiologist ensemble for noncontrast computed tomography (NCCT)-based sacral tumor classification. Boosts all radiologists' performance, with the greatest gains for juniors, potentially reducing referrals. Enables reliable NCCT diagnosis, overcoming contrast/magnetic resonance imaging dependency in musculoskeletal oncology.

  • Conference Article
  • Citations: 4
  • 10.1109/bibm52615.2021.9669817
Enhancing Medical Image Classification via Augmentation-based Pre-training
  • Dec 9, 2021
  • Ximan Tang + 3 more

Current deep learning-based medical image classification models are typically pre-trained on large-scale natural image databases (e.g., ImageNet) with random image augmentation processing, and then fine-tuned on relatively small medical image datasets to achieve satisfactory performance. However, this method ignores the differences in data augmentation operations applied to medical images and natural images, which sometimes leads to poor classification results. In this study, a self-supervised learning framework based on augmentation is designed to boost the final classification performance with three modules. First, a security evaluating module is used to filter out unsafe data augmentation operations. Then, an augmentation learning module follows the augmentation strategy and extracts a priori knowledge using the remaining safe operations. Finally, classification performance is verified by a classification module. We conducted thorough experiments on three public medical image datasets and evaluated four parametric levels of networks. Experimental results demonstrate that our method is superior to the previous state-of-the-art, while not requiring any external search time.

  • Research Article
  • Citations: 39
  • 10.1148/radiol.230614
A CT-based Deep Learning Model for Predicting Subsequent Fracture Risk in Patients with Hip Fracture.
  • Jan 1, 2024
  • Radiology
  • Yisak Kim + 9 more

Background Patients have the highest risk of subsequent fractures in the first few years after an initial fracture, yet models to predict short-term subsequent risk have not been developed. Purpose To develop and validate a deep learning prediction model for subsequent fracture risk using digitally reconstructed radiographs from hip CT in patients with recent hip fractures. Materials and Methods This retrospective study included adult patients who underwent three-dimensional hip CT due to a fracture from January 2004 to December 2020. Two-dimensional frontal, lateral, and axial digitally reconstructed radiographs were generated and assembled to construct an ensemble model. DenseNet modules were used to calculate risk probability based on extracted image features and fracture-free probability plots were output. Model performance was assessed using the C index and area under the receiver operating characteristic curve (AUC) and compared with other models using the paired t test. Results The training and validation set included 1012 patients (mean age, 74.5 years ± 13.3 [SD]; 706 female, 113 subsequent fractures) and the test set included 468 patients (mean age, 75.9 years ± 14.0; 335 female, 22 subsequent fractures). In the test set, the ensemble model had a higher C index (0.73) for predicting subsequent fractures than that of other image-based models (C index range, 0.59-0.70 for five of six models; P value range, < .001 to < .05). The ensemble model achieved AUCs of 0.74, 0.74, and 0.73 at the 2-, 3-, and 5-year follow-ups, respectively; higher than that of most other image-based models at 2 years (AUC range, 0.57-0.71 for five of six models; P value range, < .001 to < .05) and 3 years (AUC range, 0.55-0.72 for four of six models; P value range, < .001 to < .05). 
Moreover, the AUCs achieved by the ensemble model were higher than that of a clinical model that included known risk factors (2-, 3-, and 5-year AUCs of 0.58, 0.64, and 0.70, respectively; P < .001 for all). Conclusion In patients with recent hip fractures, the ensemble deep learning model using digital reconstructed radiographs from hip CT showed good performance for predicting subsequent fractures in the short term. © RSNA, 2024 Supplemental material is available for this article. See also the editorial by Li and Jaremko in this issue.

  • Research Article
  • Citations: 109
  • 10.1016/j.procs.2023.01.222
Ensemble of Deep Learning Models for Brain Tumor Detection
  • Jan 1, 2023
  • Procedia Computer Science
  • Suraj Patil + 1 more

  • Research Article
  • Citations: 12
  • 10.1016/j.ijepes.2025.110682
Enhanced insulator fault detection using optimized ensemble of deep learning models based on weighted boxes fusion
  • Jul 1, 2025
  • International Journal of Electrical Power & Energy Systems
  • Stefano Frizzo Stefenon + 3 more

Fault identification in transmission line insulators is essential to keep the power system running. Using deep learning-based models combined with interpretative techniques can be an alternative to improve power grid inspections and increase their reliability. Based on that consideration, this paper proposes an optimized ensemble of deep learning models (OEDL) based on weighted boxes fusion (WBF), called OEDL-WBF, to enhance the fault detection of power grid insulators. The proposed model is hypertuned considering a tree-structured Parzen estimator (TPE), and interpretative results are provided using the eigenvector-based class activation map (Eigen-CAM) algorithm. The Eigen-CAM had better results than Grad-CAM, Activation-CAM, MaxActivation-CAM, and WeightedActivation-CAM. The multi-criteria optimization of the structure by TPE ensures that the appropriate hyperparameters of the you only look once (YOLO) model are used for object detection. With a mean average precision (mAP)@[0.5] of 0.9841 and mAP@[0.5:0.95] of 0.9722 the proposed OEDL-WBF outperforms other deep learning-based structures, such as YOLOv8, YOLOv9, YOLOv10, YOLOv11, and YOLOv12 in a benchmarking. The Eigen-CAM further helps to interpret the outcomes of the model.

  • Research Article
  • Citations: 210
  • 10.1007/s10278-017-9976-3
Medical Image Data and Datasets in the Era of Machine Learning—Whitepaper from the 2016 C-MIMI Meeting Dataset Session
  • May 17, 2017
  • Journal of Digital Imaging
  • Marc D Kohli + 2 more

At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.

  • Research Article
  • Citations: 159
  • 10.1155/2018/2061516
Medical Image Classification Based on Deep Features Extracted by Deep Model and Statistic Feature Fusion with Multilayer Perceptron.
  • Sep 12, 2018
  • Computational Intelligence and Neuroscience
  • Zhifei Lai + 1 more

Medical image classification is a key technique of Computer-Aided Diagnosis (CAD) systems. Traditional methods rely mainly on the shape, color, and/or texture features as well as their combinations, most of which are problem-specific and have shown to be complementary in medical images, which leads to a system that lacks the ability to make representations of high-level problem domain concepts and that has poor model generalization ability. Recent deep learning methods provide an effective way to construct an end-to-end model that can compute final classification labels with the raw pixels of medical images. However, due to the high resolution of the medical images and the small dataset size, deep learning models suffer from high computational costs and limitations in the model layers and channels. To solve these problems, in this paper, we propose a deep learning model that integrates Coding Network with Multilayer Perceptron (CNMP), which combines high-level features that are extracted from a deep convolutional neural network and some selected traditional features. The construction of the proposed model includes the following steps. First, we train a deep convolutional neural network as a coding network in a supervised manner, and the result is that it can code the raw pixels of medical images into feature vectors that represent high-level concepts for classification. Second, we extract a set of selected traditional features based on background knowledge of medical images. Finally, we design an efficient model that is based on neural networks to fuse the different feature groups obtained in the first and second step. We evaluate the proposed approach on two benchmark medical image datasets: HIS2828 and ISIC2017. We achieve an overall classification accuracy of 90.1% and 90.2%, respectively, which are higher than the current successful methods.

  • Research Article
  • Citations: 27
  • 10.3390/s20215982
Enhanced Image-Based Endoscopic Pathological Site Classification Using an Ensemble of Deep Learning Models
  • Oct 22, 2020
  • Sensors
  • Dat Tien Nguyen + 5 more

In vivo diseases such as colorectal cancer and gastric cancer are increasingly occurring in humans. These are two of the most common types of cancer that cause death worldwide. Therefore, the early detection and treatment of these types of cancer are crucial for saving lives. With the advances in technology and image processing techniques, computer-aided diagnosis (CAD) systems have been developed and applied in several medical systems to assist doctors in diagnosing diseases using imaging technology. In this study, we propose a CAD method to preclassify the in vivo endoscopic images into negative (images without evidence of a disease) and positive (images that possibly include pathological sites such as a polyp or suspected regions including complex vascular information) cases. The goal of our study is to assist doctors to focus on the positive frames of endoscopic sequence rather than the negative frames. Consequently, we can help in enhancing the performance and mitigating the efforts of doctors in the diagnosis procedure. Although previous studies were conducted to solve this problem, they were mostly based on a single classification model, thus limiting the classification performance. Thus, we propose the use of multiple classification models based on ensemble learning techniques to enhance the performance of pathological site classification. Through experiments with an open database, we confirmed that the ensemble of multiple deep learning-based models with different network architectures is more efficient for enhancing the performance of pathological site classification using a CAD system as compared to the state-of-the-art methods.

  • Conference Article
  • Citations: 12
  • 10.1109/i2ct45611.2019.9033555
Towards Evaluating Performance of Domain Specific Transfer Learning for Pneumonia Detection from X-Ray Images
  • Mar 1, 2019
  • Sarang Mahajan + 4 more

Medical diagnosis and treatment is a field where Artificial Intelligence (AI) has the potential to provide tremendous scope for targeted large-scale interventions. Trained on vast troves of digitized slides showing an enormous variety of tumors, AI systems will likely provide more accurate diagnoses than human pathologists, at least on fairly rote diagnostic tasks. Harnessing the power of deep learning with Convolutional Neural Networks (CNN) is difficult because training a CNN from scratch requires a large amount of computing power and a large amount of training data. The latter requirement turns out to be an obstacle in the case of medical image analysis. Previous works have addressed this difficulty and have demonstrated that pre-trained deep CNNs with sufficient training perform better than a CNN trained from scratch. In this paper, we provide a comparison of the results obtained by using weights pre-trained on medical images from the CheXNet dataset with the results obtained by using weights pre-trained on the ImageNet dataset. We performed experiments on the Chest X-ray dataset obtained from Kaggle. When a 121-layer DenseNet was trained on top of a finely tuned CheXNet model, a test accuracy of 88.78% was obtained, with precision, recall and F1-score of 0.71, 0.97 and 0.81, respectively. Our experiments consistently demonstrated that: 1) a finely tuned deep CNN model used as a base for training on a dataset of medical images provided better accuracy than a CNN model trained from scratch with randomly initialized weights; 2) the use of a deep CNN’s weights pre-trained on a medical image dataset provides better performance for medical image classification, in terms of accuracy, precision and recall, than models trained on natural images such as the ImageNet dataset.

  • Research Article
  • 10.17587/it.32.134-142
Deepfake detection using an optimal ensemble of deep learning models
  • Mar 13, 2026
  • Informacionnye Tehnologii
  • A P Lapsar + 1 more

The article proposes a method for combined detection of deepfakes based on an ensemble of several deep learning models that differ as much as possible in their properties. The method involves the use of ResNet, EfficientNet and MobileNet models. The integral result of the combination is formed by averaging the partial detection probabilities. The results of an experimental study are presented, demonstrating the advantages of the synthesized method when working with heterogeneous types of deepfakes.
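The averaging step described in this abstract can be illustrated with a short Python sketch. Only the averaging itself comes from the abstract; the function name `average_detection_probability`, the 0.5 decision threshold, and the example probabilities are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def average_detection_probability(
    model_probs: Sequence[float],
    threshold: float = 0.5,  # assumed decision threshold, not stated in the abstract
) -> Tuple[float, bool]:
    """Average per-model deepfake probabilities and apply a threshold.

    model_probs holds one probability per model (for example, one each from
    ResNet-, EfficientNet- and MobileNet-based detectors).
    """
    if not model_probs:
        raise ValueError("at least one model probability is required")
    mean_prob = sum(model_probs) / len(model_probs)
    return mean_prob, mean_prob >= threshold


# Hypothetical outputs of three detectors for a single image.
mean_prob, is_fake = average_detection_probability([0.9, 0.6, 0.3])
print(mean_prob, is_fake)
```

Averaging gives every model the same influence, so one confident detector can be outvoted by two uncertain ones.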

  • Book Chapter
  • 10.1007/978-3-031-34619-4_31
Aspect Based Sentiment Analysis of COVID-19 Tweets Using Blending Ensemble of Deep Learning Models
  • Jan 1, 2023
  • Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
  • Khandaker Tayef Shahriar + 2 more

This paper takes into account the aspect-based sentiment analysis of COVID-19 tweets, in order to understand human emotions and provide decision support to policymakers. People these days use social media to share thoughts and feelings in critical situations like COVID-19. After the World Health Organization (WHO) declared COVID-19 a pandemic, a significant increase in the usage of the most influential Twitter platforms has been observed. Thus, it is impossible to manually track all the COVID-19-related tweets on the Twitter platform. Sentiment analysis is one of the solutions to this problem. In this work, we attempt to understand people’s feelings about certain aspects by analyzing the COVID-19 tweets to reduce the harmful consequences of the pandemic and to understand the crisis, humanitarian needs and measures. We, therefore, propose a framework for the aspect based sentiment analysis of COVID-19 tweets by extracting the top ten aspects and classifying positive, negative, or neutral tweets in each aspect using the blending ensemble of basic deep learning models. The experimental results show that the proposed framework achieves the highest accuracy of 85.65% compared to other benchmark deep learning models.

  • Research Article
  • Citations: 15
  • 10.1061/jpeodx.pveng-1192
An Ensemble Deep Learning Model for Short-Term Road Surface Temperature Prediction
  • Mar 1, 2023
  • Journal of Transportation Engineering, Part B: Pavements
  • Bingyou Dai + 5 more

In winter, the ice and snow on the asphalt pavement reduce the friction coefficient of the pavement, which may lead to serious traffic accidents and large-scale congestion. Taking preventive measures to ensure traffic safety by accurately predicting road surface temperature is an economical and environmentally friendly solution. However, road surface temperature (RST) prediction is a challenging task due to the complicated uncertainty and periodicity. To improve the accuracy of RST prediction, this paper proposes an advanced ensemble deep learning model using a gated recurrent unit (GRU) network and a long short-term memory (LSTM) network. The ensemble model predicts RST by extracting the periodicity of RST and incorporating the lag and accumulation effects of meteorological factors. To verify the applicability of the ensemble model, RST data and climatic data were collected from a road weather station in Jiangsu, China. Extensive experiments were conducted, including predictions for 1, 3, and 6 h ahead. The results validated the proposed ensemble deep learning model for 1-, 3-, and 6-h nowcasts of RST, with mean absolute error (MAE) of 0.345, 0.833, and 1.743, respectively, and the prediction accuracy was higher than that of the baseline models [convolutional neural network (CNN)-LSTM networks, support vector regression (SVR), and backpropagation (BP) neural networks].
