Adaptive Swarm Balancing Algorithms for rare-event prediction in imbalanced healthcare data.

Jinyan Li, Sabah Mohammed, Lian-Sheng Liu, Simon Fong, Jinan Fiaidhi, Yunsick Sung, Kelvin K L Wong, Raymond K Wong

doi:10.1371/journal.pone.0180830

Abstract

Clinical data analysis and forecasting have made substantial contributions to disease control, prevention and detection. However, such data usually suffer from highly imbalanced class distributions. In this paper, we aim to formulate effective methods to rebalance binary imbalanced datasets in which the positive samples form only the minority. We investigate two meta-heuristic algorithms, particle swarm optimization and the bat algorithm, and apply them to strengthen the synthetic minority over-sampling technique (SMOTE) for pre-processing the datasets. One approach is to process the full dataset as a whole. The other is to split the dataset into segments and adaptively process it one segment at a time. The experimental results reported in this paper reveal that the performance improvements obtained by the former methods do not scale to larger datasets. The latter methods, which we call Adaptive Swarm Balancing Algorithms, lead to significant improvements in efficiency and effectiveness on large datasets, where the first method fails. We also find the segment-wise approach more consistent with the practice of handling typical large imbalanced medical datasets. We further use the meta-heuristic algorithms to optimize two key parameters of SMOTE. Compared with the brute-force method, the proposed methods yield more credible classifier performance and shorter run times.
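To make the two building blocks in the abstract concrete, here is a minimal sketch of SMOTE and of segment-at-a-time rebalancing. It assumes label 1 marks the minority (positive) class and that SMOTE's two parameters are the oversampling percentage and the neighbour count k. The SMOTE shown is a simplified variant that draws base points at random rather than cycling through every minority sample. The names `smote`, `segmentwise_rebalance` and the `choose_params` hook are hypothetical and not the authors' code. A meta-heuristic could plug into `choose_params`, but this sketch does not reproduce the paper's adaptive algorithm.

```python
import numpy as np


def smote(minority, n_percent, k, rng):
    """Generate n_percent% synthetic minority samples by interpolating between a
    randomly drawn minority point and one of its k nearest minority neighbours."""
    minority = np.asarray(minority, dtype=float)
    n_min = len(minority)
    k = min(k, n_min - 1)  # at most n_min - 1 other minority points exist
    n_new = int(n_min * n_percent / 100)
    if n_new == 0 or k < 1:
        return np.empty((0, minority.shape[1]))
    # Pairwise Euclidean distances among minority samples only.
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n_min, size=n_new)
    nbr = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # one interpolation factor in [0, 1) per sample
    return minority[base] + gap * (minority[nbr] - minority[base])


def segmentwise_rebalance(X, y, n_segments, choose_params, rng):
    """Split (X, y) into contiguous segments and oversample the minority class
    (label 1) of each segment with that segment's own SMOTE parameters.

    choose_params(X_seg, y_seg) -> (n_percent, k) is a hypothetical hook where
    a parameter search (e.g. a swarm optimizer) could be plugged in.
    """
    X_parts, y_parts = [X], [y]
    for idx in np.array_split(np.arange(len(y)), n_segments):
        X_seg, y_seg = X[idx], y[idx]
        minority = X_seg[y_seg == 1]
        if len(minority) < 2:
            continue  # interpolation needs at least two minority points
        n_percent, k = choose_params(X_seg, y_seg)
        synth = smote(minority, n_percent, k, rng)
        X_parts.append(synth)
        y_parts.append(np.ones(len(synth), dtype=y.dtype))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy data: roughly 10% positives (illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
y = (rng.random(400) < 0.1).astype(int)
X_bal, y_bal = segmentwise_rebalance(
    X, y, n_segments=4, choose_params=lambda Xs, ys: (200, 5), rng=rng)
print("positive rate before:", y.mean(), "after:", y_bal.mean())
```

Each segment only needs its own minority points in memory for the neighbour search. That is one plausible reason a segment-wise scheme can scale better than processing the whole dataset at once.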

Highlights

Big data in the medical field, driven by hospital informatization, advances in treatment, and the extensive use of high-throughput equipment, has attracted geometrically growing attention
Our methods prove effective on imbalanced dataset classification problems across different dataset sizes
Meta-heuristic algorithms can select the parameters of the synthetic minority oversampling technique (SMOTE) without prior knowledge and still obtain relatively high accuracy with a Kappa value that falls within the credible range
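The highlights describe a meta-heuristic choosing SMOTE's parameters while Kappa stays credible. Below is a minimal sketch of that idea using particle swarm optimization. Several assumptions apply: the two parameters are the oversampling percentage and the neighbour count k; the fitness is Cohen's kappa of a 1-nearest-neighbour classifier on a held-out split; and the PSO coefficients (w, c1, c2), bounds and all function names are illustrative choices, not the paper's configuration. The bat algorithm, the paper's other meta-heuristic, would replace `pso` with a different update rule and is not shown.

```python
import numpy as np


def smote(minority, n_percent, k, rng):
    # Simplified SMOTE: interpolate between a random minority point and one of
    # its k nearest minority neighbours.
    n_min = len(minority)
    k = max(1, min(k, n_min - 1))
    n_new = int(n_min * n_percent / 100)
    if n_new == 0:
        return np.empty((0, minority.shape[1]))
    d = np.linalg.norm(minority[:, None] - minority[None], axis=2)
    np.fill_diagonal(d, np.inf)
    nbrs = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n_min, n_new)
    nbr = nbrs[base, rng.integers(0, k, n_new)]
    return minority[base] + rng.random((n_new, 1)) * (minority[nbr] - minority[base])


def cohen_kappa(y_true, y_pred):
    po = np.mean(y_true == y_pred)  # observed agreement
    pe = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in (0, 1))  # chance agreement
    return 0.0 if pe == 1 else (po - pe) / (1 - pe)


def predict_1nn(X_train, y_train, X_test):
    d = np.linalg.norm(X_test[:, None] - X_train[None], axis=2)
    return y_train[np.argmin(d, axis=1)]


def fitness(params, X_tr, y_tr, X_val, y_val, seed=0):
    # Round the continuous particle position to integer SMOTE parameters.
    n_percent, k = int(round(params[0])), int(round(params[1]))
    rng = np.random.default_rng(seed)  # fixed seed: equal params give equal fitness
    synth = smote(X_tr[y_tr == 1], n_percent, k, rng)
    X_bal = np.vstack([X_tr, synth])
    y_bal = np.concatenate([y_tr, np.ones(len(synth), dtype=y_tr.dtype)])
    return cohen_kappa(y_val, predict_1nn(X_bal, y_bal, X_val))


def pso(f, lo, hi, n_particles=10, n_iter=20, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise f over the box [lo, hi] with a basic global-best PSO."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    pos = rng.uniform(lo, hi, (n_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest, pbest_val = pos.copy(), np.array([f(p) for p in pos])
    g = pbest[np.argmax(pbest_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia + pull toward personal best + pull toward global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([f(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[np.argmax(pbest_val)].copy()
    return g, pbest_val.max()


# Toy imbalanced data (illustrative only): 180 negatives, 20 positives.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (180, 2)), rng.normal(1.5, 1.0, (20, 2))])
y = np.array([0] * 180 + [1] * 20)
perm = rng.permutation(len(y))  # shuffle before the train/validation split
X, y = X[perm], y[perm]
X_tr, y_tr, X_val, y_val = X[:140], y[:140], X[140:], y[140:]

best, best_kappa = pso(lambda p: fitness(p, X_tr, y_tr, X_val, y_val),
                       lo=[100, 1], hi=[1000, 10])
print("SMOTE percentage:", round(best[0]), "k:", round(best[1]), "kappa:", best_kappa)
```

With 10 particles and 20 iterations, this evaluates about 210 parameter pairs. An exhaustive grid over the same bounds would need 901 × 10 = 9,010. That gap is the kind of run-time saving the abstract attributes to replacing brute-force search.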

Summary

Introduction

Big data in the medical field, driven by hospital informatization, advances in treatment, and the extensive use of high-throughput equipment, has attracted geometrically growing attention. The sources of health data include clinical treatment, pharmaceutical companies, medical research, medical assistance applications, and more. It is well known that, compared with healthy people, patients make up only a small part of the total population, and more serious diseases such as cancer and AIDS have even fewer cases. Training classifiers on such data therefore means training on imbalanced datasets, which causes over-fitting to the majority class and biases the results. For instance, in the binary classification of a cancer dataset, the negative (healthy) samples dominate, and the resulting model is likely to have little discriminative ability on the positive (patient) samples. Identifying cancer patients as healthy people is an unacceptable mistake.
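The cancer example can be made concrete with a small sketch. A degenerate model that labels everyone healthy still scores high accuracy, yet it finds no patients, and its Cohen's kappa (the agreement statistic named in the highlights) is zero. The labels below (95 healthy, 5 patients) are made up for illustration, and the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    # Fraction of actual positives (patients) that the model flags as positive.
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return hits / y_true.count(positive)


def cohen_kappa(y_true, y_pred):
    # Agreement beyond what the label frequencies alone would give by chance.
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    p_chance = sum((y_true.count(c) / n) * (y_pred.count(c) / n)
                   for c in set(y_true) | set(y_pred))
    return 0.0 if p_chance == 1 else (p_observed - p_chance) / (1 - p_chance)


# Illustrative labels: 95 healthy (0) and 5 patients (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a degenerate model that calls everyone healthy

print(accuracy(y_true, y_pred))     # 0.95
print(recall(y_true, y_pred))       # 0.0
print(cohen_kappa(y_true, y_pred))  # 0.0
```

Accuracy rewards the majority-class bias described above. Kappa discounts the agreement that label frequencies produce by chance, so it exposes the model as no better than guessing.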


Journal: PLOS ONE	Publication Date: Jul 28, 2017
Citations: 46	License type: CC BY 4.0

Similar Papers

OPTIMIZING NEURAL NETWORK CLASSIFIER FOR DIABETES DATA USING METAHEURISTIC ALGORITHMS
...
21 Feb 2018

Enhancing Accuracy in Stock Price Prediction: The Power of Optimization Algorithms
Vivi Aida Fitria ... Lilis Widayanti
MATRIK : Jurnal Manajemen, Teknik Informatika dan Rekayasa Komputer | VOL. 23
25 Mar 2024

Automated semiconductor wafer defect classification dealing with imbalanced data
Po-Hsuan Lee ... Wei Fang
20 Mar 2020

Error optimization using Bat and PSO algorithms for machine vision system based tool movement
Anu Garg ... Prasant Kumar Mahapatra
01 Sep 2014