Imbalanced biomedical data classification using self-adaptive multilayer ELM combined with dynamic GAN

Liyuan Zhang,Zhengang Jiang,Huamin Yang

doi:10.1186/s12938-018-0604-3

Open Access

https://doi.org/10.1186/s12938-018-0604-3

Abstract

Background: Imbalanced data classification is an unavoidable problem in intelligent medical diagnosis. Most real-world biomedical datasets have limited samples and high-dimensional features. This seriously degrades the classification performance of the model and can misguide disease diagnosis. Finding an effective classification method for imbalanced and limited biomedical datasets is a challenging task.

Methods: In this paper, we propose a novel multilayer extreme learning machine (ELM) classification model combined with a dynamic generative adversarial net (GAN) to tackle limited and imbalanced biomedical data. First, principal component analysis is used to remove irrelevant and redundant features and to extract more meaningful pathological features. Next, a dynamic GAN is designed to generate realistic-looking minority class samples, thereby balancing the class distribution and effectively avoiding overfitting. Finally, a self-adaptive multilayer ELM is proposed to classify the balanced dataset. Analytic expressions for the number of hidden layers and nodes are determined by quantitatively establishing the relationship between the change in imbalance ratio and the hyper-parameters of the model. Reducing interactive parameter adjustment makes the classification model more robust.

Results: To evaluate the classification performance of the proposed method, numerical experiments are conducted on four real-world biomedical datasets. The proposed method can generate authentic minority class samples and self-adaptively select the optimal parameters of the learning model. Compared with the W-ELM, SMOTE-ELM, and H-ELM methods, the quantitative experimental results demonstrate that our method achieves better classification performance in terms of the ROC, AUC, G-mean, and F-measure metrics, as well as higher computational efficiency.

Conclusions: Our study provides an effective solution for imbalanced biomedical data classification under the condition of limited samples and high-dimensional features. The proposed method could offer a theoretical basis for computer-aided diagnosis. It has the potential to be applied in biomedical clinical practice.
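The first and last stages of the pipeline described in the Methods can be illustrated with a minimal sketch. It assumes a plain single-hidden-layer ELM with ridge-regularized output weights, not the self-adaptive multilayer ELM proposed in the paper, and it omits the dynamic GAN step entirely by using synthetic data that is already balanced. All names (`pca_fit`, `BasicELM`, `n_hidden`, `reg`) and all numeric settings are hypothetical choices for illustration, not values taken from the paper.

```python
import numpy as np

def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions of X (via SVD)."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

class BasicELM:
    """Basic ELM: random, fixed hidden weights; output weights solved in closed form."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        # Hidden-layer parameters are drawn once and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        T = np.where(y == 1, 1.0, -1.0)  # targets in {-1, +1}
        # Ridge-regularized least squares: beta = (H^T H + reg * I)^{-1} H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return (self._hidden(X) @ self.beta > 0).astype(int)

# Synthetic, already balanced two-class data (assumption: stands in for the
# dataset after minority-class augmentation, which is not implemented here).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 20)),
               rng.normal(3.0, 1.0, size=(200, 20))])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

# Hold out 100 samples; fit PCA on the training split only.
idx = rng.permutation(len(y))
train, test = idx[:300], idx[300:]
mean, components = pca_fit(X[train], n_components=5)
Z_train = (X[train] - mean) @ components.T
Z_test = (X[test] - mean) @ components.T

model = BasicELM(n_hidden=50).fit(Z_train, y[train])
test_acc = float((model.predict(Z_test) == y[test]).mean())
print(f"held-out accuracy: {test_acc:.3f}")
```

Because the hidden weights are never updated, training reduces to a single linear solve, which is the usual reason ELM variants are considered computationally cheap.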

Highlights

Imbalanced data classification is an inevitable problem in medical intelligent diagnosis
Imbalanced class distribution frequently occurs in real-world biomedical datasets, which causes the loss of essential pathological information from the abnormal class [2]
In this study, we developed a self-adaptive multilayer extreme learning machine (ELM) model combined with a dynamic generative adversarial net (GAN) to classify limited and imbalanced datasets for biomedical engineering applications

Summary

Introduction

Imbalanced data classification is an unavoidable problem in intelligent medical diagnosis. Most real-world biomedical datasets have limited samples and high-dimensional features. This seriously degrades the classification performance of the model and can misguide disease diagnosis. Imbalanced class distribution frequently occurs in real-world biomedical datasets, which causes the loss of essential pathological information from the abnormal class [2]. The training set sometimes contains high-dimensional features and few samples. These factors further lower the classification accuracy for the abnormal class and lead to incorrect diagnoses [4]. Establishing an effective classification model for limited and imbalanced biomedical datasets is therefore an urgent task.
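A small worked example may help show why overall accuracy hides poor performance on the abnormal class. The sketch below uses the commonly cited definitions of G-mean (the geometric mean of sensitivity and specificity) and F-measure (the harmonic mean of precision and sensitivity); the exact formulas used in the paper are not shown in this excerpt, so these definitions are an assumption. The function name `imbalance_metrics` and the 95-to-5 class split are hypothetical.

```python
import math

def imbalance_metrics(tp, fn, tn, fp):
    """Compute accuracy, G-mean and F-measure from confusion-matrix counts.

    The positive class is the abnormal (minority) class.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on abnormal class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on normal class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f_measure": f_measure}

# Hypothetical dataset: 95 normal and 5 abnormal samples.
# A classifier that labels every sample "normal" misses all abnormal cases.
always_normal = imbalance_metrics(tp=0, fn=5, tn=95, fp=0)
print(always_normal)  # accuracy is 0.95, yet G-mean and F-measure are both 0.0
```

The trivial classifier scores 95% accuracy while detecting no abnormal samples at all, which is why metrics such as G-mean and F-measure are preferred for imbalanced data.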

Methods

Results

Conclusion