Abstract

Embedding cost-sensitive factors into classifiers increases classification stability and reduces classification cost on large-scale, redundant, and imbalanced datasets such as gene expression data. In this study, we extend our previous work, Dissimilar ELM (D-ELM), by introducing misclassification costs into the classifier. We name the proposed algorithm cost-sensitive D-ELM (CS-D-ELM). We further embed a rejection cost into CS-D-ELM to increase its classification stability. Experimental results show that CS-D-ELM with the embedded rejection cost effectively reduces both the average and the overall cost of classification, while its classification accuracy remains competitive. The proposed method can be extended to classification problems on other redundant and imbalanced data.
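As a rough illustration of what embedding misclassification and rejection costs into a classifier can mean, the sketch below applies a generic minimum-expected-cost decision rule with a reject option. It is not the CS-D-ELM algorithm itself. The function name, the `REJECT` label, the cost values, and the assumption that the underlying classifier outputs class probabilities are all hypothetical choices made for this example.

```python
REJECT = -1  # hypothetical label meaning "refuse to classify this sample"


def min_cost_decision(posterior, cost_matrix, reject_cost):
    """Return the class with the lowest expected cost, or REJECT.

    posterior:   estimated class probabilities for one sample
    cost_matrix: cost_matrix[i][j] is the cost of predicting class j
                 when the true class is i (assumed values, not from the paper)
    reject_cost: flat cost of refusing to classify the sample
    """
    n_classes = len(cost_matrix)
    # Expected cost of predicting each class j, averaged over the true class i.
    expected = [
        sum(posterior[i] * cost_matrix[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    best = min(range(n_classes), key=expected.__getitem__)
    # Reject when even the cheapest prediction costs more than rejecting.
    return REJECT if expected[best] > reject_cost else best


# Assumed two-class setup: missing class 1 is five times as costly as
# missing class 0.
costs = [[0, 1],
         [5, 0]]
decisions = [
    min_cost_decision(p, costs, reject_cost=0.6)
    for p in ([0.9, 0.1], [0.7, 0.3], [0.1, 0.9])
]
```

In this toy setup the second sample is rejected: its cheapest prediction has an expected cost of 0.7, which exceeds the rejection cost of 0.6, so deferring the decision is the cheaper option.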

Introduction

With the appearance of gene chips, classification methodology for gene expression data has advanced to the molecular level [1]. The classification of gene expression data is a crucial component of next-generation cancer diagnosis technology [2]. For a particular tumor tissue with a series of known features, scientists believe that classifying the gene array provides important information for identifying the tumor type and therefore influences the treatment plan [3,4,5]. In gene expression data, the number of features can be a hundred times larger than the number of samples [6]. Because of this property, most traditional classifiers, such as extreme learning machine (ELM) [7], support vector machine (SVM), and multilayer neural networks, have difficulty producing accurate and stable classification results. In 2012, we presented Dissimilar ELM (D-ELM), an integrated algorithm that selectively eliminates ELMs on the basis of VELM and provides more stable classification results than individual ELMs [8, 9].
