Abstract

Background: Microarray technology allows biologists to monitor expression levels of thousands of genes among various tumor tissues. Identifying relevant genes for sample classification of various tumor types is beneficial to clinical studies. One of the most widely used strategies for multiclass classification is the One-Versus-All (OVA) schema, which divides the original problem into multiple binary classifications of one class against the rest. Nevertheless, multiclass microarray data tend to suffer from imbalanced class distribution between majority and minority classes, which inevitably degrades the performance of OVA classification.

Results: In this study, we propose a novel iterative ensemble feature selection (IEFS) framework for multiclass classification of imbalanced microarray data. In particular, filter feature selection and balanced sampling are performed iteratively and alternately to boost the performance of each binary classification in the OVA schema. The proposed framework is tested and compared with other representative state-of-the-art filter feature selection methods using six benchmark multiclass microarray data sets. The experimental results show that the IEFS framework provides superior or comparable performance to the other methods in terms of both classification accuracy and area under the receiver operating characteristic curve. The more classes the data have, the better the IEFS framework performs.

Conclusions: Balanced sampling and feature selection together work well in improving the performance of multiclass classification of imbalanced microarray data. The IEFS framework is readily applicable to other biological data analysis tasks facing the same problem.
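The combination of OVA decomposition, balanced sampling, and filter feature selection described in the abstract can be illustrated with a minimal sketch. This is not the authors' IEFS procedure: the function names, the difference-of-means filter score, random undersampling, and the vote-counting ensemble are all assumptions chosen for brevity, and the iterative alternation between sampling and selection is not reproduced here.

```python
import random
from collections import Counter


def ova_labels(labels, positive):
    """Relabel a multiclass problem as one class (1) against the rest (0)."""
    return [1 if y == positive else 0 for y in labels]


def balanced_indices(binary, rng):
    """Randomly undersample the larger side so both sides have equal size.

    Assumption: plain random undersampling stands in for whatever
    balanced sampling scheme the paper actually uses.
    """
    pos = [i for i, y in enumerate(binary) if y == 1]
    neg = [i for i, y in enumerate(binary) if y == 0]
    n = min(len(pos), len(neg))
    if n == 0:
        raise ValueError("both sides of the binary problem need samples")
    return sorted(rng.sample(pos, n) + rng.sample(neg, n))


def mean_diff_score(X, binary, idx, gene):
    """Hypothetical filter score: absolute difference of the class means."""
    pos = [X[i][gene] for i in idx if binary[i] == 1]
    neg = [X[i][gene] for i in idx if binary[i] == 0]
    return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


def ensemble_select(X, labels, positive, k, rounds=10, seed=0):
    """Pick k genes for one OVA problem by voting over balanced subsamples."""
    rng = random.Random(seed)
    binary = ova_labels(labels, positive)
    n_genes = len(X[0])
    votes = Counter()
    for _ in range(rounds):
        idx = balanced_indices(binary, rng)
        # Rank genes on this balanced subsample, best score first.
        ranked = sorted(
            range(n_genes),
            key=lambda g: mean_diff_score(X, binary, idx, g),
            reverse=True,
        )
        votes.update(ranked[:k])
    # Genes selected most often across rounds form the ensemble choice.
    return [g for g, _ in votes.most_common(k)]
```

Calling `ensemble_select(X, labels, positive, k)` once per class gives one gene subset for each binary problem in the OVA schema, where `X` is a list of samples and each sample is a list of expression values. Voting across several balanced subsamples is one simple way to keep a single unlucky draw of the majority class from dominating the selected genes.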

Highlights

  • Microarray technology allows biologists to monitor expression levels of thousands of genes among various tumor tissues

  • We propose an iterative ensemble feature selection (IEFS) framework based on the One-Versus-All (OVA) classification schema [13] to improve classification performance in terms of both classification accuracy and area under the receiver operating characteristic curve (AUC)

  • Microarray data sets: To validate the effectiveness of the IEFS framework, six multiclass benchmark microarray data sets, shown in Table 1, are used in the experiments

Summary

Introduction

Microarray technology allows biologists to monitor expression levels of thousands of genes among various tumor tissues. Identifying relevant genes for sample classification of various tumor types is beneficial to clinical studies. Microarray gene expression data are widely used for cancer clinical studies [1, 2]. The identification of genes relevant to cancers is a common biological challenge [3]. Li et al. [9] compared different feature selection and multiclass classification methods for gene expression data. The paper indicated that the multiclass classification problem is much more difficult than the binary one for gene expression data. By comparing several filter feature selection methods and representative classifiers including naive Bayes, k-nearest neighbor (KNN), and support vector machine (SVM), they suggested that

