An Integrated Approach for Identifying Wrongly Labelled Samples When Performing Classification in Microarray Data

Yuk Yee Leung, Yeung Sam Hung, Chun Qi Chang

doi:10.1371/journal.pone.0046700


Open Access

https://doi.org/10.1371/journal.pone.0046700


Abstract

Background

Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than those obtained by performing the two tasks independently. Yet, for some microarray datasets, both the classification accuracy and the stability of the selected gene sets still leave room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either unsuitable for microarray data or address outlier detection only in isolation.

Results

We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared with other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace the internal cross-validation of the previous MFMW model; 2) wrongly labelled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier detected all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were shown to be biologically relevant. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) on the same synthetic datasets; MFMW-outlier gave better average precision and recall values under three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping the labels of some of the remaining samples. Almost all the ‘wrong’ (artificially flipped) samples were detected, suggesting that MFMW-outlier is sufficiently powerful to detect outliers in high-dimensional microarray datasets.
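Feature 3 above selects genes with an L1-norm SVM. The abstract gives no solver details, so the following is only a minimal, dependency-free sketch of why an L1 penalty yields sparse gene sets: a linear SVM with the squared hinge loss and an L1 penalty on the weights, trained by proximal gradient descent (ISTA). This formulation is swapped in for brevity (the classic 1-norm SVM is usually posed as a linear program with the plain hinge loss), and the helper names, `lam`, `step`, `iters` and the toy data are assumptions, not the authors' implementation.

```python
# Hypothetical helpers illustrating L1-regularised linear SVM training; not from the paper.

def soft_threshold(v, t):
    """Proximal operator of t*|v|: shrink v towards zero, returning exactly 0.0 inside [-t, t]."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def l1_svm_fit(X, y, lam=0.3, step=0.1, iters=500):
    """Minimise (1/n) * sum(max(0, 1 - y_i * (w.x_i + b))**2) + lam * ||w||_1
    by proximal gradient descent (ISTA). Labels must be +1 or -1; b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for x_i, y_i in zip(X, y):
            slack = 1.0 - y_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)
            if slack > 0:  # only margin violators contribute to the squared-hinge gradient
                for j in range(p):
                    grad_w[j] -= 2.0 * slack * y_i * x_i[j] / n
                grad_b -= 2.0 * slack * y_i / n
        # Gradient step on the smooth loss, then soft-thresholding for the L1 penalty.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy data (an assumption): gene 0 separates the classes; genes 1 and 2 are
# noise patterns orthogonal to the labels.
X = [[2.0, 1.0, 1.0],
     [1.5, -1.0, -1.0],
     [-1.5, 1.0, -1.0],
     [-2.0, -1.0, 1.0]]
y = [1, 1, -1, -1]
w, b = l1_svm_fit(X, y)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print(selected)  # [0]
```

The soft-thresholding step is what sets weights exactly to zero: a gene stays at zero whenever the magnitude of its loss gradient is below `lam`. The toy data contain only uninformative genes, so they do not exercise the redundant-gene case mentioned in the abstract; with perfectly duplicated genes, an L1 penalty alone need not keep just one of them.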

Highlights

Classification is one of the major goals in microarray data analysis [1,2,3,4,5,6,7].
As reported in different studies using an unbiased validation model, perfect leave-one-out cross-validation (LOOCV) accuracies cannot be achieved in many microarray datasets [10,11], even though many gene selection tools have been combined with classifiers of different natures in various experiments.
For five of the six microarray datasets we worked on, different numbers of outliers were removed in each iteration.
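The highlight above notes that outliers were removed over several iterations. The abstract does not spell out MFMW-outlier's removal criterion, so the sketch below is only a simplified stand-in: it runs external LOOCV, repeating gene selection inside every fold, and removes each misclassified sample, stopping when a round flags nothing. The difference-of-means filter, the nearest-centroid classifier, `k`, `max_rounds`, the helper names and the toy data are all assumptions and do not reproduce MFMW's multiple filters and wrappers. A misclassified sample may also be a hard but correctly labelled case, so a real pipeline needs a stricter criterion than a single misclassification.

```python
from statistics import mean

# Hypothetical helpers; a simplified stand-in for iterative outlier removal, not MFMW-outlier.


def select_genes(X, y, k):
    """Filter step (assumed): rank genes by |difference of class means|, keep the top k.
    Expects exactly two classes in y."""
    a, b = sorted(set(y))
    scored = []
    for g in range(len(X[0])):
        mean_a = mean(x[g] for x, lab in zip(X, y) if lab == a)
        mean_b = mean(x[g] for x, lab in zip(X, y) if lab == b)
        scored.append((abs(mean_a - mean_b), g))
    scored.sort(reverse=True)
    return [g for _, g in scored[:k]]


def nearest_centroid(X, y, genes, x):
    """Classifier step (assumed): predict the class whose centroid on `genes` is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y)):
        rows = [r for r, lab in zip(X, y) if lab == label]
        dist = sum((x[g] - mean(r[g] for r in rows)) ** 2 for g in genes)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_misclassified(X, y, k):
    """External LOOCV: genes are reselected on each training fold, so the
    held-out sample never influences which genes are used to classify it."""
    flagged = []
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_tr, y_tr, k)
        if nearest_centroid(X_tr, y_tr, genes, X[i]) != y[i]:
            flagged.append(i)
    return flagged


def iterative_outlier_removal(X, y, k=1, max_rounds=10):
    """Drop every sample misclassified under external LOOCV, repeating until none are.
    Returns the original indices removed in each round and the indices kept."""
    kept, rounds = list(range(len(X))), []
    for _ in range(max_rounds):
        flagged = loocv_misclassified([X[i] for i in kept], [y[i] for i in kept], k)
        if not flagged:
            break
        removed = [kept[j] for j in flagged]
        rounds.append(removed)
        kept = [i for i in kept if i not in removed]
    return rounds, kept


# Toy data (an assumption): gene 0 separates the classes, gene 1 is noise,
# and sample 3 is really class 0 but carries label 1.
X = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5], [0.2, 0.8],
     [10.0, 1.0], [9.0, 0.0], [10.5, 0.5], [0.8, 0.2]]
y = [0, 0, 0, 1, 1, 1, 1, 0]
rounds, kept = iterative_outlier_removal(X, y, k=1)
print(rounds)  # [[3]]
```

Running the filter inside each fold is what makes this validation external: selecting genes once on all samples and only then running LOOCV would let the held-out sample influence its own gene set and inflate the accuracy estimate.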

Summary

Introduction

Classification is one of the major goals in microarray data analysis [1,2,3,4,5,6,7]. As reported in different studies using an unbiased validation model, perfect leave-one-out cross-validation (LOOCV) accuracies cannot be achieved in many microarray datasets [10,11], even though many gene selection tools have been combined with classifiers of different natures in various experiments. This suggests that something is wrong with these datasets, possibly the presence of wrongly labelled samples. For some microarray datasets, both the classification accuracy and the stability of the selected gene sets still leave room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either unsuitable for microarray data or address outlier detection only in isolation.
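The reasoning above, that wrong labels can cap LOOCV accuracy even when the classes are otherwise separable, can be illustrated on toy data. The sketch below flips two labels in a perfectly separable one-gene dataset, reruns LOOCV with a nearest-centroid classifier, and scores the misclassified samples against the flipped ones with precision and recall, the measures reported in the abstract. The data, the flipped indices, the classifier and the helper names are assumptions chosen for illustration; real microarray data are high-dimensional and rarely this clean.

```python
# Hypothetical helpers for a label-flipping experiment on toy data; not from the paper.


def loocv_predictions(values, labels):
    """Leave-one-out predictions of a nearest-centroid classifier on a single gene."""
    preds = []
    for i, v in enumerate(values):
        centroids = {}
        for lab in sorted(set(labels)):
            train = [x for j, (x, y) in enumerate(zip(values, labels)) if j != i and y == lab]
            centroids[lab] = sum(train) / len(train)
        preds.append(min(centroids, key=lambda lab: abs(v - centroids[lab])))
    return preds


def flip_labels(labels, indices):
    """Return a copy of binary (0/1) labels with the given positions flipped."""
    flipped = list(labels)
    for i in indices:
        flipped[i] = 1 - flipped[i]
    return flipped


def precision_recall(detected, truth):
    """Precision and recall of a detected outlier set against the truly flipped set."""
    detected, truth = set(detected), set(truth)
    hits = len(detected & truth)
    precision = hits / len(detected) if detected else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


values = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]  # expression of one hypothetical gene
clean = [0] * 5 + [1] * 5
noisy = flip_labels(clean, [2, 7])

for labels in (clean, noisy):
    preds = loocv_predictions(values, labels)
    wrong = [i for i, (p, y) in enumerate(zip(preds, labels)) if p != y]
    print((len(labels) - len(wrong)) / len(labels), wrong)
# 1.0 []
# 0.8 [2, 7]
print(precision_recall(wrong, [2, 7]))  # (1.0, 1.0)
```

With correct labels the LOOCV accuracy is perfect; flipping two labels drops it to 0.8, and the two misclassified samples are exactly the flipped ones.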



Journal: PLoS ONE
Publication Date: Oct 17, 2012
Citations: 55
License type: CC BY 4.0


Similar Papers

Classification of CT Scan Images of Lungs Using Deep Convolutional Neural Network with External Shape-Based Features.
Varun Srivastava ... Ravindra Kr Purwar
Journal of Digital Imaging | VOL. 33 | 26 Jun 2019

Automatic Identification of Similar Pull-Requests in GitHub’s Repositories Using Machine Learning
Hamzeh Eyal Salman ... Zakarea Alshara
Information | VOL. 13 | 03 Feb 2022

Outlier detection in graph streams
Charu C Aggarwal ... Yuchen Zhao
01 Apr 2011

Recognition and segmentation of teeth and mandibular nerve canals in panoramic dental X-rays by Mask RCNN
Xiaoting Zhao ... Sheng Liang
Displays | VOL. 78 | 22 Apr 2023
