Abstract
Nearest-centroid classifiers have recently been successfully employed in high-dimensional applications, such as in genomics. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is frequently carried out by computing univariate scores for each feature individually, without consideration for how a subset of features performs as a whole. We introduce a new feature selection approach for high-dimensional nearest centroid classifiers that instead is based on the theoretically optimal choice of a given number of features, which we determine directly here. This allows us to develop a new greedy algorithm to estimate this optimal nearest-centroid classifier with a given number of features. In addition, whereas the centroids are usually formed from maximum likelihood estimates, we investigate the applicability of high-dimensional shrinkage estimates of centroids. We apply the proposed method to clinical classification based on gene-expression microarrays, demonstrating that the proposed method can outperform existing nearest centroid classifiers.
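To make the contrast in the abstract concrete, the sketch below implements both selection strategies for a two-class nearest-centroid classifier: univariate selection, which scores each feature on its own, and a greedy forward search, which judges features by how the subset performs as a whole. This is a hypothetical minimal illustration (plain Euclidean distance, maximum likelihood centroids, training error as the greedy criterion), not the paper's exact estimator.

```python
import numpy as np

def nc_predict(X_train, y_train, X_test):
    # Nearest-centroid prediction: assign each row of X_test to the class
    # whose mean vector (centroid) is closest in Euclidean distance.
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def univariate_top_k(X, y, k):
    # Univariate selection: score each feature individually with a
    # t-statistic-like score (two-class case assumed) and keep the top k,
    # ignoring how the selected features perform jointly.
    c0, c1 = X[y == 0], X[y == 1]
    pooled_sd = np.sqrt((c0.var(axis=0, ddof=1) + c1.var(axis=0, ddof=1)) / 2)
    scores = np.abs(c0.mean(axis=0) - c1.mean(axis=0)) / (pooled_sd + 1e-12)
    return list(np.argsort(scores)[::-1][:k])

def greedy_select(X, y, k):
    # Greedy forward selection: at each step add the feature whose inclusion
    # most lowers the training error of the nearest-centroid classifier built
    # on the current subset, so candidates are evaluated as part of a set
    # rather than one at a time.
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < k and remaining:
        errs = {j: np.mean(nc_predict(X[:, selected + [j]], y,
                                      X[:, selected + [j]]) != y)
                for j in remaining}
        best = min(errs, key=errs.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The two routines can disagree: a feature with the highest univariate score may add little once other features are in the subset, which is the motivation for selecting features jointly.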
Highlights
Linear Discriminant Analysis (LDA) is a long-standing prediction method that has been well characterized when the number of features used for prediction is small [1].
We evaluate different methods for estimating the optimal subset of a given size in a series of simulations.
We consider the estimation of this optimal subset.
Summary
Linear Discriminant Analysis (LDA) is a long-standing prediction method that has been well characterized when the number of features used for prediction is small [1]. An observation is assigned to the class whose centroid it is nearest, allowing LDA to be interpreted as a ‘‘nearest centroid classifier.’’ It can be argued that an accurate classifier built with a smaller number of features is preferable to an accurate classifier built with the complete set of features. This problem is analogous to, but in general distinct from, that of selecting variables in a regression model by, say, least angle regression (LARS) [4]. Work on the feature selection problem in discriminant analysis has been summarized elsewhere [5,6].
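The nearest-centroid rule described above can be sketched in a few lines: compute each class's mean vector from the training data and assign a new observation to the class whose centroid is closest. This is a minimal illustration using Euclidean distance and maximum likelihood centroids; the function name is hypothetical.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test observation to the class whose centroid is nearest.

    Centroids are per-class feature means (maximum likelihood estimates);
    distance is plain Euclidean. LDA with a shared identity covariance
    reduces to this rule.
    """
    classes = np.unique(y_train)
    # Per-class mean vectors: the centroids.
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance from every test point to every centroid, then take the argmin.
    dists = np.linalg.norm(X_test[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]
```

For example, with two well-separated training classes, a test point near one class's mean is assigned to that class regardless of the other features' scale, which is why centroid (and distance) estimation matters in high dimensions.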