Abstract

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application on data from a well-known cancer classification study.

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-014-0015-0) contains supplementary material, which is available to authorized users.
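The convex bootstrap estimator described above combines the resubstitution estimate and the zero-bootstrap estimate as (1 − w)·ε̂_resub + w·ε̂_boot, with w = 0.632 recovering the classical estimator. The following is a minimal sketch of that combination, assuming a toy one-dimensional nearest-mean classifier; the function names (`convex_bootstrap_estimate`, `train_means`) are illustrative and not from the paper:

```python
import random

def nearest_mean_classify(x, mean0, mean1):
    # Assign x to the class whose training-sample mean is closer.
    return 0 if abs(x - mean0) <= abs(x - mean1) else 1

def train_means(data, labels):
    c0 = [x for x, y in zip(data, labels) if y == 0]
    c1 = [x for x, y in zip(data, labels) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def error_rate(data, labels, mean0, mean1):
    wrong = sum(nearest_mean_classify(x, mean0, mean1) != y
                for x, y in zip(data, labels))
    return wrong / len(data)

def convex_bootstrap_estimate(data, labels, w=0.632, B=100, seed=0):
    rng = random.Random(seed)
    n = len(data)
    # Resubstitution estimate: train and test on the full sample.
    m0, m1 = train_means(data, labels)
    resub = error_rate(data, labels, m0, m1)
    # Zero-bootstrap estimate: train on each bootstrap sample and
    # test only on the points left out of that sample, then average.
    errs = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        out = [i for i in range(n) if i not in chosen]
        boot_labels = [labels[i] for i in idx]
        if not out or len(set(boot_labels)) < 2:
            continue  # skip degenerate replicates
        bm0, bm1 = train_means([data[i] for i in idx], boot_labels)
        errs.append(error_rate([data[i] for i in out],
                               [labels[i] for i in out], bm0, bm1))
    boot0 = sum(errs) / len(errs)
    # Convex combination; w = 0.632 gives the classical 0.632 estimator.
    return (1 - w) * resub + w * boot0
```

The paper's contribution is choosing w so the combination is unbiased at finite sample sizes, rather than using the fixed asymptotic value 0.632 that this sketch defaults to.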

Highlights

  • The bootstrap method [1,2,3,4,5,6,7] has been used in a wide range of statistical problems

  • Gene expression classification example: the theory is applied to compare the performance of the bootstrap error estimator using the optimal weight against the fixed w = 0.632 weight, on gene expression data from the well-known breast cancer classification study in [42], which analyzed expression profiles from 295 tumor specimens, divided into N0 = 115 specimens from the ‘good-prognosis’ population and N1 = 180 specimens from the ‘poor-prognosis’ population

  • Despite the approximate nature of the results, given that the simulated training samples are not independent of each other, the bias and RMS were always smaller for the estimator using the optimal weight than for the one using the fixed 0.632 weight


Summary

Introduction

The bootstrap method [1,2,3,4,5,6,7] has been used in a wide range of statistical problems. The expected error rate of the bootstrap LDA classification rule defined by (23) is given by:

E[ε_n^{C,0} | N_0 = n_0] = Φ(e, f; ρ_c) + Φ(−e, −f; ρ_c),   (25)

where Φ(·, ·; ρ) denotes the standard bivariate Gaussian CDF with correlation coefficient ρ.
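Expressions of the form in (25) require evaluating the bivariate standard Gaussian CDF Φ(e, f; ρ), i.e., P(Z1 ≤ e, Z2 ≤ f) for correlated standard normals. A simple Monte Carlo sketch (this evaluation approach is an illustration, not the paper's exact computation method):

```python
import math
import random

def bivariate_normal_cdf(e, f, rho, n=200_000, seed=1):
    # Monte Carlo estimate of P(Z1 <= e, Z2 <= f), where (Z1, Z2) is
    # standard bivariate Gaussian with correlation rho. Z2 is built
    # from Z1 plus independent noise so that corr(Z1, Z2) = rho.
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        if z1 <= e and z2 <= f:
            hits += 1
    return hits / n
```

As a sanity check, Sheppard's formula gives the exact value Φ(0, 0; ρ) = 1/4 + arcsin(ρ)/(2π), so Φ(0, 0; 0) = 0.25 and Φ(0, 0; 0.5) = 1/3, which the Monte Carlo estimate should approach for large n.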

