Abstract

Convex bootstrap error estimation is a popular tool for classifier error estimation in gene expression studies. A basic question is how to determine the weight for the convex combination between the basic bootstrap estimator and the resubstitution estimator such that the resulting estimator is unbiased at finite sample sizes. The well-known 0.632 bootstrap error estimator uses asymptotic arguments to propose a fixed 0.632 weight, whereas the more recent 0.632+ bootstrap error estimator attempts to set the weight adaptively. In this paper, we study the finite sample problem in the case of linear discriminant analysis under Gaussian populations. We derive exact expressions for the weight that guarantee unbiasedness of the convex bootstrap error estimator in the univariate and multivariate cases, without making asymptotic simplifications. Using exact computation in the univariate case and an accurate approximation in the multivariate case, we obtain the required weight and show that it can deviate significantly from the constant 0.632 weight, depending on the sample size and Bayes error for the problem. The methodology is illustrated by application on data from a well-known cancer classification study.

Electronic supplementary material: The online version of this article (doi:10.1186/s13637-014-0015-0) contains supplementary material, which is available to authorized users.
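The convex bootstrap estimator described above combines the resubstitution estimate and the zero-bootstrap estimate as (1 − w)·ε̂_resub + w·ε̂_boot, with w = 0.632 recovering the classical estimator. The following is a minimal sketch of that combination, assuming a toy one-dimensional nearest-mean classifier; the function names (`convex_bootstrap_estimate`, `train_means`) are illustrative and not from the paper:

```python
import random

def nearest_mean_classify(x, mean0, mean1):
    # Assign x to the class whose training-sample mean is closer.
    return 0 if abs(x - mean0) <= abs(x - mean1) else 1

def train_means(data, labels):
    c0 = [x for x, y in zip(data, labels) if y == 0]
    c1 = [x for x, y in zip(data, labels) if y == 1]
    return sum(c0) / len(c0), sum(c1) / len(c1)

def error_rate(data, labels, mean0, mean1):
    wrong = sum(nearest_mean_classify(x, mean0, mean1) != y
                for x, y in zip(data, labels))
    return wrong / len(data)

def convex_bootstrap_estimate(data, labels, w=0.632, B=100, seed=0):
    rng = random.Random(seed)
    n = len(data)
    # Resubstitution estimate: train and test on the full sample.
    m0, m1 = train_means(data, labels)
    resub = error_rate(data, labels, m0, m1)
    # Zero-bootstrap estimate: train on each bootstrap sample and
    # test only on the points left out of that sample, then average.
    errs = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        out = [i for i in range(n) if i not in chosen]
        boot_labels = [labels[i] for i in idx]
        if not out or len(set(boot_labels)) < 2:
            continue  # skip degenerate replicates
        bm0, bm1 = train_means([data[i] for i in idx], boot_labels)
        errs.append(error_rate([data[i] for i in out],
                               [labels[i] for i in out], bm0, bm1))
    boot0 = sum(errs) / len(errs)
    # Convex combination; w = 0.632 gives the classical 0.632 estimator.
    return (1 - w) * resub + w * boot0
```

The paper's contribution is choosing w so the combination is unbiased at finite sample sizes, rather than using the fixed asymptotic value 0.632 that this sketch defaults to.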

Highlights

  • The bootstrap method [1,2,3,4,5,6,7] has been used in a wide range of statistical problems

  • Gene expression classification example: the theory is applied to compare the performance of the bootstrap error estimator using the optimal weight against the fixed w = 0.632 weight, on gene expression data from the well-known breast cancer classification study in [42], which analyzed expression profiles from 295 tumor specimens, divided into N0 = 115 specimens from the ‘good-prognosis’ population and N1 = 180 specimens from the ‘poor-prognosis’ population

  • Despite the approximate nature of the results, given that the simulated training samples are not independent of each other, the bias and RMS were always smaller for the estimator using the optimal weight than for the one using the fixed 0.632 weight


Summary

Introduction

The bootstrap method [1,2,3,4,5,6,7] has been used in a wide range of statistical problems. The expected error rate of the bootstrap LDA classification rule defined by (23) is given by:

E[ε_n^{C,0} | N_0 = n_0] = Φ(e, f; ρ_c) + Φ(−e, −f; ρ_c),   (25)

where Φ(·, ·; ρ) denotes the standard bivariate Gaussian CDF with correlation coefficient ρ.
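Expressions of the form in (25) require evaluating the bivariate standard Gaussian CDF Φ(e, f; ρ), i.e., P(Z1 ≤ e, Z2 ≤ f) for correlated standard normals. A simple Monte Carlo sketch (this evaluation approach is an illustration, not the paper's exact computation method):

```python
import math
import random

def bivariate_normal_cdf(e, f, rho, n=200_000, seed=1):
    # Monte Carlo estimate of P(Z1 <= e, Z2 <= f), where (Z1, Z2) is
    # standard bivariate Gaussian with correlation rho. Z2 is built
    # from Z1 plus independent noise so that corr(Z1, Z2) = rho.
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        if z1 <= e and z2 <= f:
            hits += 1
    return hits / n
```

As a sanity check, Sheppard's formula gives the exact value Φ(0, 0; ρ) = 1/4 + arcsin(ρ)/(2π), so Φ(0, 0; 0) = 0.25 and Φ(0, 0; 0.5) = 1/3, which the Monte Carlo estimate should approach for large n.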

