Performance of Error Estimators for Classification

Edward Dougherty, Blaise Hanczar, Hua Hua, Ulisses Braga-Neto, Chao Sima

doi:10.2174/157489310790596385

Abstract

Classification in bioinformatics often suffers from small samples in conjunction with large numbers of features, which makes error estimation problematic. When a sample is small, there are insufficient data to split the sample, so the same data are used for both classifier design and error estimation. Error estimation can then suffer from high variance, bias, or both. The problem of choosing a suitable error estimator is exacerbated by the fact that estimation performance depends on the rule used to design the classifier, the feature-label distribution to which the classifier is to be applied, and the sample size. This paper reviews the performance of training-sample error estimators with respect to several criteria: estimation accuracy, variance, bias, correlation with the true error, regression on the true error, and accuracy in ranking feature sets. A number of error estimators are considered: resubstitution, leave-one-out cross-validation, 10-fold cross-validation, bolstered resubstitution, semi-bolstered resubstitution, .632 bootstrap, .632+ bootstrap, and optimal bootstrap. The paper illustrates these performance criteria for certain models and for two real data sets, referring to the literature for more extensive applications of these criteria. The results given in the present paper are consistent with those in the literature and lead to two conclusions: (1) much greater effort needs to be focused on error estimation, and (2) owing to the generally poor performance of error estimators on small samples, for a conclusion based on a small-sample error estimator to be considered valid, it should be supported by evidence that the estimator in question can be expected to perform sufficiently well under the circumstances to justify the conclusion.

Keywords: Classification, epistemology, error estimation, validity
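To make three of the estimators named in the abstract concrete, the following is a minimal sketch, not the authors' experimental setup. It computes resubstitution, leave-one-out cross-validation, and .632 bootstrap estimates on one small sample and compares them with the error measured on a large independent test set, which stands in for the true error. The nearest-centroid classification rule, the two-class Gaussian model, the sample size of 20, the 100 bootstrap replicates, and all function names are assumptions chosen for illustration.

```python
import random

def train_centroids(sample):
    """Nearest-centroid rule (illustrative choice): mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in sample:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

def error_rate(centroids, data):
    return sum(predict(centroids, x) != y for x, y in data) / len(data)

def resubstitution(sample):
    """Error of the designed classifier on its own training sample."""
    return error_rate(train_centroids(sample), sample)

def leave_one_out(sample):
    """Design on n-1 points, test on the held-out point, average over all n."""
    errors = 0
    for i, (x, y) in enumerate(sample):
        rest = sample[:i] + sample[i + 1:]
        errors += predict(train_centroids(rest), x) != y
    return errors / len(sample)

def bootstrap_632(sample, n_boot=100, rng=random):
    """.632 bootstrap: 0.368 * resubstitution + 0.632 * zero-bootstrap error."""
    n = len(sample)
    errors = tested = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]      # draw with replacement
        chosen = set(idx)
        held_out = [sample[i] for i in range(n) if i not in chosen]
        if not held_out:
            continue
        model = train_centroids([sample[i] for i in idx])
        errors += sum(predict(model, x) != y for x, y in held_out)
        tested += len(held_out)
    resub = resubstitution(sample)
    if tested == 0:                                     # no point was ever held out
        return resub
    return 0.368 * resub + 0.632 * errors / tested

def draw(n, rng, shift=1.0):
    """Assumed model: two unit-variance Gaussian classes in 2-D, balanced labels."""
    return [([rng.gauss(shift * (i % 2), 1.0), rng.gauss(shift * (i % 2), 1.0)], i % 2)
            for i in range(n)]

rng = random.Random(0)
sample = draw(20, rng)                                  # small training sample
test_set = draw(5000, rng)                              # proxy for the true error
estimates = {
    "true (test set)": error_rate(train_centroids(sample), test_set),
    "resubstitution": resubstitution(sample),
    "leave-one-out": leave_one_out(sample),
    ".632 bootstrap": bootstrap_632(sample, rng=rng),
}
for name, value in estimates.items():
    print(f"{name:>16}: {value:.3f}")
```

Repeating the last block over many independently drawn samples would give the empirical bias, variance, and correlation with the true error of each estimator, which are among the criteria the abstract lists.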

Journal: Current Bioinformatics
Publication Date: Mar 1, 2010
Citations: 45

Similar Papers

Evaluation of Error Rate Estimators in Discriminant Analysis with Multivariate Binary Variables
Egbo Ikechukwu
American Journal of Theoretical and Applied Statistics | VOL. 5
01 Jan 2015

The effect of sample size and bias on the reliability of estimates of error: a comparative study of Dahlberg's formula
S D Springate
The European Journal of Orthodontics | VOL. 34
29 Mar 2011

Characterization of the Effectiveness of Reporting Lists of Small Feature Sets Relative to the Accuracy of the Prior Biological Knowledge
Chen Zhao ... Robert S Chapkin
Cancer Informatics | VOL. 9
01 Jan 2009

Analysis of Robustness of Variational Multiscale Error Estimators for the Forward Propagation Study
Pankaj Negi
Turkish Journal of Computer and Mathematics Education (TURCOMAT) | VOL. 9
30 Dec 2019
