A Comparison of Nonparametric Error Rate Estimation Methods in Classification Problems

Sonja Wehberg, Martin Schumacher

doi:10.1002/bimj.200410011

Abstract

Assessment of the misclassification error rate is of high practical relevance in many biomedical applications. As it is a complex problem, theoretical results on estimator performance are few. Most findings originate from Monte Carlo simulations conducted in the “normal setting”: the covariables of two groups have a multivariate normal distribution, the groups differ in location but have the same covariance matrix, and the linear discriminant function (LDF) is used for prediction.

We perform a new simulation to compare existing nonparametric estimators in a more complex situation. The underlying distribution is based on a logistic model with six binary as well as continuous covariables. To study estimator performance for varying true error rates, we consider three prediction rules, including nonparametric classification trees and parametric logistic regression, and sample sizes ranging from 100 to 1,000. In contrast to most published papers, we turn our attention to estimator performance based on simple, even inappropriate prediction rules and relatively large training sets.

For the major part, results are in agreement with usual findings. The most striking behavior was seen when applying (simple) classification trees for prediction: since the apparent error rate Êrr.app is biased, linear combinations incorporating Êrr.app underestimate the true error rate even for large sample sizes. The .632+ estimator, which was designed to correct for the overoptimism of Efron's .632 estimator for nonparametric prediction rules, performs best of all such linear combinations. The bootstrap estimator Êrr.B0 and the crossvalidation estimator Êrr.cv, which do not depend on Êrr.app, seem to track the true error rate. Although the disadvantages of both estimators (pessimism of Êrr.B0 and high variability of Êrr.cv) shrink with increased sample sizes, they are still visible.

We conclude that the asymptotic behavior of the apparent error rate is important for the choice of a particular estimator. For the assessment of estimator performance, the variance of the true error rate is crucial; in general, the stability of prediction procedures is essential for the application of estimators based on resampling methods. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
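
The estimators named in the abstract can be illustrated with a short Python sketch. It is not the authors' simulation code, and several choices are assumptions made only for illustration: the toy nearest-centroid prediction rule, the simulated two-class Gaussian data, the function names, the per-observation averaging used for Êrr.B0, and the textbook forms of the .632 and .632+ combinations (weights 0.368 and 0.632, with the no-information rate written as gamma). The abstract itself does not state these formulas.

```python
import numpy as np

def fit(X, y):
    # Toy nearest-centroid rule (assumption; stands in for any prediction rule).
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def predict(model, X):
    classes, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def apparent_error(X, y):
    # Err.app: the rule is trained and evaluated on the same observations.
    return float(np.mean(predict(fit(X, y), X) != y))

def cv_error(X, y, k, rng):
    # Err.cv: k-fold cross-validation; each observation is predicted once.
    idx = rng.permutation(len(y))
    wrong = 0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        wrong += int(np.sum(predict(fit(X[train], y[train]), X[test]) != y[test]))
    return wrong / len(y)

def b0_error(X, y, n_boot, rng):
    # Err.B0: error on observations left out of each bootstrap sample,
    # averaged per observation (one common variant; an assumption here).
    n = len(y)
    miss, seen = np.zeros(n), np.zeros(n)
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)
        out = np.setdiff1d(np.arange(n), boot)
        if out.size == 0:
            continue
        miss[out] += predict(fit(X[boot], y[boot]), X[out]) != y[out]
        seen[out] += 1
    used = seen > 0
    return float(np.mean(miss[used] / seen[used]))

def no_information_rate(y, y_hat):
    # gamma: expected error if predictions were independent of the labels.
    classes = np.unique(np.concatenate([y, y_hat]))
    p = np.array([np.mean(y == c) for c in classes])
    q = np.array([np.mean(y_hat == c) for c in classes])
    return float(np.sum(p * (1 - q)))

def err_632(app, b0):
    # Fixed-weight linear combination of Err.app and Err.B0.
    return 0.368 * app + 0.632 * b0

def err_632_plus(app, b0, gamma):
    # The weight on Err.B0 grows with the relative overfitting rate r.
    b0c = min(b0, gamma)
    r = (b0c - app) / (gamma - app) if (b0c > app and gamma > app) else 0.0
    w = 0.632 / (1 - 0.368 * r)
    return (1 - w) * app + w * b0c

# Simulated data (assumption): two Gaussian classes, class 1 shifted by 1.
rng = np.random.default_rng(0)
n = 100
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + y[:, None]

app = apparent_error(X, y)
cv = cv_error(X, y, k=10, rng=rng)
b0 = b0_error(X, y, n_boot=50, rng=rng)
gamma = no_information_rate(y, predict(fit(X, y), X))
print("app", app, "cv", cv, "B0", b0)
print(".632", err_632(app, b0), ".632+", err_632_plus(app, b0, gamma))
```

Because both .632 and .632+ are weighted averages that include Êrr.app, a bias in Êrr.app that does not vanish with sample size carries over to them, whereas Êrr.B0 and Êrr.cv are computed only from observations held out of training.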


Journal: Biometrical Journal
Publication Date: Feb 1, 2004
Citations: 23

Similar Papers

True Pedigree Errors More Frequent Than Apparent Errors for Single Nucleotide Polymorphisms
Derek Gordon ... Simon C Heath
Human Heredity | VOL. 49
01 Jan 1998

Generalized Consistent Error Estimator of Linear Discriminant Analysis
Amin Zollanvari ... Edward R Dougherty
IEEE Transactions on Signal Processing | VOL. 63
01 Jun 2015

Performance and estimation of the true error rate of classification rules built with additional information. An application to a cancer trial
David Conde ... Cristina Rueda
Statistical Applications in Genetics and Molecular Biology | VOL. 12
01 Jan 2013

Power loss for multiallelic transmission/disequilibrium test when errors introduced: GAW11 simulated data.
Derek Gordon ... Simon C Heath
Genetic epidemiology | VOL. Suppl 17 1
01 Jan 1998
