Abstract

Assessment of the misclassification error rate is of high practical relevance in many biomedical applications. Because it is a complex problem, few theoretical results on estimator performance exist. Most findings originate from Monte Carlo simulations carried out in the “normal setting”: the covariables of two groups follow a multivariate normal distribution, the groups differ in location but share the same covariance matrix, and the linear discriminant function (LDF) is used for prediction.

We perform a new simulation to compare existing nonparametric estimators in a more complex situation. The underlying distribution is based on a logistic model with six binary as well as continuous covariables. To study estimator performance for varying true error rates, we consider three prediction rules, including nonparametric classification trees and parametric logistic regression, and sample sizes ranging from 100 to 1,000. In contrast to most published papers, we focus on estimator performance for simple, even inappropriate, prediction rules and relatively large training sets.

For the most part, the results agree with the usual findings. The most striking behavior was seen when (simple) classification trees were used for prediction: since the apparent error rate Êrr.app is biased, linear combinations incorporating Êrr.app underestimate the true error rate even for large sample sizes. Among all such linear combinations, the .632+ estimator, which was designed to correct the overoptimism of Efron's .632 estimator for nonparametric prediction rules, performs best. The bootstrap estimator Êrr.B0 and the cross-validation estimator Êrr.cv, which do not depend on Êrr.app, appear to track the true error rate. Although the disadvantages of both estimators (the pessimism of Êrr.B0 and the high variability of Êrr.cv) shrink with increasing sample size, they remain visible.

We conclude that the asymptotic behavior of the apparent error rate is important when choosing a particular estimator. For assessing estimator performance, the variance of the true error rate is crucial; more generally, the stability of the prediction procedure is essential for applying estimators based on resampling methods. (© 2004 WILEY‐VCH Verlag GmbH & Co. KGaA, Weinheim)
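The abstract names several error-rate estimators without defining them. As a rough illustration only, the following Python sketch computes them for a single training set under 0-1 loss, with several assumptions. A decision stump (hypothetical helper `fit_stump`) stands in for a simple classification tree. Êrr.cv is taken to be K-fold cross-validation and Êrr.B0 the leave-one-out bootstrap (out-of-bag) error; the paper's exact definitions may differ. The .632+ estimator follows Efron and Tibshirani's formulation: Êrr.B0 is capped at the no-information error rate γ, and Êrr.app and the capped Êrr.B0 are weighted by w = 0.632/(1 − 0.368R), where R is the relative overfitting rate. The toy data come from the “normal setting”, not from the paper's logistic design. All function names are hypothetical.

```python
import random


def fit_stump(X, y):
    """Hypothetical stand-in for a simple classification tree: a one-split
    stump chosen to minimise training misclassifications."""
    overall = int(2 * sum(y) >= len(y))  # overall majority label (ties -> 1)
    best = None  # (errors, feature, threshold, left_label, right_label)
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            # Majority label on each side; an empty side falls back to `overall`.
            ll = int(2 * sum(left) >= len(left)) if left else overall
            rl = int(2 * sum(right) >= len(right)) if right else overall
            errors = sum(yi != ll for yi in left) + sum(yi != rl for yi in right)
            if best is None or errors < best[0]:
                best = (errors, j, t, ll, rl)
    _, j, t, ll, rl = best
    return lambda row: ll if row[j] <= t else rl


def error_rate(rule, X, y):
    """Misclassification rate of `rule` on (X, y); on training data = Err.app."""
    return sum(rule(row) != yi for row, yi in zip(X, y)) / len(y)


def cv_error(X, y, k=10, rng=random):
    """K-fold cross-validation estimate (assumed form of Err.cv)."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    wrong = 0
    for f in range(k):
        test = set(idx[f::k])
        train = [i for i in idx if i not in test]
        rule = fit_stump([X[i] for i in train], [y[i] for i in train])
        wrong += sum(rule(X[i]) != y[i] for i in test)
    return wrong / len(y)


def loo_bootstrap_error(X, y, B=50, rng=random):
    """Leave-one-out bootstrap estimate (assumed form of Err.B0): each point is
    scored only by rules fitted on bootstrap samples that exclude it."""
    n = len(y)
    losses = [[] for _ in range(n)]
    for _ in range(B):
        sample = [rng.randrange(n) for _ in range(n)]
        rule = fit_stump([X[i] for i in sample], [y[i] for i in sample])
        in_bag = set(sample)
        for i in range(n):
            if i not in in_bag:
                losses[i].append(rule(X[i]) != y[i])
    per_obs = [sum(ls) / len(ls) for ls in losses if ls]  # skip never-out-of-bag
    return sum(per_obs) / len(per_obs)


def err_632_and_632plus(err_app, err_b0, y, preds):
    """Return (.632, .632+) estimates for 0-1 loss with binary labels."""
    err632 = 0.368 * err_app + 0.632 * err_b0
    p1 = sum(y) / len(y)          # observed share of class 1
    q1 = sum(preds) / len(preds)  # share of class-1 predictions (full-data rule)
    gamma = p1 * (1 - q1) + (1 - p1) * q1  # no-information error rate
    err_b0_c = min(err_b0, gamma)
    if err_b0_c > err_app and gamma > err_app:
        r = (err_b0_c - err_app) / (gamma - err_app)  # relative overfitting rate
    else:
        r = 0.0
    w = 0.632 / (1 - 0.368 * r)
    return err632, (1 - w) * err_app + w * err_b0_c


def simulate_normal_setting(n, rng):
    """Toy two-group data: identity covariance, means shifted by group."""
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < 0.5)
        X.append([rng.gauss(1.0 * label, 1.0), rng.gauss(0.5 * label, 1.0)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = simulate_normal_setting(150, rng)
    rule = fit_stump(X, y)
    preds = [rule(row) for row in X]
    err_app = error_rate(rule, X, y)
    err_cv = cv_error(X, y, k=10, rng=rng)
    err_b0 = loo_bootstrap_error(X, y, B=30, rng=rng)
    err632, err632p = err_632_and_632plus(err_app, err_b0, y, preds)
    print(f"app={err_app:.3f} cv={err_cv:.3f} B0={err_b0:.3f} "
          f".632={err632:.3f} .632+={err632p:.3f}")
```

Running the script prints the five estimates for one simulated training set. Repeating this over many training sets and comparing against the error rate of each fitted rule on a large independent test set would mimic, in miniature, the kind of simulation study described above.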
