Abstract

When a simple random sample of size n is employed to establish a classification rule for prediction of a polytomous variable by an independent variable, the best achievable rate of misclassification is higher than the corresponding best achievable rate when the conditional probability distribution of the predicted variable given the independent variable is known. In typical cases, this increase in misclassification rate due to sampling is remarkably small relative to other increases in expected measures of prediction accuracy due to sampling that are typically encountered in statistical analysis. The issue is particularly striking when a polytomous variable predicts a polytomous variable, for the excess misclassification rate due to estimation approaches 0 at an exponential rate as n increases. Even with a continuous real predictor and simple nonparametric methods, it is typically not difficult to achieve an excess misclassification rate of order n^{-1}. Although reduced excess error is normally desirable, it may reasonably be argued that, in the case of classification, the reduction in bias reflects a more fundamental insensitivity of misclassification error to the quality of the prediction. This lack of sensitivity is not an issue if criteria based on probability prediction, such as logarithmic penalty or least squares, are employed, but the latter measures typically involve more substantial issues of bias: with polytomous predictors, excess expected errors due to sampling are typically of order n^{-1}, and for a continuous real predictor, the increase in expected error is typically of order n^{-2/3}.
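The behaviour described for a polytomous predictor can be illustrated by simulation. The sketch below (in Python; the joint distribution, sample sizes, and replication count are hypothetical choices made for illustration, not taken from the paper) compares the plug-in rule, which assigns each predictor level to its sample modal response, against the Bayes rule built from the true conditional distribution, and reports the average excess misclassification rate at several sample sizes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint distribution: predictor X with 3 levels, response Y with 3 levels.
p_x = np.array([0.5, 0.3, 0.2])                       # marginal distribution of X
p_y_given_x = np.array([[0.6, 0.3, 0.1],              # P(Y | X = 0)
                        [0.2, 0.7, 0.1],              # P(Y | X = 1)
                        [0.3, 0.3, 0.4]])             # P(Y | X = 2)

bayes_rule = p_y_given_x.argmax(axis=1)               # optimal rule: modal Y within each X level
bayes_rate = 1.0 - np.sum(p_x * p_y_given_x.max(axis=1))  # optimal misclassification rate

def excess_rate(n, reps=2000):
    """Average excess misclassification rate of the plug-in rule at sample size n."""
    excess = 0.0
    for _ in range(reps):
        x = rng.choice(3, size=n, p=p_x)
        y = np.array([rng.choice(3, p=p_y_given_x[xi]) for xi in x])
        # Plug-in rule: sample modal Y within each observed X level (unobserved
        # levels fall back to the Bayes rule, an arbitrary choice for this sketch).
        rule = bayes_rule.copy()
        for level in range(3):
            mask = x == level
            if mask.any():
                rule[level] = np.bincount(y[mask], minlength=3).argmax()
        # True misclassification rate of the estimated rule.
        rate = 1.0 - np.sum(p_x * p_y_given_x[np.arange(3), rule])
        excess += rate - bayes_rate
    return excess / reps

for n in (25, 50, 100, 200, 400):
    print(f"n = {n:4d}  excess misclassification rate ~ {excess_rate(n):.5f}")
```

Because the plug-in rule differs from the Bayes rule only when a sample modal category differs from the true modal category within some predictor level, the excess rate in such a simulation is driven by increasingly rare events as n grows, which is consistent with the exponential decay described above.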
