Estimating Success in Predicting a Variable with Nominal Measurement Using Other Variables with Nominal Measurements

Ron Suich

doi:10.1207/s15327906325-346

Abstract

This article concerns the problem of predicting the outcome of a variable Y, with nominal measurement, from the outcomes of several predictor variables, X1, X2, ..., Xr, each with nominal measurement. We assume that we have a random sample from the population. We are interested in estimating p, the probability of successfully predicting a new Y from the population, given the X measurements for this new observation. We begin by proposing an estimator, pa, which is the success rate in predicting Y within the current sample. We show that this estimator is always biased upwards. We then propose a second estimator, pb, which divides the original sample into two groups, a holdout group and a training group, in order to estimate p. We show that procedures of this kind are always biased downwards, no matter how the original sample is divided into the two groups. Because one of these estimators tends to overestimate p while the other tends to underestimate it, we propose as a heuristic solution the mean of the two, pc, as an estimator for p. We then perform several simulation studies to compare the three estimators with respect to both bias and MSE. These simulations seem to confirm that pc is a better estimator than either of the other two.
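
The three estimators described above can be illustrated with a minimal sketch. The abstract does not state the prediction rule, so the code assumes a modal-category rule: for each observed combination of X values, predict the most frequent Y, falling back to the overall modal Y for combinations not seen in training. The function names, the 50/50 default split, and the toy data are hypothetical and serve only as illustration.

```python
import random
from collections import Counter, defaultdict


def fit_modal_rule(xs, ys):
    """Assumed rule: map each observed X cell to its most frequent Y.

    Also returns the overall modal Y, used for X cells unseen in training.
    """
    by_cell = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_cell[x][y] += 1
    rule = {x: counts.most_common(1)[0][0] for x, counts in by_cell.items()}
    fallback = Counter(ys).most_common(1)[0][0]
    return rule, fallback


def success_rate(rule, fallback, xs, ys):
    """Fraction of observations whose Y is predicted correctly."""
    hits = sum(rule.get(x, fallback) == y for x, y in zip(xs, ys))
    return hits / len(ys)


def estimate_pa(xs, ys):
    """pa: fit and evaluate the rule on the same sample."""
    rule, fallback = fit_modal_rule(xs, ys)
    return success_rate(rule, fallback, xs, ys)


def estimate_pb(xs, ys, holdout_fraction=0.5, rng=None):
    """pb: fit on a training group, evaluate on a disjoint holdout group.

    Requires at least two observations so both groups are non-empty.
    """
    rng = rng or random.Random(0)
    idx = list(range(len(ys)))
    rng.shuffle(idx)
    n_hold = max(1, min(len(ys) - 1, round(holdout_fraction * len(ys))))
    hold, train = idx[:n_hold], idx[n_hold:]
    rule, fallback = fit_modal_rule([xs[i] for i in train],
                                    [ys[i] for i in train])
    return success_rate(rule, fallback,
                        [xs[i] for i in hold], [ys[i] for i in hold])


def estimate_pc(xs, ys, holdout_fraction=0.5, rng=None):
    """pc: the mean of pa and pb."""
    pa = estimate_pa(xs, ys)
    pb = estimate_pb(xs, ys, holdout_fraction, rng)
    return 0.5 * (pa + pb)


if __name__ == "__main__":
    # Toy sample: two nominal predictors and a noisy nominal outcome.
    gen = random.Random(42)
    xs = [(gen.choice("ab"), gen.choice("xyz")) for _ in range(60)]
    ys = [x[0] if gen.random() < 0.7 else gen.choice("ab") for x in xs]
    print(estimate_pa(xs, ys),
          estimate_pb(xs, ys, rng=random.Random(1)),
          estimate_pc(xs, ys, rng=random.Random(1)))
```

Each X observation is a tuple of category labels, so a combination of predictor values acts as a single dictionary key. Repeating the toy simulation over many generated samples and comparing the averages with the true success probability would be one way to examine bias and MSE, in the spirit of the simulation studies mentioned above.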
