Abstract

This article considers the problem of predicting the outcome of a nominal variable Y from the outcomes of several nominal predictor variables X_1, X_2, ..., X_r, given a random sample from the population. We are interested in estimating p, the probability of correctly predicting Y for a new observation from the population, given the X measurements for that observation. We begin by proposing an estimator, p_a, which is the success rate in predicting Y within the current sample, and we show that this estimator is always biased upwards. We then propose a second estimator, p_b, which divides the original sample into two groups, a holdout group and a training group, in order to estimate p. We show that procedures of this kind are always biased downwards, no matter how the original sample is divided into the two groups. Because one of these estimators tends to overestimate p while the other tends to underestimate it, we propose as a heuristic solution the mean of the two estimators, p_c, as an estimator of p. We then perform several simulation studies to compare the three estimators with respect to both bias and MSE. These simulations seem to confirm that p_c is a better estimator than either of the other two.
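
The three estimators described above can be illustrated with a minimal sketch. The abstract does not state which prediction rule is used, so the code below assumes one: predict the most frequent Y category within each observed combination of X values, falling back to the overall most frequent Y for unseen combinations. The function names (`fit_rule`, `success_rate`, `estimate_p`) and the `holdout_fraction` parameter are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import random
from collections import Counter, defaultdict


def fit_rule(sample):
    """Assumed rule: map each predictor cell x to its most frequent y.

    Also returns the overall most frequent y, used for unseen cells.
    """
    by_cell = defaultdict(Counter)
    overall = Counter()
    for x, y in sample:
        by_cell[x][y] += 1
        overall[y] += 1
    rule = {x: counts.most_common(1)[0][0] for x, counts in by_cell.items()}
    default = overall.most_common(1)[0][0]
    return rule, default


def success_rate(rule, default, sample):
    """Fraction of observations in `sample` whose y the rule predicts correctly."""
    hits = sum(1 for x, y in sample if rule.get(x, default) == y)
    return hits / len(sample)


def estimate_p(sample, holdout_fraction=0.5, rng=None):
    """Return (p_a, p_b, p_c) for a list of (x, y) pairs with at least 2 items.

    p_a: success rate of the rule on the same sample it was fitted to.
    p_b: success rate on a holdout group of a rule fitted to the training group.
    p_c: mean of p_a and p_b.
    """
    if len(sample) < 2:
        raise ValueError("need at least two observations to split the sample")
    rng = rng or random.Random(0)

    # p_a: fit and evaluate on the full sample (resubstitution).
    rule, default = fit_rule(sample)
    p_a = success_rate(rule, default, sample)

    # p_b: random split, keeping both groups non-empty.
    shuffled = list(sample)
    rng.shuffle(shuffled)
    n_hold = int(len(shuffled) * holdout_fraction)
    n_hold = min(max(1, n_hold), len(shuffled) - 1)
    holdout, training = shuffled[:n_hold], shuffled[n_hold:]
    rule_t, default_t = fit_rule(training)
    p_b = success_rate(rule_t, default_t, holdout)

    # p_c: the heuristic average of the two estimators.
    p_c = (p_a + p_b) / 2
    return p_a, p_b, p_c
```

Here each observation is a pair `(x, y)`, where `x` is a tuple of the r nominal predictor values and `y` is the nominal outcome. Repeating `estimate_p` over many samples drawn from a known population would be one way to compare the bias and MSE of the three estimators, in the spirit of the simulation studies mentioned in the abstract.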

