Abstract

Unlike random sampling, selective sampling draws units based on their outcome values. Examples include over-sampling rare events in choice outcomes and over-sampling units with extreme values of continuous and count outcomes. Such designs are highly cost-effective for marketing research, but the resulting endogenously selected samples must be analyzed carefully to avoid selection bias. We introduce a unified and efficient approach based on semiparametric odds ratio (SOR) models, applicable to categorical, continuous, and count response data collected by selective sampling. Unlike extant sampling-adjusting methods and Heckman-type selection models, the proposed approach requires neither modeling the selection mechanism nor imposing parametric distributional assumptions on the response variable, which eliminates both sources of misspecification bias. With this approach, one can quantify and test relationships among variables as if the samples had been collected via random sampling, which simplifies bias correction for endogenously selected samples. We evaluate and illustrate the method with extensive simulation studies and two real-data examples. The first uses endogenously stratified sampling for linear and nonlinear regressions to identify drivers of the share-of-wallet outcome for cigarette smokers. The second uses truncated and on-site samples for count data models of store shopping demand. The evaluation shows that selective sampling followed by the SOR approach reduces the required sample size by more than 70% relative to random sampling. It also shows that, across a wide range of selective sampling scenarios, SOR outperforms extant methods for selective samples and offers opportunities for better managerial decisions.
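Why an odds-ratio parameterization can avoid modeling the selection mechanism can be seen in a minimal numerical sketch. It assumes that the probability of selecting a unit depends only on its outcome y. Under that assumption, the outcome density among selected units is proportional to pi(y) f(y|x). In the odds ratio relative to a reference point (y0, x0), both the factor pi(y) and the normalizing constants cancel. The conditional distribution `f`, the selection function `pi`, and all numbers below are hypothetical and chosen only for illustration. This sketch demonstrates the invariance property; it is not the authors' SOR estimator.

```python
import math

# Hypothetical outcome support: a count outcome y in {0, ..., 4}.
YS = range(5)

def f(y, x):
    """Hypothetical population density f(y | x): Poisson truncated at 4,
    with rate 1.0 when x == 0 and 2.5 when x == 1."""
    lam = 1.0 if x == 0 else 2.5
    w = [lam**k / math.factorial(k) for k in YS]
    return w[y] / sum(w)

def pi(y):
    """Hypothetical selection probability that depends only on the outcome:
    heavy users (large y) are over-sampled."""
    return 0.05 + 0.2 * y

def g(y, x):
    """Density of y given x among selected units: pi(y) f(y|x), renormalized over y."""
    return pi(y) * f(y, x) / sum(pi(k) * f(k, x) for k in YS)

def odds_ratio(dens, y, x, y0=0, x0=0):
    """Odds ratio of (y, x) relative to the reference point (y0, x0)."""
    return dens(y, x) * dens(y0, x0) / (dens(y0, x) * dens(y, x0))

def mean_y(dens, x):
    """Conditional mean of y given x under the density dens."""
    return sum(k * dens(k, x) for k in YS)

if __name__ == "__main__":
    # The conditional mean is distorted by selection...
    print("E[y | x=1]: population", round(mean_y(f, 1), 4),
          "selected", round(mean_y(g, 1), 4))
    # ...but the odds ratios coincide because pi(y) cancels.
    for y in YS:
        print(y, round(odds_ratio(f, y, 1), 6), round(odds_ratio(g, y, 1), 6))
```

In this toy setup, the population odds ratio equals (2.5 / 1.0) ** y. The odds ratio computed from the selected sample matches it exactly, while naive summaries such as the conditional mean do not.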
