The Relation between Uncertainty in Latent Class Membership and Outcomes in a Latent Class Signal Detection Model

Zhihui Cheng

doi:10.7916/d8zp4d6s

Abstract

Latent class variables are often used to predict outcomes. The conventional practice is to first assign each observation to the latent class for which its posterior probability is highest. The assigned class membership is then treated as an observed variable and used to predict the outcomes. This widely used classify-analyze strategy ignores the uncertainty in each observation's latent class membership. Once an observation is assigned to the class with the highest posterior probability, its probability of belonging to that class is treated as one. As a result, all observations assigned to a class are treated as equally representative of it, regardless of how large their posterior probabilities actually were. Finally, standard errors are underestimated because the residual uncertainty about latent class membership is ignored.
This dissertation used simulation studies and an analysis of a real-world data set to compare five commonly adopted approaches for measuring the association between a latent class variable and outcome variables: most likely class regression, probability regression, probability-weighted regression, pseudo-class regression, and the simultaneous approach. The aim was to determine which approach best accounts for the uncertainty in latent class membership. The model considered was DeCarlo's latent class extension of the signal detection model (LC-SDT), which has been shown to address certain measurement issues in education, specifically rater issues in essay grading such as rater effects and rater reliability. The LC-SDT model has the potential for wide application in education and other areas, so it is important to explore how to account for uncertainty in latent class membership within this framework.
Three ordinal outcome variables having a negative, weak, and strong association with the latent class variable were considered in the simulations. The simulation results showed that the simultaneous approach performed best in obtaining unbiased parameter estimates. It also yielded larger standard errors than the other approaches, which previous research has found to underestimate standard errors. Despite these advantages, including outcome variables in a latent class model can affect the parameters of the response variables, so caution is needed when using this approach. The analysis of the real-world data set confirmed the trends observed in the simulation studies.
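
The information lost under the classify-analyze strategy described above can be illustrated with a minimal sketch. It is not the dissertation's code: the posterior probabilities are made-up values, and the function names (`modal_assignment`, `discarded_uncertainty`, `pseudo_class_draws`) are hypothetical. The sketch contrasts most likely class assignment with pseudo-class draws, in which class membership is sampled repeatedly from each observation's posterior distribution.

```python
import random

def modal_assignment(posteriors):
    """Assign each observation to the class with its highest posterior probability."""
    return [max(range(len(p)), key=p.__getitem__) for p in posteriors]

def discarded_uncertainty(posteriors):
    """Return 1 - max posterior per observation: the uncertainty that
    modal assignment implicitly treats as zero."""
    return [1.0 - max(p) for p in posteriors]

def pseudo_class_draws(posteriors, n_draws, seed=0):
    """Sample class membership n_draws times from each observation's posterior."""
    rng = random.Random(seed)
    return [
        [rng.choices(range(len(p)), weights=p, k=1)[0] for p in posteriors]
        for _ in range(n_draws)
    ]

# Hypothetical posterior probabilities for three observations and two classes.
posteriors = [
    [0.95, 0.05],  # clearly class 0
    [0.55, 0.45],  # nearly ambiguous, yet modal assignment also says class 0
    [0.10, 0.90],  # clearly class 1
]

modal = modal_assignment(posteriors)
discarded = discarded_uncertainty(posteriors)
draws = pseudo_class_draws(posteriors, n_draws=20)

print("modal classes:", modal)
print("uncertainty ignored:", [round(d, 2) for d in discarded])
print("first pseudo-class draw:", draws[0])
```

In this toy example the first two observations receive the same modal class even though their posterior probabilities differ sharply, whereas the second observation's class can change from one pseudo-class draw to the next.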
