Abstract

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials (Hu et al., 2017) for this article are available online.

Highlights

  • The data comprise units nested within groups, and include categorical variables measured at the unit level and at the group level

  • The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class

  • As part of generating the synthetic data, we evaluate disclosure risks using the measures suggested in Hu et al (2014)

Read more

Summary

Introduction

The data comprise units nested within groups (e.g., people within households), and include categorical variables measured at the unit level (e.g., individuals’ demographic characteristics) and at the group level (e.g., whether the family owns or rents their home). A typical analysis goal is to estimate multivariate relationships among the categorical variables, accounting for the hierarchical structure in the data. To estimate joint distributions with multivariate categorical data, many analysts rely on mixtures of products of multinomial distributions, known as. Of particular note, Dunson and Xing (2009) present a nonparametric Bayesian version of the latent class model, using a Dirichlet process mixture (DPM) for the prior distribution. The DPM prior distribution is appealing, in that (i) it has full support on the space of joint distributions for unordered categorical variables, ensuring that the model does not restrict dependence structures a priori, and (ii) it fully incorporates uncertainty about the effective number of latent classes in posterior inferences

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.