Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data

Jingchen Hu, Jerome P Reiter, Quanli Wang

doi:10.1214/16-ba1047

Abstract

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials (Hu et al., 2017) for this article are available online.
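The two-level structure in assumptions (i) and (ii) can be illustrated with a small generative sketch. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the parameter names (`pi`, `omega`, `lam`, `phi`), the flat Dirichlet draws, and the "exactly one household head" rule are all hypothetical. The rejection step is one simple way to sample from a distribution that assigns zero probability to impossible groups; the article's own sampler may differ.

```python
import numpy as np


def random_params(rng, n_group_classes, n_unit_classes, group_levels, unit_levels):
    """Draw illustrative (made-up) parameters from flat Dirichlet distributions."""
    pi = rng.dirichlet(np.ones(n_group_classes))                          # P(group class)
    omega = rng.dirichlet(np.ones(n_unit_classes), size=n_group_classes)  # P(unit class | group class)
    # lam[g][k]: category probabilities of group-level variable k in group class g
    lam = [[rng.dirichlet(np.ones(d)) for d in group_levels]
           for _ in range(n_group_classes)]
    # phi[g][m][k]: category probabilities of unit-level variable k in unit class m nested in g
    phi = [[[rng.dirichlet(np.ones(d)) for d in unit_levels]
            for _ in range(n_unit_classes)]
           for _ in range(n_group_classes)]
    return pi, omega, lam, phi


def sample_group(rng, pi, omega, lam, phi, n_units):
    """Draw one group (e.g., a household) and its units from the nested model."""
    g = int(rng.choice(len(pi), p=pi))                        # (i) group-level latent class
    group_vars = [int(rng.choice(len(p), p=p)) for p in lam[g]]
    units = []
    for _ in range(n_units):
        m = int(rng.choice(omega.shape[1], p=omega[g]))       # (ii) unit class nested in g
        units.append([int(rng.choice(len(p), p=p)) for p in phi[g][m]])
    return g, group_vars, units


def exactly_one_head(group_vars, units):
    """Hypothetical feasibility rule: category 0 of unit variable 0 means 'head'."""
    return sum(u[0] == 0 for u in units) == 1


def sample_feasible_group(rng, params, n_units, is_possible, max_tries=10_000):
    """Rejection sampling: redraw until the group is not an impossible combination."""
    for _ in range(max_tries):
        g, group_vars, units = sample_group(rng, *params, n_units)
        if is_possible(group_vars, units):
            return g, group_vars, units
    raise RuntimeError("no feasible group found within max_tries")


rng = np.random.default_rng(0)
params = random_params(rng, 2, 3, group_levels=[2, 4], unit_levels=[3, 2])
g, group_vars, units = sample_feasible_group(rng, params, 3, exactly_one_head)
print(g, group_vars, units)
```

Because all units in a group share the same group-level class, their unit-level classes are drawn from the same `omega[g]`, which is what induces dependence among units in the same group.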

Highlights

The data comprise units nested within groups, and include categorical variables measured at the unit level and at the group level.
The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class.
As part of generating the synthetic data, we evaluate disclosure risks using the measures suggested in Hu et al. (2014).

Summary

Introduction

The data comprise units nested within groups (e.g., people within households), and include categorical variables measured at the unit level (e.g., individuals’ demographic characteristics) and at the group level (e.g., whether the family owns or rents their home). A typical analysis goal is to estimate multivariate relationships among the categorical variables, accounting for the hierarchical structure in the data. To estimate joint distributions with multivariate categorical data, many analysts rely on mixtures of products of multinomial distributions, known as latent class models. Of particular note, Dunson and Xing (2009) present a nonparametric Bayesian version of the latent class model, using a Dirichlet process mixture (DPM) for the prior distribution. The DPM prior distribution is appealing, in that (i) it has full support on the space of joint distributions for unordered categorical variables, ensuring that the model does not restrict dependence structures a priori, and (ii) it fully incorporates uncertainty about the effective number of latent classes in posterior inferences.
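To make "mixture of products of multinomial distributions" concrete, the sketch below builds class weights by stick-breaking and evaluates the joint probability of one cell as a weighted sum of per-class products. It is a minimal illustration, not the model fitted in the article: the truncation level `H`, the concentration `alpha`, the names `stick_breaking` and `joint_pmf`, and the randomly drawn class-specific probabilities `psi` are assumptions made for the example.

```python
import itertools

import numpy as np


def stick_breaking(rng, alpha, H):
    """Truncated stick-breaking weights: pi_h = V_h * prod_{l<h} (1 - V_l)."""
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation assumption: the last class takes the remaining stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def joint_pmf(x, weights, psi):
    """P(X = x) = sum_h weights[h] * prod_k psi[h][k][x_k].

    Within a class the variables are independent; mixing over classes
    is what produces dependence among them.
    """
    per_class = [
        np.prod([psi[h][k][xk] for k, xk in enumerate(x)])
        for h in range(len(weights))
    ]
    return float(np.dot(weights, per_class))


rng = np.random.default_rng(1)
levels = [2, 3]  # two categorical variables with 2 and 3 categories (made up)
weights = stick_breaking(rng, alpha=1.0, H=5)
# psi[h][k]: category probabilities of variable k within latent class h
psi = [[rng.dirichlet(np.ones(d)) for d in levels] for _ in range(len(weights))]

cells = itertools.product(*[range(d) for d in levels])
total = sum(joint_pmf(x, weights, psi) for x in cells)
print(weights.round(3), round(total, 6))
```

Summing `joint_pmf` over every cell of the contingency table should give 1 up to floating-point error, which is a quick check that the weights and class-specific probabilities define a valid joint distribution.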

Journal: Bayesian Analysis
Publication Date: Mar 1, 2018
Citations: 41
License type: cc-by