Abstract

Heterogeneity occurs in many regression problems, where members of different latent subgroups respond differently to the covariates of interest (e.g., treatments) even after adjusting for other covariates. A Bayesian model called the mixture of finite mixtures (MFM) can be used to identify these subgroups; a key feature is that the number of subgroups is modeled as a random variable whose distribution is learned from the data. The Bayesian MFM model was not commonly used in earlier applications, largely because of computational difficulties. In comparison, an alternative infinite mixture model, the Dirichlet Process Mixture (DPM) model, has been a main Bayesian tool for clustering even though it is a mis-specified model for many applications. The popularity of the DPM is partly due to its convenient mathematical properties, which enable efficient computing algorithms. A class of Bayesian models tailored to regression problems, the conditional MFMs (cMFM), is described and studied. Computation for the cMFM is developed by extending the efficient MCMC algorithms for general MFMs. Using simulation and real data examples, the cMFM is compared to existing frequentist methods, the conditional DPM, and the original MFM and DPM models that model the response and covariates jointly. The cMFM is shown to be favorable in clustering accuracy and robust to different covariate and noise distributions.
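
As a point of reference for the MFM construction mentioned above, the following is a minimal sketch of the generative process of a finite mixture with a random number of components. It assumes a common MFM specification (a shifted Poisson prior on the number of components, symmetric Dirichlet weights, and Gaussian component means); the hyperparameters are illustrative and the regression (conditional) structure of the cMFM is not represented here.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hyperparameters (not taken from the paper).
lam, gamma, n = 3.0, 1.0, 200

# Number of components: K ~ 1 + Poisson(lam), so K >= 1.
K = 1 + rng.poisson(lam)

# Mixture weights: symmetric Dirichlet over the K components.
weights = rng.dirichlet(np.full(K, gamma))

# Component parameters: Gaussian means drawn from a base measure.
means = rng.normal(loc=0.0, scale=5.0, size=K)

# Latent subgroup labels and observations.
z = rng.choice(K, size=n, p=weights)
y = rng.normal(loc=means[z], scale=1.0)

print(f"K = {K}, cluster sizes = {np.bincount(z, minlength=K)}")

In a DPM, by contrast, the number of components is effectively infinite; the MFM's finite-but-random K is what allows posterior inference on the number of subgroups itself.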
