Abstract

A common approach to analyzing a covariate-sample count matrix, an element of which represents how many times a covariate appears in a sample, is to factorize it under the Poisson likelihood. We show its limitation in capturing the tendency for a covariate present in a sample to both repeat itself and excite related ones. To address this limitation, we construct negative binomial factor analysis (NBFA) to factorize the matrix under the negative binomial likelihood, and relate it to a Dirichlet-multinomial distribution based mixed-membership model. To support countably infinite factors, we propose the hierarchical gamma-negative binomial process. By exploiting newly proved connections between discrete distributions, we construct two blocked Gibbs samplers and one collapsed Gibbs sampler that all adaptively truncate their number of factors, and demonstrate that the blocked Gibbs sampler developed under a compound Poisson representation converges fast and has low computational complexity. Example results show that NBFA has a distinct mechanism for adjusting its number of inferred factors according to the sample lengths, and provides clear advantages in parsimonious representation, predictive power, and computational complexity over previously proposed discrete latent variable models, which either completely ignore burstiness, or model only the burstiness of the covariates but not that of the factors.

Highlights

  • The need to analyze a covariate-sample count matrix, each of whose elements counts the number of times that a covariate appears in a sample, arises in many different settings, such as text analysis, next-generation sequencing, medical records mining, and consumer behavior studies.

  • Just as Poisson factor analysis (PFA) is closely related to the canonical mixed-membership model built on the multinomial distribution, we show that negative binomial factor analysis (NBFA) is closely related to a Dirichlet-multinomial mixed-membership (DMMM) model that uses the Dirichlet-categorical (Dirichlet-multinomial) rather than categorical distributions to assign an index to both a covariate and a factor.

  • To support countably infinite factors for NBFA, generalizing the gamma-negative binomial process (GNBP) (Zhou and Carin, 2015; Zhou et al., 2016b), we introduce a new nonparametric Bayesian prior: the hierarchical gamma-negative binomial process, where each of the J samples is assigned a sample-specific GNBP and a globally shared gamma process is mixed with all the J GNBPs.

Summary

Introduction

The need to analyze a covariate-sample count matrix, each of whose elements counts the number of times that a covariate appears in a sample, arises in many different settings, such as text analysis, next-generation sequencing, medical records mining, and consumer behavior studies. Without capturing the self- and cross-excitation (stimulation) of covariate frequencies or better modeling the overdispersed covariate-sample counts, the ultimate potential of the mixed-membership model and PFA will be limited no matter how the priors on latent parameters are adjusted. Computation may be wasted if the model increases its capacity in order to capture overdispersion that self- and cross-excitation could explain. To remove these restrictions, we introduce negative binomial factor analysis (NBFA) to factorize the covariate-sample count matrix, in which we replace the Poisson distributions on which PFA is built with negative binomial (NB) distributions. The proofs and Gibbs sampling update equations are provided in the Supplementary Material (Zhou, 2017).
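The contrast between the two likelihoods can be illustrated with a small simulation. The sketch below is not the paper's implementation: the function name `simulate_dispersion`, the matrix sizes, the Dirichlet and gamma draws used for the factor loadings and scores, and the single shared probability `p` are all assumptions made for illustration. It also assumes that the NB shape parameter takes the place of the Poisson rate, with NB(r, p) having mean r p / (1 - p).

```python
import numpy as np


def simulate_dispersion(V=4, J=3, K=3, p=0.5, n_rep=20000, seed=0):
    """Draw replicated count matrices under a Poisson and an NB factorization.

    Returns the per-entry variance-to-mean ratios under each likelihood.
    All sizes and priors here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Hypothetical factor loadings: each of the K columns of Phi is a
    # probability vector over the V covariates.
    Phi = rng.dirichlet(np.full(V, 5.0), size=K).T        # shape (V, K)
    # Hypothetical nonnegative factor scores for the J samples.
    Theta = rng.gamma(shape=5.0, scale=4.0, size=(K, J))  # shape (K, J)
    rate = Phi @ Theta                                    # shape (V, J)

    # Poisson factorization: n_vj ~ Poisson(rate_vj).
    pois = rng.poisson(rate, size=(n_rep, V, J))

    # NB factorization (assumed form): n_vj ~ NB(rate_vj, p) with mean
    # rate_vj * p / (1 - p). NumPy's second argument is the complementary
    # probability, hence 1 - p.
    nb = rng.negative_binomial(rate, 1.0 - p, size=(n_rep, V, J))

    def ratio(x):
        # Variance-to-mean ratio of each matrix entry across replicates.
        return x.var(axis=0) / x.mean(axis=0)

    return ratio(pois), ratio(nb)


if __name__ == "__main__":
    r_pois, r_nb = simulate_dispersion()
    print("Poisson variance/mean per entry:\n", r_pois.round(2))
    print("NB variance/mean per entry:\n", r_nb.round(2))
```

With p = 0.5 both models share the same mean for every entry, so any difference lies in the spread. The Poisson ratios should sit near 1, while the NB ratios should sit near 1 / (1 - p) = 2, which is the overdispersion that no choice of prior on the factors can give a Poisson likelihood conditional on its rate.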

Poisson factor analysis
Multinomial mixed-membership model
Negative binomial factor analysis
The Dirichlet-multinomial mixed-membership model
Comparisons with related models
Hierarchical model
Blocked Gibbs sampling
Collapsed Gibbs sampling
Blocked Gibbs sampling under compound Poisson representation
Model comparison
Example results
Prediction of heldout covariate indices
Unsupervised feature learning for classification
Findings
Conclusions
