Abstract

The Dirichlet process and the hierarchical Dirichlet process are two important techniques for nonparametric Bayesian learning. They allow unsupervised learning without specifying input parameters that are traditionally required. In topic modeling, for example, this means topics can be discovered without fixing their number beforehand. Existing inference methods for these processes, such as those used in topic modeling, usually rely on complex sampling calculations. These inference techniques for the Dirichlet and hierarchical Dirichlet processes are often based on Markov processes that can deviate from parametric topic modeling, and this deviation may not be the best approach in the context of nonparametric topic modeling. Additionally, because they often rely on approximations, they can reduce the predictive power of such models. In this paper we introduce a new interpretation of nonparametric Bayesian learning, called the biased coin flip process, devised for use in Bayesian topic modeling. We prove mathematically that the biased coin flip process is equivalent to the Dirichlet process with an additional parameter representing the number of trials. A major benefit of the biased coin flip process is that its inference calculation is similar to that of previously established parametric topic models, which we hope will lead to wider adoption of topic modeling based on the hierarchical Dirichlet process. Finally, we show empirically that the biased coin flip process leads to a nonparametric topic model with improved predictive performance.
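As background only, the sketch below uses the standard Chinese restaurant process (CRP) view of the Dirichlet process to show how the number of clusters (topics) is not fixed in advance but grows with the data. It is not the authors' biased coin flip process, whose construction the abstract does not give. The function name `crp_table_counts` and its `seed` parameter are hypothetical. In the CRP, customer i (0-indexed) opens a new table with probability alpha / (i + alpha), so each "new cluster or not" decision is itself a biased coin flip. Otherwise the customer joins an existing table with probability proportional to that table's size.

```python
import random


def crp_table_counts(n_customers, alpha, seed=0):
    """Sample table (cluster) sizes from a standard Chinese restaurant process.

    Illustrative sketch of the Dirichlet process predictive rule; this is
    NOT the paper's biased coin flip process.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    counts = []  # counts[k] = number of customers seated at table k
    for i in range(n_customers):
        # Biased coin: heads (probability alpha / (i + alpha)) opens a new table.
        # For i == 0 the probability is 1, so the first customer always opens one.
        if rng.random() < alpha / (i + alpha):
            counts.append(1)
        else:
            # Join existing table k with probability counts[k] / i.
            k = rng.choices(range(len(counts)), weights=counts)[0]
            counts[k] += 1
    return counts


if __name__ == "__main__":
    tables = crp_table_counts(1000, alpha=2.0)
    print(len(tables), "tables for", sum(tables), "customers")
```

The expected number of tables after n customers is the sum of alpha / (alpha + i) for i = 0, ..., n - 1, which is approximately alpha * ln(1 + n / alpha). The concentration parameter alpha therefore controls how quickly new clusters appear, without the number of clusters being specified beforehand.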
