Abstract

Documents from the same domain usually discuss similar topics in a similar order. In this paper we present new ordering-based topic models that use generalised Mallows models to capture this regularity to constrain topic assignments. Specifically, these new models assume that there is a canonical topic ordering shared amongst documents from the same domain, and each document-specific topic ordering is allowed to vary from the canonical topic ordering. Instead of full orderings over a set of all possible topics covered by a domain, we make use of top-t orderings via a multistage ranking process. We show how to reformulate the new models so that a point-wise sampling algorithm from the Bayesian word segmentation literature can be used for posterior inference. Experimental results on several document collections with different properties show that our model performs much better than the other topic ordering-based models, and competitively with other state-of-the-art topic segmentation models.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.