A Phrase Topic Model for Large-scale Corpus

Baoji Li,Yuhui Tian,Juan Chen,Wenhua Xu

doi:10.1109/icccbda.2019.8725681

Publication Date: Apr 1, 2019

Citations: 4

Affiliation: Ocean University of China

#Latent Dirichlet Allocation #Latent Dirichlet Allocation Model

Abstract

Topic models are unsupervised learning models and an important tool for large-scale corpus analysis, widely used in information retrieval, natural language processing, and machine learning. Traditional topic models, such as Latent Dirichlet Allocation (LDA), ignore word order. In many text-mining tasks, however, word order and phrases are crucial for efficiently capturing the meaning of texts. We propose a phrase topic model based on LDA that integrates a regular-expression constraint. The model makes topics more meaningful and interpretable at the cost of only a limited increase in vocabulary size. Experimental results show that our algorithm finds meaningful phrases and is generally applicable on our test data set.
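The abstract does not specify the regular-expression constraint, so the following is only a minimal sketch of one way such a constraint could work, not the authors' method. It assumes tokens arrive already tagged with one-letter part-of-speech codes (`A` adjective, `N` noun, `O` other), matches a hypothetical pattern `[AN]+N` over the tag sequence, and merges only phrases that occur at least `min_count` times, which keeps the growth of the vocabulary limited. The names `PHRASE_PATTERN`, `candidate_phrases`, `build_phrase_docs`, and `min_count` are illustrative.

```python
import re
from collections import Counter

# Hypothetical constraint: one or more adjectives/nouns followed by a noun,
# so every match spans at least two tokens. Not taken from the paper.
PHRASE_PATTERN = re.compile(r"[AN]+N")


def candidate_phrases(tagged):
    """Return candidate phrases from a list of (word, tag) pairs.

    Tags are single characters, so a character offset in the tag string
    equals a token index in the document.
    """
    tags = "".join(tag for _, tag in tagged)
    return [
        "_".join(word for word, _ in tagged[m.start():m.end()])
        for m in PHRASE_PATTERN.finditer(tags)
    ]


def build_phrase_docs(tagged_docs, min_count=2):
    """Merge frequent phrases into single tokens; leave everything else as words."""
    counts = Counter(p for doc in tagged_docs for p in candidate_phrases(doc))
    keep = {p for p, c in counts.items() if c >= min_count}

    merged_docs = []
    for doc in tagged_docs:
        tags = "".join(tag for _, tag in doc)
        out, pos = [], 0
        for m in PHRASE_PATTERN.finditer(tags):
            phrase = "_".join(word for word, _ in doc[m.start():m.end()])
            if phrase in keep:
                # Emit the plain words before the phrase, then the merged token.
                out.extend(word for word, _ in doc[pos:m.start()])
                out.append(phrase)
                pos = m.end()
        out.extend(word for word, _ in doc[pos:])
        merged_docs.append(out)
    return merged_docs, keep


docs = [
    [("the", "O"), ("topic", "N"), ("model", "N"), ("ignores", "O"),
     ("word", "N"), ("order", "N")],
    [("a", "O"), ("topic", "N"), ("model", "N"), ("is", "O"),
     ("unsupervised", "A")],
]
merged, phrases = build_phrase_docs(docs, min_count=2)
```

Under these assumptions, the merged token streams could be passed to any standard LDA implementation, with each retained phrase treated as one additional vocabulary entry.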
