Abstract

Despite the proliferation of topic models, the organization of topics from the probabilistic models needs improvement in two ways: the better structured presentation of topics and the incorporation of domain knowledge on the corpus. The structured presentation, i.e., the hierarchical topic model, helps in categorizing similar topics; the incorporation of domain knowledge enables the concentrated sampling of predefined keywords in the mixture parameter learning. This paper presents a hierarchical topic models with incorporated domain knowledge, called Guided Hierarchical Topic Model (GHTM) . Specifically, we allocated the prior information from the knowledge to the Dirichlet Forest prior. From the prior adjustment, we obtained the topic tree guided by the domain knowledge. This paper also contributes in enumerating four different knowledge extraction methods and applying the extracted knowledge to GHTM. We evaluated the performance of GHTM in terms of the hierarchical clustering accuracy, and we found a significant improvement of hierarchical clustering measured by F-measures. This improvement is also verified by the perplexity analyses. Additionally, we measured topic quality with KL-divergence and visualization, and these confirm the ability to better separate topic distributions. Finally, we tested the hierarchical topic quality through human experiments, and this also revealed significant improvements originating from the guidance.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.