Abstract

Topic recognition with a dynamic number of topics allows the hyperparameters to be updated dynamically and yields the probability distribution of dynamic topics along the time dimension, which helps in understanding and tracking streaming text data. However, current topic recognition models tend to rely on a fixed number of topics K and lack multi-granularity analysis of subject knowledge, so they cannot fully perceive how topics change over a time series. Building on the Infinite Latent Dirichlet Allocation model, this work constructs a topic feature lattice under a dynamic topic number. In the model, documents, topics, and vocabulary are jointly modeled to generate two probability distribution matrices: document-topic and topic-feature word. The association intensity between each topic and its feature words is then computed to establish the topic formal context matrix. Finally, the topic feature lattice is induced according to formal concept analysis (FCA) theory. The topic feature lattice under dynamic topic number (TFL_DTN) model is validated on a real dataset by comparison with mainstream methods. Experiments show that the model better matches practical needs and achieves better results in semi-automatic modeling for topic visualization analysis.
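The final steps of the pipeline described in the abstract (topic-word matrix, association intensity, formal context, concepts) can be sketched in Python. This is a minimal illustration under stated assumptions, not the TFL_DTN implementation: the association intensity used here (a word's probability divided by the topic's peak word probability, cut at a threshold) is a hypothetical stand-in because the abstract does not give the formula, the toy matrix replaces the output of the infinite LDA model, and `formal_context`, `derive_intent`, `derive_extent`, and `formal_concepts` are illustrative names. Concepts are enumerated by brute force, closing every subset of topics with the two FCA derivation operators.

```python
from itertools import combinations


def formal_context(topic_word, vocab, threshold):
    """Binarize a topic-word matrix into a formal context {topic: {words}}.

    Hypothetical association intensity: a word's probability under a topic
    divided by that topic's largest word probability (values in (0, 1]).
    """
    context = {}
    for t, row in enumerate(topic_word):
        peak = max(row)
        context[t] = {vocab[j] for j, p in enumerate(row) if p / peak >= threshold}
    return context


def derive_intent(context, topics, attributes):
    """A': the words shared by every topic in `topics` (all words if empty)."""
    shared = set(attributes)
    for t in topics:
        shared &= context[t]
    return frozenset(shared)


def derive_extent(context, words):
    """B': the topics whose word set contains every word in `words`."""
    return frozenset(t for t, ws in context.items() if words <= ws)


def formal_concepts(context, attributes):
    """Enumerate all formal concepts (A'', A') by brute force over topic subsets."""
    topics = list(context)
    concepts = set()
    for r in range(len(topics) + 1):
        for subset in combinations(topics, r):
            intent = derive_intent(context, subset, attributes)
            extent = derive_extent(context, intent)
            concepts.add((extent, intent))
    # Order by extent size so the most specific concepts come first.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[1])))


if __name__ == "__main__":
    vocab = ["price", "battery", "screen", "service"]
    # Toy topic-word probabilities (rows = topics); in the paper these would
    # come from the infinite LDA model, which is not implemented here.
    topic_word = [
        [0.50, 0.40, 0.05, 0.05],
        [0.10, 0.45, 0.40, 0.05],
        [0.45, 0.05, 0.05, 0.45],
    ]
    ctx = formal_context(topic_word, vocab, threshold=0.5)
    for extent, intent in formal_concepts(ctx, vocab):
        print(sorted(extent), sorted(intent))
```

Ordering the resulting concepts by extent inclusion gives the concept lattice; for the toy data the sketch finds seven concepts, for example topics 0 and 1 sharing the feature word "battery". Brute-force closure is exponential in the number of topics, so a larger model would call for a dedicated FCA algorithm such as NextClosure.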

Highlights

  • With the widespread application of Web 2.0, self-media platforms, such as online forums and online communities, have gradually become the main form of information exchange

  • Most current topic recognition models assume a fixed number of topics and therefore cannot represent the semantic relevance between topics

  • The real data were summarized through manual annotation, and the number of topics varies in the interval [15, 60], which is consistent with the experimental results of the Self-adaptive topic analysis model (STAM)


Introduction

With the widespread application of Web 2.0, self-media platforms, such as online forums and online communities, have gradually become the main form of information exchange. Topic modeling of review datasets can produce a “short description” of each document, making it possible to mine the hidden semantic structure of large-scale datasets. In topic recognition and evolution, the dynamic change in the number of topics makes it difficult to quantitatively analyze the relationship between the content relevance of a document and the number of topics[1]. The recognition results depend only on the probabilities between topics, which makes it difficult to characterize the inherent hierarchical relationships of comment events. It is therefore urgent to dig deeper into the topic relationships in the review topic

Received January 22, 2020, accepted July 24, 2020. Supported by the Key Projects of Social Sciences of Anhui Provincial Department of Education (SK2018A1064, SK2018A1072), the Natural Scientific Project of Anhui Provincial Department of Education (KJ2019A0371), and Innovation Team of Health Information Management and Application Research (BYKC201913), BBMC

