Abstract

With the growth of online social media, topic extraction from short text has become an important research field. How to extract topics, especially new topics that have not yet been recognized, from a growing and continually updated stream of short texts has attracted the attention of scholars. This paper constructs a system based on the long short-term memory (LSTM) model from deep learning. First, each short text is converted to a word-vector matrix by the word2vec model. Two LSTM-based models are then designed: one recognizes whether a text belongs to an existing topic or a new one, and the other identifies whether two text samples belong to the same topic. Finally, a hierarchical clustering model uses the outputs of the two LSTM models to determine the number of new topics. The experimental results show that the system can identify new text topics well and achieves good performance.
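
The final clustering step can be illustrated with a minimal sketch. It assumes that the pairwise LSTM yields, for every pair of texts flagged as new, a score in [0, 1] that the two share a topic. The function name `count_new_topics`, the average-linkage rule, the 0.5 stopping threshold, and the toy score matrix are all illustrative assumptions, not details taken from the paper.

```python
from itertools import combinations


def count_new_topics(same_topic_prob, threshold=0.5):
    """Average-linkage agglomerative clustering over pairwise scores.

    same_topic_prob[i][j] is a hypothetical score in [0, 1] that texts i and j
    share a topic (a stand-in for the pairwise LSTM output). Merging stops
    once no two clusters have an average distance below `threshold`; the
    number of clusters left is the estimate of the number of new topics.
    """
    clusters = [[i] for i in range(len(same_topic_prob))]

    def avg_distance(a, b):
        # Distance is 1 - score, averaged over all cross-cluster pairs.
        total = sum(1.0 - same_topic_prob[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters (p < q by construction).
        (p, q), best = min(
            (((p, q), avg_distance(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best >= threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return len(clusters), clusters


# Toy scores (assumed): texts 0-2 resemble each other, as do texts 3-4.
scores = [
    [1.00, 0.90, 0.80, 0.10, 0.20],
    [0.90, 1.00, 0.85, 0.15, 0.10],
    [0.80, 0.85, 1.00, 0.20, 0.10],
    [0.10, 0.15, 0.20, 1.00, 0.95],
    [0.20, 0.10, 0.10, 0.95, 1.00],
]
n_topics, groups = count_new_topics(scores)
```

On this toy matrix the two blocks of high scores end up as two clusters, so the estimated number of new topics is two. In practice the stopping threshold would need tuning on held-out data.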

