Abstract

Topic modeling for short texts is a challenging and interesting problem in the machine learning and knowledge discovery domains. Nowadays, millions of documents published on the internet from various sources. Internet websites are full of various topics and information, but there is a lot of simila rity between topics, contents, and total quality of sources, which causes data repetition and gives the user the same information. Another issue is data sparsity and ambiguity because the length of the short text is limited, which causes unsatisfactory results and give irrelevant results to end-users. All these mentioned issues in short texts made an interesting topic for researchers to use machine learning and knowledge discovery techniques to discover underlying topics from a massive amount of data. In this paper, we propose a combination of deep reinforcement learning (RL) and semantics-assisted non-negative matrix factorization model to extract meaningful and underlying topics from short document contents. The main objective of this work is to reduce the problem of repetitive information and data sparsity in short texts to help the users to get meaningful and relevant contents. Furthermore, our propose model reviews an issue of the Seq2Seq approach based on the reinforcement learning perspective and provides a combination of reinforcement learning and SeaNMF formulation using the block coordinate descent algorithm. Moreover, we compare different real-world datasets by using numerical calculation and present a couple of state-of-art models to get better performance on short text document topic modeling. Based on experimental results and comparative analysis, our propose model outperforms the state of art techniques in terms of short document topic modeling.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call