Abstract

Topic recognition technology is widely used to identify news topic categories in the vast amount of information on the web, and it has broad application prospects in fields such as online public opinion monitoring and news recommendation. However, effectively using key feature information in the text, such as syntax and semantics, to improve topic recognition accuracy remains very challenging. Some researchers have proposed combining a topic model with a word embedding model, and their results show that this approach can enrich text representation and benefit downstream natural language processing tasks. For topic recognition of news texts, however, there is currently no standard way of combining the two kinds of model. In addition, some existing approaches of this kind are relatively complex and do not consider the fusion of topic distributions of different granularity with word embedding information. This paper therefore proposes a novel text representation method based on word embedding enhancement and builds on it a full-process topic recognition framework for news text. In contrast to traditional topic recognition methods, the framework uses the probabilistic topic model LDA and the word embedding models Word2vec and GloVe to extract and integrate the topic distribution, semantic knowledge, and syntactic relationships of the text, and then uses popular classifiers to automatically recognize the topic categories of news from the resulting text representation vectors. The framework can thus exploit both the document-topic relationship and context information, which improves expressive ability and reduces dimensionality. Experimental results on two benchmark datasets, 20NewsGroup and BBC News, verify the effectiveness and superiority of the proposed method for news topic recognition.
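
The idea of combining a topic distribution with word embedding information can be illustrated with a minimal sketch. The abstract does not specify how the paper fuses these signals, so everything below is an assumption made for illustration: the function names (mean_embedding, fuse_representation) are hypothetical, the embeddings are averaged per document, and the three parts are simply concatenated. The toy vectors stand in for the outputs of trained LDA, Word2vec, and GloVe models.

```python
from typing import Dict, List, Sequence


def mean_embedding(tokens: Sequence[str],
                   table: Dict[str, List[float]],
                   dim: int) -> List[float]:
    """Average the vectors of the tokens found in `table`.

    Tokens missing from the table are skipped; if none are found,
    a zero vector of length `dim` is returned.
    """
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def fuse_representation(tokens: Sequence[str],
                        topic_dist: Sequence[float],
                        w2v: Dict[str, List[float]],
                        glove: Dict[str, List[float]],
                        dim: int) -> List[float]:
    """Concatenate a document-topic distribution with two averaged
    embedding vectors (an assumed fusion rule, not the paper's)."""
    return (list(topic_dist)
            + mean_embedding(tokens, w2v, dim)
            + mean_embedding(tokens, glove, dim))


# Toy inputs: hypothetical stand-ins for trained model outputs.
tokens = ["market", "shares", "unknown"]
topic_dist = [0.75, 0.25]                      # e.g. from an LDA model
w2v = {"market": [1.0, 0.0], "shares": [0.0, 1.0]}
glove = {"market": [2.0, 2.0]}

doc_vector = fuse_representation(tokens, topic_dist, w2v, glove, dim=2)
print(doc_vector)
```

The resulting vector has one entry per topic plus one entry per embedding dimension for each embedding model, and could be passed to any standard classifier. Other fusion rules, such as weighting word vectors by topic probabilities, would fit the same interface.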

Highlights

  • With the rapid development of information technology, people have become accustomed to obtaining various information from the Internet. These platforms, such as news websites and social media, enable us to know what is happening around the world whenever and wherever possible

  • Based on the theoretical analysis of different text representation methods, this paper proposes that the effective combination of topic model and word embedding model can enrich the text representation and benefit natural language processing (NLP) downstream tasks

  • To solve the above problems, this paper further proposes a novel text representation method based on word embedding enhancement, and applies it to the feature extraction layer of the proposed news topic recognition framework


Summary

Introduction

With the rapid development of information technology, people have become accustomed to obtaining various information from the Internet. These platforms, such as news websites and social media, enable us to know what is happening around the world whenever and wherever possible. Topic recognition technology has become a research hotspot in the field of natural language processing (NLP). Online news text is an important knowledge carrier, characterized by complex structure, wide-ranging sources, and huge volume. Given these characteristics, it is very challenging to effectively utilize key feature information such as syntax and semantics in the text to improve topic recognition accuracy.
