Abstract
With the widespread use of online social networks, billions of pieces of information are generated every day. Detecting new topics quickly and accurately at this scale plays a vital role in information recommendation and public opinion control. A basic research task in topic detection is how to represent a topic. Existing topic representation models do not focus on selecting well-differentiated words to represent topics, remain computer-centered, and do not effectively combine human intelligence with artificial intelligence (AI). To solve these problems, this article proposes a word-distributed sensitive topic representation model (WDS-LDA) based on hybrid human-AI (H-AI). The basic idea is that the distribution of a word within a topic and across topics strongly influences whether it should be selected as a topic expression word. If a word is evenly distributed among all documents of a topic, it is common to the documents of that topic and is therefore more suitable to represent it. If a word is evenly distributed among the various topics, it is common to all topics, cannot be used to distinguish among them, and is therefore less suitable to represent any topic. At the same time, human cognitive ability and cognitive models are introduced into topic representation based on H-AI: users' modifications of topic expression words are fed into the topic model representation so that the model can learn from human judgment and become increasingly accurate. Three weights are therefore introduced: an inside weight, an outside weight, and a manual adjustment weight. The inside weight describes how uniformly a word is distributed within a given topic, the outside weight describes how uniformly a word is distributed across all topics, and the manual adjustment weight reflects whether past manual adjustments judged a word suitable as a representative word. Tests on actual Sina microblog data sets show that the WDS-LDA algorithm selects more important representative words, increases the distinction among the words of different topics, and effectively improves the precision of subsequent algorithms that use the topic model, such as topic detection and topic evolutionary analysis.
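The intuition behind the three weights can be illustrated with a minimal sketch. The abstract does not give the formulas used by WDS-LDA, so everything below is an assumption made for illustration: normalized Shannon entropy stands in for "uniformity", the weights are combined by simple multiplication, and the names `normalized_entropy`, `word_score`, and `manual_weight` are hypothetical.

```python
import math


def normalized_entropy(counts):
    """Evenness of a count distribution, from 0.0 (concentrated) to 1.0 (uniform).

    Assumption: entropy is used here only as a convenient stand-in for the
    paper's notion of uniform distribution.
    """
    total = sum(counts)
    n = len(counts)
    if total == 0 or n < 2:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p)
    return h / math.log(n)


def word_score(base_prob, doc_counts_in_topic, topic_counts, manual_weight=1.0):
    """Re-weight a word's topic probability (hypothetical combination rule).

    base_prob           -- the word's probability under the topic (e.g. from LDA)
    doc_counts_in_topic -- occurrences of the word in each document of the topic
    topic_counts        -- occurrences of the word in each topic
    manual_weight       -- factor standing in for past manual adjustments
    """
    # Inside weight: higher when the word is spread evenly over the topic's documents.
    inside = normalized_entropy(doc_counts_in_topic)
    # Outside weight: lower when the word is spread evenly over all topics.
    outside = 1.0 - normalized_entropy(topic_counts)
    return base_prob * inside * outside * manual_weight


# A word present in every document of one topic and absent from other topics.
specific = word_score(0.05, [3, 3, 3, 3], [12, 0, 0])
# A word spread evenly over all topics, which cannot distinguish among them.
generic = word_score(0.05, [3, 3, 3, 3], [12, 12, 12])
print(specific > generic)
```

Under these assumptions, a word that is common to all topics is pushed toward a score of zero, while a word that is evenly present within a single topic keeps its base probability. A `manual_weight` below 1.0 would further demote a word that users previously rejected.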