Chinese Hot Topic Extraction Based on Web Log

Junhua Li, Zhen Liu, Yan Fu, Li She

doi:10.1109/wism.2009.29

Abstract

Traditional topic extraction methods consider only the text documents themselves and ignore the contribution of users to the extraction process. We observe, however, that how users browse a topic indicates whether the topic is currently hot more strongly than the properties of the text documents do. In this paper, we propose a method for extracting “Chinese hot topics” from a set of text documents downloaded from the Internet, guided by the web log. The method has three major steps. First, we collect all the relevant user information and textual material from the web according to the web log. Second, we extract the hot terms of each web page and compute the hotness of each theme from its click-through rate and a forgetting factor. Finally, we form hot topics by merging correlated themes that share common hot terms. The method handles massive textual data efficiently and offers a new, user-centered perspective on deciding whether a topic is hot. We test the method on data from several portal sites and find that it efficiently detects the topics with the highest hotness.
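
The second and third steps can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the forgetting factor is modeled as exponential decay over click age, themes are merged when they share at least `min_shared` hot terms, and the names `hotness`, `merge_themes`, `decay`, and `min_shared` are hypothetical rather than taken from the paper.

```python
import math
from collections import defaultdict


def hotness(click_times, now, decay=0.1):
    """Assumed hotness score: each click contributes exp(-decay * age),
    so recent clicks count more than old ones (the 'forgetting factor')."""
    return sum(math.exp(-decay * (now - t)) for t in click_times if t <= now)


def merge_themes(theme_terms, min_shared=2):
    """Group themes that share at least `min_shared` hot terms.

    `theme_terms` maps a theme name to its set of hot terms. Merging is
    transitive and implemented with a small union-find structure."""
    parent = {name: name for name in theme_terms}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = sorted(theme_terms)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if len(theme_terms[a] & theme_terms[b]) >= min_shared:
                parent[find(b)] = find(a)

    groups = defaultdict(set)
    for name in names:
        groups[find(name)].add(name)
    return sorted(sorted(group) for group in groups.values())


# Illustrative data only: theme names, terms, and click times are made up.
themes = {
    "theme_a": {"earthquake", "rescue", "sichuan"},
    "theme_b": {"earthquake", "rescue", "donation"},
    "theme_c": {"olympics", "torch"},
}
topics = merge_themes(themes)
recent = hotness([9, 10, 10], now=10)
stale = hotness([0, 1, 1], now=10)
```

With this data, `theme_a` and `theme_b` merge into one topic because they share two hot terms, and `recent` exceeds `stale` even though both themes received three clicks. A topic's hotness could then be taken as the sum of its themes' scores, which is again an assumption rather than the paper's definition.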

Full Text