Burstiness in Query Log: Web Search Analysis by Combining Global and Local Evidences

Chen Zhang, Peiguang Lin, Chen Lei, Sen Zhang

doi:10.1109/icde.2018.00157

Abstract

Web search analysis plays a critical role in improving the performance of modern search engines. Most existing models, such as the click graph and its variants, focus on exploiting the wisdom of the crowd. However, how to design a model that supports both this collective wisdom and the unique characteristics of individual users has rarely been studied. In this paper, we address the new problem of user-specific web search analysis. We go beyond the click graph and propose two probabilistic topic models, the Topic Independence Model (TIM) and the Topic Dependence Model (TDM). TIM assumes that query terms and URLs are generated topically independently of each other; TDM captures the coupling between search queries and URLs. We also capture the temporal burstiness of topics by using the continuous Beta distribution. Through a large-scale analysis of a real-life search query log, we observe that each user's web search trail exhibits multiple kinds of user-specific characteristics. On a massive search query log, the new models achieve better held-out likelihood than standard LDA, DCMLDA, and TOT, and they also effectively reveal the latent evolution of topics at both the corpus level and the user level.
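
As an illustration of how a continuous Beta distribution can express temporal burstiness, the following minimal sketch fits a Beta density to the timestamps associated with one topic and evaluates it at different times. It is not the paper's inference procedure: the method-of-moments fit, the function names `fit_beta_moments` and `beta_pdf`, and the toy timestamps are all assumptions made for illustration, and timestamps are assumed to be rescaled to the open interval (0, 1).

```python
import math

def fit_beta_moments(timestamps):
    """Method-of-moments Beta fit (an illustrative choice, not the paper's).

    Expects timestamps already rescaled to the open interval (0, 1).
    """
    n = len(timestamps)
    mean = sum(timestamps) / n
    var = sum((t - mean) ** 2 for t in timestamps) / n
    # A Beta distribution requires 0 < var < mean * (1 - mean).
    if var <= 0 or var >= mean * (1 - mean):
        raise ValueError("sample variance out of range for a Beta fit")
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common

def beta_pdf(t, a, b):
    """Beta density at t in (0, 1), computed in log space for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))

# Hypothetical normalized timestamps of queries assigned to one topic:
# activity is concentrated in a narrow window, i.e. the topic is bursty.
bursty = [0.68, 0.70, 0.71, 0.72, 0.74]
a, b = fit_beta_moments(bursty)

# The fitted density is high inside the burst and close to zero elsewhere.
inside = beta_pdf(0.71, a, b)
outside = beta_pdf(0.30, a, b)
```

A narrow burst yields large shape parameters and a sharply peaked density, whereas activity spread evenly over the whole period yields a flatter one, so the fitted density gives a compact per-topic summary of when the topic is active.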

Full Text