Abstract

Web search analysis plays a critical role in improving the performance of cutting-edge search engines. Most existing models, such as the click graph and its variants, focus on exploiting the wisdom of the crowd. However, how to design a model that supports both this collective wisdom and the unique characteristics of individual users has rarely been studied. In this paper, our goal is to solve the new problem of user-specific web search analysis. We go beyond the click graph and propose two probabilistic topic models, the Topic Independence Model (TIM) and the Topic Dependence Model (TDM). TIM assumes that query terms and URLs are generated in a topically independent way; TDM captures the coupling between search queries and URLs. We also capture the temporal burstiness of topics by using the continuous Beta distribution. Through a large-scale analysis of a real-life search query log, we observe that each user's web search trail exhibits several distinctive, user-specific characteristics. On a massive search query log, the new models achieve a better held-out likelihood than standard LDA, DCMLDA and TOT, and they effectively reveal the latent evolution of topics at both the corpus level and the user level.
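To illustrate how a continuous Beta distribution can express temporal burstiness, the following is a minimal sketch and not the paper's implementation. It assumes that timestamps are rescaled to the open interval (0, 1) and that each topic carries its own Beta shape parameters; the function names and the parameter values are hypothetical.

```python
import math


def beta_pdf(t, a, b):
    """Density of Beta(a, b) at a normalized time t in (0, 1)."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    # Log of the normalizing constant Gamma(a + b) / (Gamma(a) * Gamma(b)).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log(1 - t))


def normalize_timestamp(ts, start, end, eps=1e-6):
    """Rescale a raw timestamp to (0, 1), clamping away from the endpoints."""
    frac = (ts - start) / (end - start)
    return min(max(frac, eps), 1 - eps)


# Hypothetical topics: one concentrated around the middle of the log period,
# one spread evenly over the whole period.
bursty_topic = (50.0, 50.0)
steady_topic = (1.0, 1.0)

t_mid = normalize_timestamp(5, 0, 10)    # middle of the period
t_early = normalize_timestamp(1, 0, 10)  # early in the period

bursty_mid = beta_pdf(t_mid, *bursty_topic)
bursty_early = beta_pdf(t_early, *bursty_topic)
steady_mid = beta_pdf(t_mid, *steady_topic)
```

Under these assumed parameters, the bursty topic places far more density near the middle of the period than early in it, while the steady topic has the same density everywhere.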
