Analyzing the results of automatic new topic identification

Seda Ozmutlu, Gencer C Cosar, Michael Seadle

doi:10.1108/07378830810903373

Abstract

Purpose: Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification.

Design/methodology/approach: Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations.

Findings: The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification.

Originality/value: Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom‐tailored graphical user interfaces for search engine users.

Full Text