Abstract

Traditional query clustering algorithms are designed to work on data previously collected from a query stream. These algorithms become less effective over time because users' interests, the meanings of queries, and the popularity of topics all change. There is therefore a need for incremental algorithms that can accommodate the concept drift that surfaces as new data are added to the collection, without performing a complete re-clustering. We propose an incremental model for query clustering and query-context-aware document clustering. The model efficiently incorporates new information through periodic updates, can be applied in a distributed environment, and retains the quality of both the query clusters and the document clusters. It can be applied to the results of hierarchical query clustering algorithms that produce query and document clusters. We test the model with three hierarchical clustering algorithms on different datasets, including the TREC 2011 Session Track dataset. We also experiment with a variant of the proposed incremental model to compare performance. In all experiments, the proposed model and its variant achieve accuracy very close to that of the static models while offering a significant speedup.
