Abstract

Traditional query clustering algorithms are designed to work on data previously collected from a query stream. These algorithms become less effective over time because users' interests, query meanings, and the popularity of topics change. There is therefore a need for incremental algorithms that can accommodate the concept drift that surfaces as new data is added to the collection, without performing a complete re-clustering. We propose an incremental model for query clustering and query-context-aware document clustering. The model periodically and efficiently incorporates new information and can be applied in a distributed environment. It retains the quality of both query and document clusters, and it can be applied to the results of hierarchical query clustering algorithms that produce query and document clusters. The model is tested with three hierarchical clustering algorithms on different datasets, including the TREC Session Track 2011 dataset. We also experiment with a variant of the proposed incremental model to compare performance. In all experiments, the proposed model and its variant not only achieve accuracy very close to that of the static models but also offer a significant speedup.

