Abstract

Capturing users' information needs is essential to lowering barriers to information access. This paper mines sequences of actions, called search scripts, from search query logs, which record the search experiences of large numbers of users. Search scripts can be used to guide users toward satisfying their information needs, to improve the effectiveness of retrieval systems, to place advertisements at suitable points, and so on. Information quality, query ambiguity, topic diversity, and document relevance are four major challenges in search script mining. In this paper, we determine the relevance of URLs to a query, adopt Open Directory Project (ODP) categories to disambiguate queries and URLs, explore various features and clustering algorithms for intent clustering, identify critical actions in each intent cluster to form a search script, generate a natural language description for each action, and summarize a topic for each search script. Experiments show that the complete-link hierarchical clustering algorithm, using query terms, relevant URLs, and disambiguated ODP categories as features, performs best. Applying the intent clusters created by the best model to intent boundary identification achieves an F-score of 0.6666. The intent clusters are then used to generate search scripts.
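To make the clustering step concrete, the following is a minimal sketch of complete-link agglomerative clustering over feature sets that combine query terms, URLs, and ODP categories. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the distance measure, the feature encoding, or the stopping criterion, so the Jaccard distance, the `q:`/`url:`/`odp:` prefixes, the 0.5 threshold, and the sample sessions are all hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two feature sets (assumed measure)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def complete_link_clusters(items, threshold):
    """Agglomerative clustering with complete linkage.

    items: list of feature sets, one per logged search action.
    threshold: merging stops once the closest pair of clusters is
        farther apart than this value (an assumed stopping rule).
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(items))]
    while len(clusters) > 1:
        best = None
        for x, y in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance is the largest
            # pairwise distance between their members.
            d = max(
                jaccard_distance(items[i], items[j])
                for i in clusters[x]
                for j in clusters[y]
            )
            if best is None or d < best[0]:
                best = (d, x, y)
        d, x, y = best
        if d > threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Hypothetical sessions: query terms, a clicked URL, and an ODP category.
sessions = [
    {"q:cheap", "q:flights", "url:flights.example", "odp:Recreation/Travel"},
    {"q:flights", "q:paris", "url:flights.example", "odp:Recreation/Travel"},
    {"q:python", "q:tutorial", "url:docs.example", "odp:Computers/Programming"},
    {"q:python", "q:list", "url:docs.example", "odp:Computers/Programming"},
]

intent_clusters = complete_link_clusters(sessions, threshold=0.5)
print(intent_clusters)
```

Complete linkage merges two clusters only when every pair of their members is close, which tends to produce compact clusters. In this toy data the two travel sessions and the two programming sessions share a URL and an ODP category, so each pair ends up in its own cluster.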
