Abstract

Search engine query log mining has increasingly come to resemble data stream mining, because queries arrive as an endless, continuous sequence known as a query stream. In this paper, we propose an online frequent sequence discovery (OFSD) algorithm that extracts frequent phrases from query streams based on a new frequency rate metric suited to query stream mining. OFSD is an online, single-pass, real-time frequent sequence miner appropriate for data streams. The frequent phrases extracted by OFSD are used to help novice Web search engine users complete their search queries more efficiently. We then introduce YourEye, our online phrase recommender, and describe its advantages over Google Suggest, Google's phrase suggestion service. Various characteristics of two specific Web search engine query logs are analyzed, and the logs are then used to evaluate YourEye. The experimental results confirm the significant benefit of monitoring frequent phrases within queries instead of treating whole queries as non-separable items: the number of monitored elements decreases substantially, which results in lower memory consumption as well as better performance. Re-ranking the retrieved pages based on past users' clicks for each frequent phrase extracted by OFSD is also introduced. Preliminary results show the advantages of the proposed method over the similar work reported in Smyth et al.
