Abstract

We propose a novel query expansion method that combines user interest and ontology. First, users’ interests are described by contextual words generated from an ontology, and the user's interest degree with respect to each contextual word is calculated. Second, the contextual words are organized according to ontology relevance and divided into subsets, each of which can be seen as a candidate suggestion set. By calculating the weight of each contextual word, we obtain meaningful expansions for a query. Comparative experiments show that the proposed method is superior to other methods in terms of precision and recall, and that it gives personalized query suggestions to users efficiently.

Introduction

The effectiveness of information retrieval from the web largely depends on whether users can issue queries to search engines that properly describe their information needs [1]. Writing such queries is not easy, because queries are usually short and their words may be ambiguous [2, 3]. Most existing work on query expansion uses query logs to suggest queries [4]. Web search engines generally have millions of users, so when a user has an information need, many other users have usually searched for the same query before. The search engine can therefore use this large amount of past usage data to offer possible query expansions [5]. However, because a query is closely related to the interests and intent of the user who submits it, different users who submit the same query may want to express different requirements. Effective query expansion requires inferring the user’s query intent and then suggesting expanded queries that help retrieve webpages containing the relevant information [6]. Inspired by this, we propose a query expansion method based on user interest context and ontology. It does not depend on query logs of the whole web and uses only the terms occurring in the user's browsing logs.
The proposed method

The proposed query expansion method is executed in two steps: user interest context mining and query expansion. The details are given as follows.

(1) User interest context mining

First, we parse each webpage to extract its main body. Stop words are filtered out, and the root of each word is extracted using the Porter stemming algorithm [7]. The webpage $p_i$ is represented by the vector $W_i = (w_{i1}, w_{i2}, \ldots, w_{im})$, where $w_{im}$ is a term of $p_i$. Second, we use natural language processing to perform word sense disambiguation [8]. We then obtain the hypernyms of the terms in $p_i$ through a generic ontology; these hypernyms are called contextual words and are denoted $C_i = (c_{i1}, c_{i2}, \ldots, c_{im})$, where $c_{im}$ is the contextual word of $w_{im}$.

To calculate the user interest degree of a contextual word, the browsed webpages are organized by day, and each day is treated as one session. The webpages that user $u$ browsed in the $j$-th session are denoted $Day_j$. The interest degree of a contextual word $c$, denoted $I(c)$, is formulated as follows:

[Equation for $I(c)$: not recoverable from the extracted text. The surviving symbols are $f(c, Day_j)$, $t(c, Day_j)$, the indices $j$ and $n$, $e$, $\log 2$, $d$, $d_{init}$, $l$, $a$ and $\beta$.]
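
The preprocessing and session grouping described in step (1) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the hypernym table, and all function names are hypothetical; the toy stemmer only stands in for the Porter algorithm; and word sense disambiguation is omitted. The sketch produces per-session counts of contextual words only and does not compute $I(c)$.

```python
import re
from collections import Counter, defaultdict
from datetime import date

# Hypothetical stand-ins: a real system would use a full stop-word list
# and look up hypernyms in a generic ontology instead of a fixed table.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "to"}
HYPERNYMS = {
    "dog": "canine",
    "cat": "feline",
    "piano": "instrument",
    "violin": "instrument",
}


def toy_stem(word):
    """Crude suffix stripping; a placeholder, NOT the Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def contextual_words(text):
    """Map the main-body text of one page to its contextual words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    terms = [toy_stem(t) for t in tokens if t not in STOP_WORDS]
    # Terms without an entry in the hypernym table are dropped.
    return [HYPERNYMS[t] for t in terms if t in HYPERNYMS]


def session_counts(pages):
    """Group pages by day (one day = one session) and count contextual words.

    pages: iterable of (date, main_body_text) pairs.
    Returns a dict mapping each day to a Counter of contextual words.
    """
    sessions = defaultdict(Counter)
    for day, text in pages:
        sessions[day].update(contextual_words(text))
    return dict(sessions)


if __name__ == "__main__":
    browsed = [
        (date(2024, 1, 1), "The dogs and cats"),
        (date(2024, 1, 1), "Playing the piano"),
        (date(2024, 1, 2), "A violin is playing"),
    ]
    for day, counts in sorted(session_counts(browsed).items()):
        print(day, dict(counts))
```

Keeping the counts per session rather than in one global tally preserves the day-level structure that the interest degree is defined over.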
