Abstract

Information retrieval applications must present their output as ranked lists, which motivates research on methods that automatically learn effective ranking models. Many existing methods analyze the multidimensional features of query-document pairs directly and do not take users' interactive feedback into account. They therefore incur high computation overhead and, because query expressions are often imprecise, low retrieval performance. In this paper, we propose a Virtual Feature based Logistic Regression (VFLR) ranking method that performs logistic regression on a set of essential but independent variables, called virtual features (VF). These are extracted via principal component analysis (PCA) together with the user's relevance feedback. We then predict the ranking score of each queried document to produce a ranked list. We systematically evaluate our method on the LETOR 4.0 benchmark datasets. The experimental results demonstrate that the proposed method outperforms state-of-the-art methods in terms of Mean Average Precision (MAP), Precision at position k (P@k), and Normalized Discounted Cumulative Gain at position k (NDCG@k).
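As a rough illustration of the three evaluation measures named above, the following minimal Python sketch computes P@k, average precision (whose mean over queries gives MAP), and NDCG@k from a list of relevance labels given in ranked order. The function names are hypothetical, and the NDCG gain and discount (2^r - 1 and log2(i + 1)) are one common convention assumed here; the exact variant used by the LETOR evaluation tools may differ.

```python
import math


def precision_at_k(rels, k):
    """Fraction of the top-k ranked documents that are relevant (label > 0)."""
    return sum(1 for r in rels[:k] if r > 0) / k


def average_precision(rels):
    """Mean of the precision values at each relevant position.

    Assumes every relevant document for the query appears in `rels`.
    """
    hits, total = 0, 0.0
    for position, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


def mean_average_precision(rels_per_query):
    """MAP: average precision averaged over all queries."""
    return sum(average_precision(r) for r in rels_per_query) / len(rels_per_query)


def dcg_at_k(rels, k):
    """Discounted cumulative gain with gain 2^r - 1 and discount log2(i + 1)."""
    return sum((2 ** r - 1) / math.log2(i + 1)
               for i, r in enumerate(rels[:k], start=1))


def ndcg_at_k(rels, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Toy relevance labels in ranked order (illustrative values, not LETOR data).
binary_labels = [1, 0, 1, 0]
graded_labels = [2, 0, 1]
print(precision_at_k(binary_labels, 2))
print(average_precision(binary_labels))
print(ndcg_at_k(graded_labels, 3))
```

Each function takes labels that have already been sorted by the model's predicted score, so the ranking model and the metrics stay decoupled.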

Highlights

  • Ranking a very large number of candidate documents according to their relevance to a query is an essential problem in the field of Information Retrieval (IR)

  • We propose an alternative ranking algorithm, called Virtual Feature based Logistic Regression (VFLR), which utilizes the user’s relevance feedback

  • The input is a training dataset consisting of a set of records of the form ⟨q, d, r⟩, where q is a query, d is a document described by features such as term frequency (TF), inverse document frequency (IDF) and document length (DL) of the whole document, and r is the relevance of d to q

Summary

Introduction

Ranking a very large number of candidate documents according to their relevance to a query is an essential problem in the field of Information Retrieval (IR). By leveraging labeled query-document pairs, their relevance judgments, and machine learning algorithms, learning-based ranking approaches can tune the parameters of a ranking model more effectively. We propose an alternative ranking algorithm, called Virtual Feature based Logistic Regression (VFLR), which utilizes the user’s relevance feedback. A normal user who retrieves information from the Internet can always determine which returned documents are relevant, even though they may not express their requirement as exactly as a specialist. Based on this assumption, the VFLR algorithm builds a regression model. Veloso et al. [21] develop a novel method that exploits rules in the training phase: it associates document features with their relevance to the query and uses the discovered rules to estimate the relevance score for ranking documents. Our approach is relatively simple but extremely effective, as we will show in the later experiments.
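The general idea (reduce the document features to a few uncorrelated "virtual features" with PCA, then fit a logistic regression on them and rank by the predicted score) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the page does not say how relevance feedback enters the PCA step, so the sketch applies plain PCA to the features and uses the relevance labels only as the regression target. All function names, the toy feature values, and the hyperparameters are hypothetical.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_pca(rows, k, iters=300, seed=0):
    """Return (means, components): top-k principal axes via power iteration."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(m)]
        for _ in range(iters):
            w = [dot(cov[a], v) for a in range(m)]
            norm = math.sqrt(dot(w, w)) or 1.0
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(cov[a], v) for a in range(m)])
        components.append(v)
        # Deflation: remove the found direction so the next pass finds a new one.
        for a in range(m):
            for b in range(m):
                cov[a][b] -= eigenvalue * v[a] * v[b]
    return means, components


def project(features, means, components):
    """Map raw features to virtual features (coordinates on the principal axes)."""
    centered = [f - mu for f, mu in zip(features, means)]
    return [dot(centered, c) for c in components]


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain batch gradient descent on the logistic loss."""
    n, m = len(X), len(X[0])
    w, b = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi) + b) - yi
            grad_w = [g + err * x for g, x in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Toy training records (features, r): features stand in for TF, IDF, DL,
# r is binary relevance. Values are made up for illustration.
train = [([0.9, 0.8, 0.3], 1), ([0.8, 0.9, 0.4], 1), ([0.7, 0.7, 0.5], 1),
         ([0.2, 0.1, 0.6], 0), ([0.1, 0.3, 0.5], 0), ([0.3, 0.2, 0.4], 0)]
means, components = fit_pca([f for f, _ in train], k=2)
X = [project(f, means, components) for f, _ in train]
w, b = fit_logistic(X, [r for _, r in train])

# Score unseen candidate documents for the query and sort by predicted relevance.
candidates = [("d_b", [0.15, 0.2, 0.5]), ("d_a", [0.85, 0.75, 0.4])]
ranked = sorted(((doc, sigmoid(dot(w, project(f, means, components)) + b))
                 for doc, f in candidates), key=lambda t: t[1], reverse=True)
print(ranked)
```

Fitting the regression on a few principal components rather than on all raw features is what keeps the model small; in practice a library implementation of PCA and logistic regression would replace the hand-written routines used here to keep the sketch dependency-free.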
