Improvement of vector space information retrieval model based on supervised learning

Xiaoying Tai,Yasuhito Tanaka,Minoru Sasaki,Kenji Kita

doi:10.1145/355214.355224

Abstract

This paper proposes and method to improve retrieval performance of the vector space model (VSM) by utilizing user-supplied information of those documents that are relevant to the query in question. In addition to the user's relevance feedback information, incorporated into the retrieval model, which is built by using a sequence of linear transformations, is information such as inter-document similarity values. Then, the high-dimensional and sparse vectors are reduced by SVD (Singular Value Decomposition) and transformed into the low-dimensional vector space, namely the space representing the latent semantic meanings of the words. The method was experimented on through two test collections, Medline collection and Cranfield collection. Improvement of average precision compared with LSI (Latent Semantic Indexing) model were 4.03% (Medline) and 24.87% (Cranfield) for the two training data sets, and 0.01% (Medline) and 4.89% (Cranfield) for the test data, respectively. The proposed method provides an approach that makes it possible to preserve the user-supplied relevance information for a long term in the system and to use the information later.

Full Text