Set-based vector model

Bruno Pôssas, Berthier Ribeiro-Neto, Nivio Ziviani, Wagner Meira

doi:10.1145/1095872.1095874

Abstract

This work presents a new approach for ranking documents in the vector space model. The novelty lies on two fronts. First, patterns of term co-occurrence are taken into account and are processed efficiently. Second, term weights are generated using a data mining technique called association rules. This leads to a new ranking mechanism called the set-based vector model. The components of our model are no longer index terms but index termsets, where a termset is a set of index terms. Termsets capture the intuition that semantically related terms appear close to each other in a document. They can be efficiently obtained by limiting the computation to small passages of text. Once termsets have been computed, the ranking is calculated as a function of the termset frequency in the document and its scarcity in the document collection. Experimental results show that the set-based vector model improves average precision for all collections and query types evaluated, while keeping computational costs small. For the 2-gigabyte TREC-8 collection, the set-based vector model leads to a gain in average precision of 14.7% and 16.4% for disjunctive and conjunctive queries, respectively, with respect to the standard vector space model. These gains increase to 24.9% and 30.0%, respectively, when proximity information is taken into account. Query processing times are longer but, on average, still comparable to those obtained with the standard vector space model (increases in processing time varied from 30% to 300%). Our results suggest that the set-based vector model provides a correlation-based ranking formula that is effective with general collections and computationally practical.
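To make the idea of ranking by termset frequency and scarcity concrete, the following is a minimal Python sketch. It is not the weighting scheme from the paper: the abstract does not give the formula, so the sketch assumes a tf-idf-style weight applied to termsets instead of single terms. The function names (`termsets`, `termset_frequency`, `rank`), the `max_size` cap, the passage representation (each document as a list of term sets), and the specific logarithmic weights are all assumptions made for illustration.

```python
from itertools import combinations
from math import log


def termsets(query_terms, max_size=2):
    """Enumerate all non-empty subsets of the query terms up to max_size."""
    terms = sorted(set(query_terms))
    return [
        frozenset(combo)
        for size in range(1, max_size + 1)
        for combo in combinations(terms, size)
    ]


def termset_frequency(termset, passages):
    """Count the passages in which every term of the termset co-occurs.

    Assumption: a termset "occurs" only when all of its terms fall
    inside the same small passage of the document.
    """
    return sum(1 for passage in passages if termset <= passage)


def rank(query_terms, docs, max_size=2):
    """Rank documents by a tf-idf-style score computed over termsets.

    docs maps a document id to a list of passages; each passage is a
    set of terms. The weights below are illustrative, not the paper's.
    """
    n_docs = len(docs)
    sets = termsets(query_terms, max_size)

    # Frequency of each termset within each document.
    freq = {
        doc_id: {s: termset_frequency(s, passages) for s in sets}
        for doc_id, passages in docs.items()
    }
    # Number of documents that contain each termset (its inverse is scarcity).
    doc_freq = {s: sum(1 for d in docs if freq[d][s] > 0) for s in sets}

    scores = {}
    for doc_id in docs:
        score = 0.0
        for s in sets:
            f = freq[doc_id][s]
            if f > 0:
                # Frequent in the document and rare in the collection scores high.
                score += (1 + log(f)) * log(1 + n_docs / doc_freq[s])
        scores[doc_id] = score

    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    example_docs = {
        "d1": [{"vector", "model"}, {"set"}],
        "d2": [{"vector"}, {"model"}],
        "d3": [{"cat"}],
    }
    for doc_id, score in rank(["vector", "model"], example_docs):
        print(doc_id, round(score, 3))
```

In this toy collection, `d1` and `d2` both contain the two query terms, but only `d1` has them in the same passage, so only `d1` receives credit for the two-term termset. That is the kind of co-occurrence signal a single-term vector model cannot express.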

Journal: ACM Transactions on Information Systems
Publication Date: Oct 1, 2005
Citations: 39

Similar Papers

Enhancing the Set-Based Model Using Proximity Information
Bruno Pôssas ... Wagner Meira
01 Jan 2002

Hybrid Term Indexing for Weighted Boolean and Vector Space Models
Ken C W Chow ... Robert W P Luk
International Journal of Computer Processing of Languages | VOL. 14
01 Jun 2001

Hybrid term indexing for different IR models
Ken C W Chow ... K L Kwok
01 Nov 2000

Selective Flexibility of Side‐Chain Residues Improves VEGFR‐2 Docking Score using AutoDock Vina
Rui M. V. Abreu ... Maria‐João R. P. Queiroz
Chemical Biology & Drug Design | VOL. 79
30 Jan 2012
