Hybrid Term Indexing for Weighted Boolean and Vector Space Models

Ken C W Chow, K L Kwok, K F Wong, Robert W P Luk

doi:10.1142/s0219427901000345

Abstract

Retrieval effectiveness depends on how terms are extracted and indexed. In Chinese text (and in other languages such as Japanese and Korean), there are no spaces to delimit words. Indexing with hybrid terms (i.e. words and bigrams) was able to achieve the best precision amongst homogeneous terms at a lower storage cost than indexing with bigrams. However, this was tested only with conjunctive queries. Here, we extended the weighted Boolean models using fuzzy and p-norm measures, as well as the vector space model using the cosine measure, to process hybrid terms. Our evaluation shows that all IR models using hybrid terms achieve better average precision than those using words. Across different recall values, the weighted Boolean model using fuzzy measures with hybrid terms achieves precision that is consistently about 8% higher than that of the same model using words. The vector space model using the cosine measure with hybrid terms achieved the best improvement in average recall and precision.
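To make the terminology concrete, the following is a minimal Python sketch of hybrid term extraction and of textbook forms of the three measures named in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how words are segmented, so greedy longest-match against a toy lexicon is assumed, with unmatched characters indexed as overlapping bigrams. The function names, the `lexicon` and `max_len` parameters, and the unweighted-query forms of the fuzzy and p-norm operators are likewise hypothetical choices made for illustration.

```python
import math


def hybrid_terms(text, lexicon, max_len=4):
    """Assumed scheme: greedy longest-match words from `lexicon`;
    runs of unmatched characters become overlapping bigrams."""
    terms, run, i = [], "", 0

    def flush(chars):
        # A single leftover character is kept as a unigram;
        # longer runs are split into overlapping bigrams.
        if len(chars) == 1:
            terms.append(chars)
        terms.extend(chars[k:k + 2] for k in range(len(chars) - 1))

    while i < len(text):
        # Try the longest candidate word first (words of length >= 2 only).
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in lexicon:
                flush(run)
                run = ""
                terms.append(text[i:i + n])
                i += n
                break
        else:
            run += text[i]
            i += 1
    flush(run)
    return terms


def fuzzy_and(weights):
    """Fuzzy-set conjunction: minimum of the term weights in [0, 1]."""
    return min(weights)


def fuzzy_or(weights):
    """Fuzzy-set disjunction: maximum of the term weights in [0, 1]."""
    return max(weights)


def pnorm_or(weights, p=2.0):
    """p-norm disjunction with equal query weights (textbook form)."""
    return (sum(w ** p for w in weights) / len(weights)) ** (1 / p)


def pnorm_and(weights, p=2.0):
    """p-norm conjunction with equal query weights (textbook form)."""
    return 1 - (sum((1 - w) ** p for w in weights) / len(weights)) ** (1 / p)


def cosine(query, doc):
    """Cosine of the angle between two sparse term-weight dicts."""
    dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
    nq = math.sqrt(sum(w * w for w in query.values()))
    nd = math.sqrt(sum(w * w for w in doc.values()))
    return dot / (nq * nd) if nq and nd else 0.0


# Toy lexicon (an assumption): "中文" is not in it, so it falls back to a bigram.
print(hybrid_terms("中文信息检索", {"信息", "检索"}))
```

In this sketch the p-norm operators reduce to a plain average at p = 1 and approach the fuzzy min/max behaviour as p grows, which is why the two weighted Boolean variants are often discussed together.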

Journal: International Journal of Computer Processing of Languages
Publication Date: Jun 1, 2001
Citations: 18

Similar Papers

Hybrid term indexing for different IR models
Ken C W Chow ... K L Kwok
01 Nov 2000

N-layer Approach to Web Information Retrieval
H.B Kekre ... S.S Sane
International Journal of Applied Information Systems | VOL. 5
10 Jan 2013

Text summarization as a decision support aid
T Elizabeth Workman ... Marcelo Fiszman
BMC Medical Informatics and Decision Making | VOL. 12
23 May 2012

Diagnosing and differentiating viral pneumonia and COVID-19 using X-ray images.
Hakan Kör ... Ahmet Haşim Yurttakal
Multimedia tools and applications | VOL. 81
27 Apr 2022
