Data retrieval in cancer documents using various weighting schemes

Danie A Nicholas, Devi Jayanthila

doi:10.26634/jit.12.4.20365

Abstract

In data retrieval, documents and queries are commonly represented as sparse vectors, where each element of the vector corresponds to a word or phrase from a predefined lexicon. This study introduces several scoring mechanisms for determining how significant a term is within a document drawn from a large text collection. The most widely used of these is inverse document frequency (IDF), typically in the form of Term Frequency-Inverse Document Frequency (TF-IDF), which emphasizes terms that are distinctive to a given document. Integrating BM25 complements TF-IDF and sustains its widespread use. A notable limitation of these approaches, however, is that they rely on near-exact term matches between query and document. To address this issue, researchers devised latent semantic analysis (LSA), in which documents are represented as dense, low-dimensional vectors. Testing in a simulated environment indicates that this approach is more accurate than the preceding methods.
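The term-weighting schemes named in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the three-sentence toy corpus, the whitespace tokenizer, the smoothed IDF formula, and the BM25 parameters `k1=1.5` and `b=0.75` (common defaults). LSA itself is not implemented; the sketch only shows the exact-match limitation that motivates it.

```python
import math
from collections import Counter

# Hypothetical toy corpus; not the cancer document set used in the paper.
DOCS = [
    "tumor cells invade nearby tissue",
    "chemotherapy shrinks the tumor",
    "malignant neoplasm found in lung tissue",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


TOKENS = [tokenize(d) for d in DOCS]


def doc_freq(term, docs):
    """Number of documents that contain the term at least once."""
    return sum(1 for doc in docs if term in doc)


def tfidf_score(query, doc, docs):
    """Sum of tf * idf over query terms, using idf = log(1 + N / df)."""
    tf = Counter(doc)
    score = 0.0
    for term in tokenize(query):
        df = doc_freq(term, docs)
        if df:
            score += tf[term] * math.log(1 + len(docs) / df)
    return score


def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """BM25 with the commonly used non-negative idf variant."""
    tf = Counter(doc)
    avg_len = sum(len(d) for d in docs) / len(docs)
    score = 0.0
    for term in tokenize(query):
        df = doc_freq(term, docs)
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        f = tf[term]
        # Length normalization: longer documents need more hits to score equally.
        norm = f + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * f * (k1 + 1) / norm
    return score


if __name__ == "__main__":
    for i, doc in enumerate(TOKENS):
        print(i, tfidf_score("tumor", doc, TOKENS), bm25_score("tumor", doc, TOKENS))
```

With the query `tumor`, the third document receives a score of zero under both schemes even though "malignant neoplasm" is topically related, because it shares no literal term with the query. This is the kind of mismatch that dense, low-dimensional representations such as LSA are intended to reduce.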

More From: i-manager's Journal on Information Technology

Similar Papers

An improved topic relevance algorithm for focused crawling
Hong-Wei Hao ... Xu-Cheng Yin
01 Oct 2011

A Comparative Study of Statistical Feature Reduction Methods for Arabic Text Categorization
Fouzi Harrag ... Abdul Malik S Al-Salman
01 Jan 2009

Toward an enhanced Arabic text classification using cosine similarity and Latent Semantic Indexing
Fawaz S Al-Anzi ... Dia Abuzeina
Journal of King Saud University - Computer and Information Sciences | VOL. 29
08 Apr 2016

Latent Semantic Analysis Boosted Convolutional Neural Networks for Document Classification
Eren Gultepe ... Mehran Kamkarhaghighi
01 Nov 2018
