Implementation of a Noise Filter for Grouping in Bibliographic Databases using Latent Semantic Indexing

Murilo Marques Armelin Gomes, Anderson Santos, Sandeep Tiwari, William Ferreira Dos Anjos, Preetam Ghosh, Vasco Azevedo, Arun Kumar Jaiswal, Debmalya Barh

doi:10.61797/ijbic.v2i1.208

Abstract

Clustering algorithms can assist scientific research by grouping documents into themes from which information is easier to extract. However, many of the resulting clusters commonly contain documents that are irrelevant to the topic of interest, which reduces the quality of the information. For a bibliographic database, this loss of quality can be managed by dealing with noise in the semantic space that represents the relations between the grouped documents. In this work, we support the hypothesis that Latent Semantic Indexing (LSI) is an efficient instrument for reducing noise and improving group quality. Using a database of 90 scientific publications from different areas, we pre-processed the documents with LSI and grouped them using six clustering algorithms. The results improved significantly compared with our initial results, which did not use LSI-based pre-processing. Among the algorithms with the best individual performance, CMeans achieved the highest average gain, approximately 25%, followed by K-Means and SKmeans with 17% each, PAM with 16.5%, and EM with 15%. We conclude that LSI is a helpful tool for noise reduction, and we recommend its use to significantly improve the cluster quality of bibliographic databases.
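
The LSI pre-processing step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the four-document toy corpus, the function names, the raw term-count weighting, and the choice of rank k = 2 are all assumptions made for illustration, and the truncated SVD is obtained through a simple power iteration on the document Gram matrix so that the example needs only the standard library.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, matrix) with one row per term and one column per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for c in counts] for term in vocab]

def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_eigenpairs(sym, k, iters=200):
    """Top-k eigenpairs of a symmetric PSD matrix: power iteration plus deflation."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for i in range(k):
        v = [1.0 / (i + j + 1) for j in range(n)]  # deterministic start vector
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(m, v)))  # Rayleigh quotient
        pairs.append((lam, v))
        # Deflate: remove the component just found before searching for the next one.
        m = [[m[r][c] - lam * v[r] * v[c] for c in range(n)] for r in range(n)]
    return pairs

def lsi_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space.

    With A = U S V^T, the Gram matrix A^T A equals V S^2 V^T, so its
    eigenvectors give V and document i has coordinates S[j] * V[i][j].
    """
    n = len(matrix[0])
    gram = [[sum(row[i] * row[j] for row in matrix) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs] for i in range(n)]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

# Hypothetical toy corpus: two genomics-like and two clustering-like documents.
DOCS = [
    "gene protein genome sequence",
    "protein genome expression gene",
    "cluster algorithm kmeans distance",
    "algorithm distance cluster centroid",
]

vocab, matrix = term_document_matrix(DOCS)
raw_vectors = [list(col) for col in zip(*matrix)]   # one raw count vector per document
lsi_vectors = lsi_document_vectors(matrix, k=2)     # one latent vector per document

print("raw  same-topic similarity:", round(cosine(raw_vectors[0], raw_vectors[1]), 3))
print("LSI  same-topic similarity:", round(cosine(lsi_vectors[0], lsi_vectors[1]), 3))
print("LSI cross-topic similarity:", round(cosine(lsi_vectors[0], lsi_vectors[2]), 3))
```

In this toy corpus the raw cosine similarity between the two genomics documents is 0.75, because each contains one term the other lacks. Keeping only the two strongest latent dimensions discards that variation, so the same pair becomes nearly identical in the reduced space while the cross-topic similarity stays close to zero. The reduced vectors would then be passed to a clustering algorithm such as K-Means in place of the raw term counts.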

Journal: International Journal of Bioinformatics and Intelligent Computing
Publication Date: Feb 27, 2023
License type: CC BY-NC

