Similarity Identification of Large-scale Biomedical Documents using Cosine Similarity and Parallel Computing

Merlinda Wibowo, Herman Yuliansyah, Christoph Quix, Nur Syahela Hussien, Faisal Dharma Adhinata

doi:10.17977/um018v4i22021p105-116

Abstract

Document similarity computation is an important research topic in information retrieval and a crucial step in automatic document categorization. The similarity value lies between 0 and 1: the closer the value is to 1, the more relevant the two documents are considered to be to each other, and vice versa. However, the large scale of textual information makes it difficult to determine the level of relevance between documents. In this study, the relevance between MeSH heading texts in PubMed documents is found to be higher than the relevance between abstract texts in the same documents. Furthermore, parallel computing is implemented to speed up the large-scale document similarity identification process, which is calculated automatically in the PubMed application. The execution time is 15.447 seconds for MeSH headings and 74.191 seconds for abstracts. The execution time for abstracts is higher than for MeSH headings because abstracts contain more words. This study has successfully identified the similarity between large-scale biomedical documents from PubMed using a cosine similarity algorithm. The results, presented as a graph and a table in the PubMed application, show that the cosine similarity of the MeSH heading texts is higher than that of the abstract texts. Cosine similarity is useful for measuring the similarity between documents based on the TF*IDF calculation result.
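The abstract describes scoring document pairs with cosine similarity over TF*IDF weights. The sketch below illustrates that general technique only; it is not the authors' implementation. The whitespace tokenizer, the smoothed IDF formula, the function names, and the three sample texts are all assumptions made for illustration, and the parallel step is left out.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization; the paper's
    # preprocessing steps are not given in the abstract.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse TF*IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    # appearing in every document still carry a positive weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (n / len(tokens)) * idf[t] for t, n in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical MeSH-style texts, not taken from the paper's data set.
docs = [
    "insulin resistance diabetes mellitus",
    "diabetes mellitus insulin therapy",
    "neoplasms lung radiotherapy",
]
vectors = tf_idf_vectors(docs)
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine_similarity(vectors[i], vectors[j]), 3))
```

Because TF*IDF weights are non-negative, the score stays within the 0 to 1 range described in the abstract: documents that share no terms score 0, and a document compared with itself scores 1.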

Journal: Knowledge Engineering and Data Science
Publication Date: Feb 4, 2022
License type: CC BY-SA 4.0

Similar Papers

Evaluating keyphrase extraction algorithms for finding similar news articles using lexical similarity calculation and semantic relatedness measurement by word embedding.
Talha Bin Sarwar ... M Saef Ullah Miah
PeerJ Computer Science | VOL. 8
07 Jul 2022

Evaluating cross-lingual textual similarity on dictionary alignment problem
Yiğit Sever ... Gönenç Ercan
Language Resources and Evaluation | VOL. 54
29 Jun 2020

Penerapan Cosine Similarity dan Pembobotan TF-IDF untuk Mendeteksi Kemiripan Dokumen
Ade Riyani ... Muhammad Zidny Naf'An
Jurnal Linguistik Komputasional (JLK) | VOL. 2
26 Mar 2019

A Comparison of Semantic Similarity Methods for Maximum Human Interpretability
Pinky Sitikhu ... Kritish Pahi
01 Nov 2019
