A three-phase MapReduce-based algorithm for searching biomedical document databases

Milana Grbić

doi:10.7251/ijeec1901001g

Abstract

Retrieving information from large document databases has been a focus of scientific research in recent years. In this paper, a parallel algorithm for searching biomedical documents based on the MapReduce technique is presented. The algorithm consists of three phases: a preprocessing phase, a document representation phase, and a searching phase. In the first phase, lemmatization and stop-word elimination are performed. In the second phase, each document is represented as a list of pairs (word, tf-idf index of the word). The third phase is the main search procedure. It uses a specially designed ranking criterion based on a combination of the term frequency-inverse document frequency (tf-idf) index and an indicator function for each query word. Four versions of the ranking criterion are proposed and analyzed. The algorithm's performance is tested on different subsets of the large, well-known PubMed biomedical document database. The experimental results indicate that the proposed parallel algorithm finds high-quality results in a reasonable time. Compared to the sequential variant of the algorithm, the parallel algorithm is more efficient, finding high-quality solutions in significantly less time.
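The three phases described above can be illustrated with a minimal, single-process sketch in which each phase is written as map and reduce functions. This is not the author's implementation, and several parts are assumptions: the stop-word list, the trailing-"s" rule standing in for real lemmatization, the helper names (`preprocess`, `map_reduce`, `build_tfidf`, `search`), and the scoring formula score(d, q) = Σ over query words w of (tf-idf(w, d) + α·1[w ∈ d]) with a hypothetical weight `alpha`. The paper proposes four ranking variants whose exact form is not given here. `map_reduce` only simulates the map, shuffle, and reduce steps sequentially; a real deployment would run on a MapReduce framework such as Hadoop.

```python
import math
import re
from collections import Counter, defaultdict

# Illustrative stop-word list (assumption; the paper's list is not given).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "for", "to", "with", "on"}


def preprocess(text):
    """Phase 1: tokenize, drop stop words, apply a toy stand-in for lemmatization."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    words = [t for t in tokens if t not in STOP_WORDS]
    # Stripping a trailing "s" is NOT real lemmatization; it only keeps the sketch dependency-free.
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def map_reduce(records, mapper, reducer):
    """Sequentially simulate one MapReduce round: map, shuffle (group by key), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def build_tfidf(docs):
    """Phase 2: return {doc_id: {word: tf-idf}} using two MapReduce rounds."""
    n_docs = len(docs)

    def tf_mapper(record):
        doc_id, words = record
        for word, count in Counter(words).items():
            yield word, (doc_id, count / len(words))  # emit term frequency per document

    def idf_reducer(word, postings):
        idf = math.log(n_docs / len(postings))  # len(postings) is the document frequency
        for doc_id, tf in postings:
            yield doc_id, (word, tf * idf)

    weighted = map_reduce(docs.items(), tf_mapper, idf_reducer)
    # Second round: regroup the (word, tf-idf) pairs by document.
    grouped = map_reduce(weighted, lambda kv: [kv], lambda d, pairs: [(d, dict(pairs))])
    return dict(grouped)


def search(index, query, k=3, alpha=1.0):
    """Phase 3: rank documents by a hypothetical tf-idf + indicator score."""
    terms = set(preprocess(query))

    def score_mapper(record):
        doc_id, vector = record
        score = sum(vector.get(t, 0.0) + alpha * (t in vector) for t in terms)
        if score > 0:
            yield "top", (score, doc_id)  # single key, so one reducer sees every candidate

    def top_k_reducer(_, scored):
        yield from sorted(scored, key=lambda s: (-s[0], s[1]))[:k]

    return map_reduce(index.items(), score_mapper, top_k_reducer)


docs = {
    "d1": preprocess("Gene expression in cancer cells"),
    "d2": preprocess("Protein folding and gene regulation"),
    "d3": preprocess("Clinical trials for cancer therapy"),
}
index = build_tfidf(docs)
print(search(index, "cancer genes"))  # d1 ranks first: only it contains both query terms
```

The indicator term rewards documents that contain more of the query words, so a document matching both words outranks one that matches a single word with a high tf-idf. Sending every score to a single reduce key keeps the top-k selection simple; at PubMed scale one would more likely select a local top-k in each mapper before the final merge.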

Journal: IJEEC - INTERNATIONAL JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTING
Publication Date: Jul 29, 2019
Citations: 1

Similar Papers

A parallel two-list algorithm for the knapsack problem
Der-Chyuan Lou ... Chin-Chen Chang
Parallel Computing | VOL. 22
01 Mar 1997

Performance evaluation of parallel thinning algorithms based on PRAM model
Phill-Kyu Rhee ... Che-Woo La
19 Sep 1997

Impact of interconnection networks in a massively parallel FPGA architecture on a parallel reduction algorithm
Mouna Baklouti ... Philippe Marquet
01 Dec 2008

A novel approach and hybrid parallel algorithms for solving the fixed charge transportation problem
Ahmed Lahjouji El Idrissi ... Ahmad El Allaoui
Radioelectronic and Computer Systems | VOL. -
29 Sep 2023
