Abstract

Locality Sensitive Hashing (LSH) uses a randomized method to alleviate the Nearest Neighbor Search problem in high-dimensional spaces. However, handling large dataset samples with the LSH algorithm becomes difficult because of its computational complexity. The major aim of this work is therefore to introduce a new LSH algorithm on the Hadoop MapReduce framework to enhance the efficiency of random reads over large dataset samples. The proposed Hash index improves efficiency by reducing the amount of data accessed for range queries, creating buckets based on hyperplanes. An LSH on MapReduce is developed that decreases the random data access time between the map and reduce functions and, in addition, enhances efficiency. Lastly, to validate the performance of the presented index for search queries in MapReduce, five performance metrics are used: changing cluster size, LSH for bucket-size balancing, the overlapped boundary of a hyperplane, bucket creation based on the configured capacity, and non-indexed, Hash-indexed, and globally indexed datasets on the HDFS configured capacity. The effect of these metrics on the dataset during the map and reduce functions likewise demonstrates the pre-eminence of the presented Hash index.
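To make the hyperplane-based bucketing concrete, the sketch below shows a minimal random-hyperplane (cosine) LSH in Python: points are assigned a bucket key from the signs of their projections onto random hyperplanes, so a query only needs to scan the bucket(s) sharing its key. This is an illustrative assumption, not the paper's actual Hash index; the class name, parameters, and the bucket-capacity balancing and MapReduce wiring described in the abstract are omitted or hypothetical here.

```python
# Minimal sketch of random-hyperplane LSH bucketing (cosine LSH).
# Assumption for illustration only; not the authors' implementation.
import numpy as np

class HyperplaneLSH:
    def __init__(self, dim, num_planes=8, seed=42):
        rng = np.random.default_rng(seed)
        # Each row is the normal vector of one random hyperplane.
        self.planes = rng.standard_normal((num_planes, dim))

    def bucket_key(self, vector):
        # A point's bucket key is the pattern of sides it falls on
        # relative to each hyperplane (sign of the dot product).
        bits = (self.planes @ np.asarray(vector)) >= 0
        return ''.join('1' if b else '0' for b in bits)

    def index(self, vectors):
        # Group vectors into buckets; a nearest-neighbor or range query
        # then only touches the bucket(s) sharing the query's key.
        buckets = {}
        for i, v in enumerate(vectors):
            buckets.setdefault(self.bucket_key(v), []).append(i)
        return buckets

# Usage: nearby vectors tend to share a bucket key.
lsh = HyperplaneLSH(dim=4, num_planes=6)
data = np.random.default_rng(0).standard_normal((100, 4))
buckets = lsh.index(data)
print(lsh.bucket_key(data[0]), len(buckets))
```

In a MapReduce setting, the map phase would typically emit (bucket key, record) pairs and the reduce phase would collect each bucket, so that query processing reads only the relevant buckets rather than the whole dataset; the exact balancing of bucket sizes against the configured capacity is the contribution described in the abstract and is not reproduced here.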

