Abstract

The paper reviews the use of the Hadoop platform in Structural Bioinformatics applications. For Structural Bioinformatics, Hadoop offers a new framework for analysing large fractions of the Protein Data Bank, which is crucial to high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes, and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses in the literature, together with their scalability. We find that these deployments typically call known executables from MapReduce rather than rewriting the algorithms. Scalability shows variable behaviour in comparison with other batch schedulers, particularly because direct comparisons on the same platform are usually not available. Direct comparisons of Hadoop with batch schedulers are missing from the literature, but we note some evidence that MPI implementations scale better than Hadoop. A significant obstacle to adoption of the Hadoop ecosystem is the difficulty of the interface and of configuring a resource to use Hadoop. This should improve over time as interfaces to Hadoop such as Spark improve, the use of cloud platforms increases, and standardised approaches such as Workflow Languages are taken up.
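
The pattern described above, in which an existing executable is invoked from a MapReduce job rather than the algorithm being rewritten, can be illustrated with a minimal sketch in the style of Hadoop Streaming, where the map and reduce steps exchange tab-separated text records. Everything specific here is an assumption made for illustration: the `dock_tool` command and its `--ligand` flag are hypothetical placeholders, the record layout (`target<TAB>ligand`) is invented, and "lower score is better" is an arbitrary choice. It is not the method of any deployment covered by the review.

```python
import subprocess
from typing import Callable, Iterable, Iterator

# Hypothetical command line; "dock_tool" is a placeholder, not a real program.
DOCK_CMD = ["dock_tool", "--ligand"]


def run_executable(ligand_id: str) -> str:
    """Call the (hypothetical) external tool and return its stdout, assumed to be a score."""
    result = subprocess.run(
        DOCK_CMD + [ligand_id], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def mapper(
    lines: Iterable[str], run: Callable[[str], str] = run_executable
) -> Iterator[str]:
    """Map step: each input record is 'target<TAB>ligand'; emit 'target<TAB>ligand<TAB>score'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank records
        target, ligand = line.split("\t")
        score = float(run(ligand))  # the unmodified executable does the real work
        yield f"{target}\t{ligand}\t{score}"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Reduce step: keep the lowest-scoring ligand for each target (assumed convention)."""
    best = {}
    for line in lines:
        target, ligand, score_text = line.rstrip("\n").split("\t")
        score = float(score_text)
        if target not in best or score < best[target][1]:
            best[target] = (ligand, score)
    for target in sorted(best):
        ligand, score = best[target]
        yield f"{target}\t{ligand}\t{score}"
```

In a streaming job, each function would be wrapped in a small script that reads `sys.stdin` and prints the yielded lines. The `run` parameter exists only so the external call can be swapped for a stub when trying the sketch without the executable installed.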

Highlights

  • Recent advances in molecular biology and genomics have led to significant growth in digital biological information

  • We are developing a suite of bioinformatics applications, Bio Hadoop, which illustrates how general-purpose parallelization technology can be easily and effectively tailored to solve this class of high-performance problems, and how this technology can serve as the basis of a distributed computing framework for several computational biology problems

  • Hadoop provides three key advantages for the analysis of large collections of biological data: it is designed for the analysis of large semi-structured data sets; it is designed to be fault tolerant, which becomes practically unavoidable for a sufficiently large number of processors; and the MapReduce formalism for the representation of data sets … Considering the relevance of computationally analysing protein structures in fields such as membrane proteins, protein-protein interactions, and their effect on systems biology and protein architecture … a platform for the effective analysis of semi-structured data sets

Summary

Turkish Journal of Computer and Mathematics Education

A Review on Design and Development of Performance Evaluation Model for BioInformatics Data Using Hadoop

Ravi Kumar A (a), Dr Harsh Pratap Singh (b) and Dr G. Anil Kumar (c)

(a) Research Scholar, Dept. of Computer Science & Engineering, Sri Satya Sai University of Technology & Medical Sciences, Sehore, Bhopal Indore Road, Madhya Pradesh, India
(b) Research Guide, Dept. of Computer Science & Engineering, Sri Satya Sai University of Technology & Medical Sciences, Sehore, Bhopal Indore Road, Madhya Pradesh, India
(c) Research Co-Guide, Dept. of Computer Science & Engineering, Science Institute of Technology, Ibrahim Patnam, Hyderabad

Article History: Received: 11 January 2021; Accepted: 27 February 2021; Published online: 5 April 2021

