HDInsight4PSi: Boosting performance of 3D protein structure similarity searching with HDInsight clusters in Microsoft Azure cloud

Dariusz Mrozek, Paweł Daniłowicz, Bożena Małysiak-Mrozek

doi:10.1016/j.ins.2016.02.029

Abstract

3D protein structure similarity searching is one of the important processes in structural bioinformatics, since it allows protein function identification and phylogeny reconstruction for weakly related organisms. Due to the complexity of 3D protein structures and the exponential growth of the number of structures in public repositories such as the Protein Data Bank, the process is time-consuming and requires increased computational resources. Computer systems must therefore be prepared to deal with such huge volumes of macromolecular data. In this paper, we show how 3D protein structure similarity searching can be performed in parallel by distributing MapReduce jobs on an HDInsight cluster in the Microsoft Azure commercial cloud. Our solution combines two important computing paradigms that have gained popularity in recent years: Hadoop/MapReduce and cloud computing. Our experiments, performed on the whole repository of protein structures from the Protein Data Bank, confirm that such a technological fusion is very beneficial and can be successfully applied to time-consuming computations over biological data. Moreover, appropriate preparation of data reduces the time needed for computations and significantly accelerates the similarity searching.
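
As a rough illustration of the MapReduce pattern the abstract refers to, the following minimal sketch splits a toy structure database into chunks, scores each candidate against a query in a map step, and keeps the closest matches in a reduce step. It is not the authors' implementation: the function names (`toy_distance`, `map_phase`, `reduce_phase`, `search`), the chunking scheme, and the data are hypothetical, and the scoring function is a placeholder (a root-mean-square deviation without superposition) standing in for a real structure alignment method. Everything runs in a single process; on a Hadoop cluster the map calls would be distributed across nodes.

```python
import heapq
import math
from itertools import groupby


def toy_distance(a, b):
    """Placeholder score (assumption): RMS deviation over paired points,
    with no superposition. A real search would use a structure alignment
    algorithm here."""
    n = min(len(a), len(b))
    sq = sum((p - q) ** 2 for pa, pb in zip(a[:n], b[:n]) for p, q in zip(pa, pb))
    return math.sqrt(sq / n)


def map_phase(query, chunk):
    """Mapper: emit (query_id, (distance, candidate_id)) for each candidate."""
    qid, qcoords = query
    for cid, coords in chunk:
        yield qid, (toy_distance(qcoords, coords), cid)


def reduce_phase(pairs, k):
    """Reducer: group emitted pairs by query id and keep the k closest."""
    results = {}
    by_key = sorted(pairs, key=lambda kv: kv[0])
    for qid, group in groupby(by_key, key=lambda kv: kv[0]):
        results[qid] = heapq.nsmallest(k, (value for _, value in group))
    return results


def search(query, database, n_chunks=2, k=2):
    """Split the database into chunks, map over each chunk, then reduce."""
    items = sorted(database.items())
    chunks = [items[i::n_chunks] for i in range(n_chunks)]
    pairs = [kv for chunk in chunks for kv in map_phase(query, chunk)]
    return reduce_phase(pairs, k)[query[0]]


# Hypothetical toy data: each "structure" is a short list of (x, y, z) points.
database = {
    "A": [(0, 0, 0), (1, 0, 0)],
    "B": [(0, 0, 0), (1, 1, 0)],
    "C": [(5, 5, 5), (6, 5, 5)],
}
query = ("Q", [(0, 0, 0), (1, 0, 0)])
hits = search(query, database)
```

Here `hits` is a list of `(distance, candidate_id)` tuples ordered from most to least similar. Because each map call touches only its own chunk, the chunks can be processed independently, which is the property that lets such a search scale out across cluster nodes.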


Journal: Information Sciences
Publication Date: Feb 21, 2016
Citations: 29

Similar Papers

Scaling 3D Protein Structure Similarity Searching on Large Hadoop Clusters Located in a Public Cloud
Dariusz Mrozek
01 Jan 2018

Accelerating 3D Protein Structure Similarity Searching on Microsoft Azure Cloud with Local Replicas of Macromolecular Data
Dariusz Mrozek ... Tomasz Kutyła
01 Jan 2015

Pocketome via Comprehensive Identification and Classification of Ligand Binding Envelopes
Jianghong An ... Ruben Abagyan
Molecular & Cellular Proteomics | VOL. 4
01 Jun 2005

Efficient 3D Protein Structure Alignment on Large Hadoop Clusters in Microsoft Azure Cloud
Bożena Małysiak-Mrozek ... Dariusz Mrozek
01 Jan 2018
