A Distributed Framework for Large-scale Protein-protein Interaction Data Analysis and Prediction Using MapReduce

Lun Hu, Shicheng Yang, Huaqiang Yuan, Xin Luo, Khaled Sedraoui, Mengchu Zhou

doi:10.1109/jas.2021.1004198

Abstract

Protein-protein interactions are of great significance for understanding the functional mechanisms of proteins. With the rapid development of high-throughput genomic technologies, massive amounts of protein-protein interaction (PPI) data have been generated, making it very difficult to analyze them efficiently. To address this problem, this paper presents a distributed framework that reimplements one of the state-of-the-art algorithms, CoFex, using MapReduce. To do so, an in-depth analysis of the limitations of CoFex is conducted from the perspectives of efficiency and memory consumption when it is applied to large-scale PPI data analysis and prediction. Respective solutions are then devised to overcome these limitations. In particular, we adopt a novel tree-based data structure to reduce the heavy memory consumption caused by the huge volume of protein sequence information. After that, the procedure of CoFex is modified to follow the MapReduce framework so that the prediction task is carried out in a distributed manner. A series of extensive experiments has been conducted to evaluate the performance of our framework in terms of both efficiency and accuracy. Experimental results demonstrate that the proposed framework improves the computational efficiency of CoFex by more than two orders of magnitude while retaining the same high accuracy.
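The abstract names two ideas without giving details: a tree-based structure that cuts the memory needed for sequence data, and a map/reduce decomposition of the work. The sketch below is only an illustration of those two general ideas, not the authors' data structure or the CoFex procedure. It assumes a plain prefix tree over fixed-length sequence fragments and a single-process imitation of the map and reduce phases; every name in it (`TrieNode`, `insert`, `node_count`, `map_phase`, `reduce_phase`, `count_kmers`) and the example sequences are hypothetical.

```python
from collections import defaultdict


class TrieNode:
    """One node of a prefix tree; children are keyed by residue letter."""
    __slots__ = ("children", "count")

    def __init__(self):
        self.children = {}
        self.count = 0


def insert(root, fragment):
    """Store a fragment, reusing nodes for any prefix already in the tree."""
    node = root
    for residue in fragment:
        node = node.children.setdefault(residue, TrieNode())
    node.count += 1


def node_count(root):
    """Number of non-root nodes, a rough proxy for memory use."""
    return sum(1 + node_count(child) for child in root.children.values())


def map_phase(sequence, k):
    """Map step (assumed form): emit a (k-mer, 1) pair per window of one sequence."""
    return [(sequence[i:i + k], 1) for i in range(len(sequence) - k + 1)]


def reduce_phase(pairs):
    """Reduce step (assumed form): sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_kmers(sequences, k):
    """Run the map step per sequence, then one reduce over all emitted pairs."""
    mapped = [pair for seq in sequences for pair in map_phase(seq, k)]
    return reduce_phase(mapped)


if __name__ == "__main__":
    # Hypothetical toy sequences, not taken from any PPI data set.
    sequences = ["MKVL", "MKVA"]

    root = TrieNode()
    for seq in sequences:
        insert(root, seq)
    # Shared prefixes are stored once, so fewer nodes than total residues.
    print(node_count(root), "nodes for", sum(map(len, sequences)), "residues")

    print(count_kmers(sequences, 3))
```

In a real MapReduce deployment the map step would run on separate workers over partitions of the input, and a shuffle would group pairs by key before the reduce; the single-process version here only shows the shape of the decomposition.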

Journal: IEEE/CAA Journal of Automatica Sinica
Publication Date: Jan 1, 2022
Citations: 56

Similar Papers

HAPPI: an online database of comprehensive human annotated and predicted protein interactions
Jake Chen ... Sudharani Mamidipalli
BMC Genomics | VOL. 10
01 Jan 2009

PRINCESS, a Protein Interaction Confidence Evaluation System with Multiple Data Sources
Dong Li ... Fuchu He
Molecular & Cellular Proteomics | VOL. 7
01 Jun 2008

Categorizing Biases in High-Confidence High-Throughput Protein-Protein Interaction Data Sets
Xueping Yu ... Anders Wallqvist
Molecular & Cellular Proteomics | VOL. 10
29 Aug 2011

HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions
Jake Y Chen ... Thanh M Nguyen
BMC Genomics | VOL. 18
17 Feb 2017
