Collaborative data analytics towards prediction on pathogen-host protein-protein interactions

Huaming Chen, Jiangning Song, Lei Wang, Jun Shen

doi:10.1109/cscwd.2017.8066706

Abstract

Ever more data are being sequenced and accumulated in systems biology, bringing data analytics researchers into the era of ‘big data’, in which the challenge is to extract underlying relationships and knowledge from very large datasets. Bridging the gap between computational methodology and biology to accelerate the development of biological analytics has become an active research area. In this paper, we focus on the enormous amounts of data generated by the rapid development of high-throughput technologies over the past decades, especially data on protein-protein interactions, which are critical molecular processes in biology. Because pathogen-host protein-protein interactions (PHPPI) are fundamental to both infectious disease research and drug design, molecular-level interactions between pathogen and host play a critical role in the study of infection mechanisms. We build a basic framework for analyzing specific problems in PHPPI, and we present the results of a state-of-the-art deep learning method for PHPPI prediction in comparison with other machine learning methods. Using evaluation methods chosen to account for the highly skewed class imbalance and the large volume of data, we detail a pipeline for both storing PHPPI data and learning from it. This work provides a basis for further investigation of proteins and protein-protein interactions, drawing on collaborative data analytics over the vast amount of data dispersed across the biology literature.

Full Text