Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier.

Leyi Wei, Jinxiu Chen, Jiancang Zeng, Pengwei Xing, Ran Su, Fei Guo

doi:10.1016/j.artmed.2017.03.001

Abstract

Computational methods are widely used in bioinformatics to predict protein-protein interactions (PPIs). PPIs and protein-protein non-interactions (PPNIs) have been studied to very different extents, and the number of known PPIs is considerably greater than that of known PPNIs. This imbalance increases the cost of constructing a balanced dataset. PPIs can be classified as either physical or genetic. However, the protein pairs in ready-made PPNI databases have been shown only to lack physical interactions; they have not been shown to lack genetic interactions. Hence, ready-made PPNI databases contain false negatives. In this study, two PPNI datasets were artificially generated from a PPI database. In contrast to traditional PPI feature extraction methods, which are based on sequence information, two novel feature extraction methods are proposed: one based on secondary structure information and the other based on the physicochemical properties of proteins. Experimental results on the RandomPairs dataset validate the efficiency and effectiveness of the proposed prediction model. These results reveal the potential of constructing a PPI negative dataset to reduce false negatives. Related datasets, tools, and source code are accessible at http://lab.malab.cn/soft/PPIPre/PPIPre.html.
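
The abstract does not describe how the artificial PPNI datasets were built. As a rough illustration only, the sketch below assumes one common approach suggested by the name "RandomPairs": randomly pairing proteins drawn from the PPI database and keeping only pairs that are not recorded as interacting. The function name `random_negative_pairs`, its parameters, and the toy protein identifiers are hypothetical and are not taken from the paper or its released tools.

```python
import random


def random_negative_pairs(positive_pairs, n_negatives, seed=0):
    """Sample unordered protein pairs that are absent from the positive set.

    Assumption (not from the paper): negatives are drawn uniformly at random
    from all pairs of distinct proteins that appear in the positive set.
    """
    rng = random.Random(seed)
    # Treat (A, B) and (B, A) as the same interaction.
    positives = {frozenset(pair) for pair in positive_pairs}
    proteins = sorted({protein for pair in positive_pairs for protein in pair})

    # Count how many candidate negatives exist, ignoring self-interactions.
    n = len(proteins)
    distinct_positives = sum(1 for pair in positives if len(pair) == 2)
    max_negatives = n * (n - 1) // 2 - distinct_positives
    if n_negatives > max_negatives:
        raise ValueError(
            f"requested {n_negatives} negatives, only {max_negatives} available"
        )

    negatives = set()
    while len(negatives) < n_negatives:
        a, b = rng.sample(proteins, 2)  # two distinct proteins
        candidate = frozenset((a, b))
        if candidate not in positives:
            negatives.add(candidate)

    # Return a deterministic, sorted list of (protein, protein) tuples.
    return sorted(tuple(sorted(pair)) for pair in negatives)


# Toy example with hypothetical identifiers.
known_ppis = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
sampled_ppnis = random_negative_pairs(known_ppis, n_negatives=3)
print(sampled_ppnis)
```

Pairs sampled this way are only presumed non-interacting, since an unrecorded pair may still interact. A balanced dataset under this scheme would set `n_negatives` equal to the number of positive pairs.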

Full Text