Predicting the types of protein–protein interactions (PPIs) improves our understanding of the structural characteristics and functions of proteins, and can be formulated as a multi-label classification problem. Nominal features describe the physicochemical characteristics of proteins directly and correlate more robustly with the interaction types between proteins than ordered features do. Motivated by this, we propose a multi-label PPI prediction model called CoMPPI (Co-training based Multi-Label prediction of Protein–Protein Interaction), which aims to make full use of both the ordered and the nominal features extracted from protein sequences. Specifically, CoMPPI combines a graph convolutional network (GCN) with a 1D convolution operation to process the two complementary feature subsets separately, exploiting both local and contextualized information more efficiently. In addition, we construct two multi-type PPI datasets to eliminate the duplication present in previous datasets. We compare CoMPPI with three state-of-the-art methods on three datasets, each partitioned using three distinct schemes (breadth-first search, depth-first search, and random). CoMPPI outperforms the other methods in all cases, with Micro-F1 improvements ranging from 3.81% to 32.40%. A subsequent ablation experiment confirms the efficacy of the co-training framework for multi-label PPI prediction and indicates promising avenues for future work in this domain.
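
For readers unfamiliar with the reported metric, the following is a minimal sketch of how Micro-F1 is commonly computed for multi-label predictions: true positives, false positives, and false negatives are pooled over all samples and all labels before a single F1 score is taken. The function name `micro_f1` and the toy label matrices are hypothetical illustrations, not the authors' evaluation code or data.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for binary label-indicator matrices.

    Each row is one sample (e.g. a protein pair); each column is one label
    (e.g. an interaction type). Counts are pooled over every cell.
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label present and predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but absent
            elif t == 1 and p == 0:
                fn += 1  # label present but missed
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy example: 3 protein pairs, 4 interaction types.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 0, 1]]
y_pred = [[1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
score = micro_f1(y_true, y_pred)
print(f"Micro-F1 = {score:.4f}")
```

Because the counts are pooled before averaging, frequent interaction types weigh more heavily in Micro-F1 than rare ones, which is worth keeping in mind when reading the reported improvements.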