An iterative approach of protein function prediction

Xiaoxiao Chi, Jingyu Hou

doi:10.1186/1471-2105-12-437

Abstract

Background: Current approaches of predicting protein functions from a protein-protein interaction (PPI) dataset are based on an assumption that the available functions of the proteins (a.k.a. annotated proteins) will determine the functions of the proteins whose functions are unknown yet at the moment (a.k.a. un-annotated proteins). Therefore, the protein function prediction is a mono-directed and one-off procedure, i.e. from annotated proteins to un-annotated proteins. However, the interactions between proteins are mutual rather than static and mono-directed, although functions of some proteins are unknown for some reasons at present. That means when we use the similarity-based approach to predict functions of un-annotated proteins, the un-annotated proteins, once their functions are predicted, will affect the similarities between proteins, which in turn will affect the prediction results. In other words, the function prediction is a dynamic and mutual procedure. This dynamic feature of protein interactions, however, was not considered in the existing prediction algorithms.

Results: In this paper, we propose a new prediction approach that predicts protein functions iteratively. This iterative approach incorporates the dynamic and mutual features of PPI interactions, as well as the local and global semantic influence of protein functions, into the prediction. To guarantee predicting functions iteratively, we propose a new protein similarity from protein functions. We adapt new evaluation metrics to evaluate the prediction quality of our algorithm and other similar algorithms. Experiments on real PPI datasets were conducted to evaluate the effectiveness of the proposed approach in predicting unknown protein functions.

Conclusions: The iterative approach is more likely to reflect the real biological nature between proteins when predicting functions. A proper definition of protein similarity from protein functions is the key to predicting functions iteratively. The evaluation results demonstrated that in most cases, the iterative approach outperformed non-iterative ones with higher prediction quality in terms of prediction precision, recall and F-value.

Highlights

Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset assume that the known functions of annotated proteins determine the functions of proteins whose functions are not yet known (un-annotated proteins).
The results demonstrated that the overall performance of our iterative prediction algorithm, the Cosine Iterative Algorithm (CIA), was better than that of the other algorithms under both the original and the new definitions of precision and recall.
Our experiments showed that the current method of determining the value of k, i.e. setting k to the average number of functions each protein has among the neighbours, achieved the best prediction results compared with the other methods we tried.

Summary

Introduction

Current approaches to predicting protein functions from a protein-protein interaction (PPI) dataset assume that the known functions of annotated proteins determine the functions of proteins whose functions are not yet known (un-annotated proteins). The early Neighbour Counting method proposed by Schwikowski et al. [7] annotated an un-annotated protein with the functions that occurred most frequently among its neighbour proteins. This method can be regarded as a simple similarity-based prediction method, since it assigns similarity 1 (100%) to two proteins that interact and 0 to two proteins that do not. Brun et al. [9] improved the Neighbour Counting method by using a graph-theoretic measure to assign weights to the edges of a PPI network, and used these weights as similarities when predicting functions. In this method the similarity is no longer restricted to 0 or 1 but takes values in the range [0,1]. Given protein similarities and protein function similarities, further methods were proposed to incorporate them into the prediction, such as the k-Nearest Neighbour (kNN) based methods in [16].
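To make the Neighbour Counting idea concrete, the following is a minimal Python sketch, not the implementation used in [7]. It assumes the PPI network is given as a list of undirected edges and the annotations as a dictionary from protein to a set of function labels. The function name `neighbour_counting`, the parameter `k`, the alphabetical tie-breaking rule, and the toy proteins and functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def neighbour_counting(ppi_edges, annotations, protein, k=3):
    """Return up to k functions that occur most often among the
    annotated direct neighbours of `protein` (illustrative sketch)."""
    # Collect direct interaction partners; edges are treated as undirected.
    neighbours = set()
    for a, b in ppi_edges:
        if a == protein and b != protein:
            neighbours.add(b)
        elif b == protein and a != protein:
            neighbours.add(a)

    # Each interacting neighbour contributes with similarity 1,
    # so every function it carries counts once.
    counts = Counter()
    for n in neighbours:
        counts.update(annotations.get(n, ()))

    # Rank by frequency; ties are broken alphabetically (an assumption
    # made here only to keep the output deterministic).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [function for function, _ in ranked[:k]]


# Toy network: P1 is un-annotated and interacts with P2, P3 and P4.
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]
annotations = {"P2": {"F1", "F2"}, "P3": {"F1"}, "P4": {"F2", "F3"}}

print(neighbour_counting(edges, annotations, "P1", k=2))  # ['F1', 'F2']
```

Replacing the implicit weight of 1 per neighbour with an edge weight in [0,1] would turn the same counting loop into a weighted vote, which is the direction the similarity-based methods described above take.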

Methods

Results

Discussion

Conclusion