Identifying noisy functional annotations of proteins using sparse semantic similarity

Zhiwen Yu, Xia Chen, Chang Lu, Jun Wang, Guoxian Yu

doi:10.1360/n112017-00105

Abstract

Automatically annotating protein functions is a key task in bioinformatics. Functional annotations of proteins are collected from multiple sources, so noisy annotations are inevitably introduced. However, current research on protein function prediction focuses almost exclusively on predicting functions for completely unannotated or incompletely annotated proteins, and seldom addresses the identification of noisy annotations. In this paper, we propose a method called NFA, which identifies noisy functional annotations of proteins using sparse semantic similarity. NFA first stores the functional annotations of proteins in a protein-function association matrix, weights the annotations differentially according to the evidence codes attached to them, and then propagates the weights upward to the expanded annotations via the hierarchical structure among the functional labels. Next, NFA measures the semantic similarity between proteins using an $l_1$-norm regularized sparse representation of the weighted protein-function association matrix. Finally, it identifies the noisy functions of a protein based on the functions annotated to its semantic neighbors. Experimental results on two model species (A. thaliana and S. cerevisiae) show that NFA identifies noisy annotations more accurately than related methods. Additionally, removing the identified noisy annotations improves the accuracy of an existing function prediction model.
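
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the evidence-code weights, the max-based upward propagation, the non-negative ISTA solver, the regularization value `lam`, and all function names are assumptions chosen only to make the three steps concrete on a toy hierarchy of four terms.

```python
import numpy as np

# Hypothetical evidence-code weights (illustrative values, not from the paper).
EVIDENCE_WEIGHT = {"EXP": 1.0, "IEA": 0.3}

def propagate_upward(Y, parents):
    """Give every ancestor term at least the weight of its descendants."""
    Y = Y.copy()
    changed = True
    while changed:  # repeat until no weight changes (the hierarchy is a DAG)
        changed = False
        for child, ps in parents.items():
            for p in ps:
                new = np.maximum(Y[:, p], Y[:, child])
                if not np.array_equal(new, Y[:, p]):
                    Y[:, p] = new
                    changed = True
    return Y

def sparse_similarity(Y, lam=0.05, n_iter=500):
    """Code each protein as a sparse, non-negative combination of the others.

    Assumed objective per protein i:
        min_a 0.5 * ||y_i - D a||^2 + lam * ||a||_1,  a >= 0,
    where the columns of D are the annotation vectors of the other proteins.
    Solved with plain ISTA (proximal gradient descent).
    """
    n = Y.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        others = [j for j in range(n) if j != i]
        D = Y[others].T                              # (n_terms, n - 1)
        y = Y[i]
        L = max(np.linalg.norm(D, 2) ** 2, 1e-12)    # Lipschitz constant
        a = np.zeros(len(others))
        for _ in range(n_iter):
            z = a - D.T @ (D @ a - y) / L            # gradient step
            a = np.maximum(z - lam / L, 0.0)         # non-negative soft threshold
        S[i, others] = a
    return S

def neighbor_support(Y, S):
    """Similarity-weighted average of the neighbors' annotation weights."""
    W = S / np.maximum(S.sum(axis=1, keepdims=True), 1e-12)
    return W @ Y

# Toy hierarchy: term 0 is the root; 1 and 2 are its children; 3 is a child of 2.
parents = {1: [0], 2: [0], 3: [2]}

# Rows are proteins, columns are terms; entries are evidence-weighted annotations.
Y = np.zeros((4, 4))
Y[0, 1] = EVIDENCE_WEIGHT["EXP"]
Y[0, 3] = EVIDENCE_WEIGHT["IEA"]   # weakly supported annotation on protein 0
Y[1, 1] = EVIDENCE_WEIGHT["EXP"]
Y[2, 1] = EVIDENCE_WEIGHT["IEA"]
Y[3, 3] = EVIDENCE_WEIGHT["EXP"]

Y_weighted = propagate_upward(Y, parents)
S = sparse_similarity(Y_weighted)
support = neighbor_support(Y_weighted, S)

# Annotations of protein 0, ordered from least to most neighbor support;
# the least supported ones are the candidates for noisy annotations.
annotated = np.flatnonzero(Y_weighted[0] > 0)
ranking = annotated[np.argsort(support[0, annotated])]
```

In this sketch, an annotation that receives little support from a protein's sparse neighbors ranks first in `ranking` and is treated as a candidate noisy annotation. How many candidates to flag, and how the support scores are thresholded, are left open here.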

Full Text