Abstract

We present an automated protein function prediction system based on sets of homologous proteins and Gene Ontology categories. A novel measure based on a set of optimal local alignments is used to identify the homologues. The biological functions of the homologous proteins are characterized with Gene Ontology annotations. Protein function is then predicted with data mining models built from decision trees. The tree models depict the interconnections between biological functional groups, which reflect, to a certain degree, the underlying biological pathways. The models are trained and tested using the complete proteome of the model organism yeast (Saccharomyces cerevisiae). The results show that model accuracy and prediction accuracy vary from one functional group to another. These variations illustrate certain limitations of protein function prediction methods based on sequence similarity. However, the basic assumption that similar sequences result in similar functions remains largely valid. The models developed outperform methods that depend solely on the annotations of homologous proteins, although they are intended as a preliminary tool for protein function prediction, and their predictions need to be verified through other means. The prediction accuracies for most of the functional groups are over 80%.
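
As a rough illustration of the general idea only, the sketch below trains a tiny decision tree that predicts whether a protein belongs to one functional group from the annotation categories observed among its homologues. It is not the authors' implementation: the alignment-based homologue measure is not reproduced, and the category names, the toy data, the Gini splitting criterion, and the helper names (`gini`, `best_split`, `build_tree`, `predict`) are all assumptions made for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Pick the category whose present/absent split most reduces impurity."""
    best, best_score = None, gini(labels)
    for f in features:
        yes = [l for r, l in zip(rows, labels) if f in r]
        no = [l for r, l in zip(rows, labels) if f not in r]
        if not yes or not no:
            continue  # this category does not separate the proteins
        score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(labels)
        if score < best_score:
            best, best_score = f, score
    return best  # None when no split improves on the current node

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """Return a leaf (bool) or a node (category, yes_subtree, no_subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    f = best_split(rows, labels, features) if depth < max_depth else None
    if f is None:
        return majority
    yes = [i for i, r in enumerate(rows) if f in r]
    no = [i for i, r in enumerate(rows) if f not in r]
    rest = [g for g in features if g != f]
    return (
        f,
        build_tree([rows[i] for i in yes], [labels[i] for i in yes],
                   rest, depth + 1, max_depth),
        build_tree([rows[i] for i in no], [labels[i] for i in no],
                   rest, depth + 1, max_depth),
    )

def predict(tree, row):
    """Walk the tree until a boolean leaf is reached."""
    while isinstance(tree, tuple):
        f, yes_branch, no_branch = tree
        tree = yes_branch if f in row else no_branch
    return tree

# Toy data (hypothetical): each row is the set of categories annotated on
# a protein's homologues; each label says whether the protein itself
# belongs to the functional group being modelled.
rows = [
    {"transport", "membrane"},
    {"transport"},
    {"metabolism", "membrane"},
    {"metabolism"},
    {"transport", "metabolism"},
    {"membrane"},
]
labels = [True, True, False, False, True, False]
features = ["transport", "metabolism", "membrane"]

tree = build_tree(rows, labels, features)
```

On this toy data the tree splits on a single category, so `predict(tree, {"transport", "nucleus"})` follows the "present" branch. With real annotations, deeper trees would combine several categories, which is how a tree can expose relationships between functional groups.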
