Abstract

Functional annotation of proteins is important for both in vivo and in silico biology. Several computational methods have been proposed that make use of a wide range of features such as motifs, domains, homology, structure and physicochemical properties. No single method performs best in all functional classification problems, because how informative each of these features is depends on the function to be assigned to the protein. In this study, we present a novel approach that combines different methods to better represent protein function. First, we formulated the function annotation problem as a classification problem defined on 300 different Gene Ontology (GO) terms from the molecular function aspect. We presented a method to form positive and negative training examples while taking into account the directed acyclic graph (DAG) structure and evidence codes of GO. We applied three different methods and their combinations. Results show that combining different methods improves prediction accuracy in most cases. The proposed method, GOPred, is available as an online computational annotation tool (http://kinaz.fen.bilkent.edu.tr/gopred).

Highlights

  • Due to advances in genome sequencing techniques during the last decade, the number of proteins being identified is exponentially increasing

  • We developed a method to prepare training data for the terms defined in the Gene Ontology (GO) framework

  • We present a way of establishing positive and negative training data for each class based on evidence codes provided by the GO annotation (GOA) project and by considering the structure of the GO directed acyclic graph (DAG)
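The idea of building positive and negative training sets from evidence codes and the DAG can be illustrated with a minimal sketch. The exact rules used by GOPred are not given here, so everything below is an assumption made for illustration: the toy term names and DAG, the choice of trusted evidence codes, and the rule that proteins annotated only with an ancestor of the target term (or with the target term under an untrusted code) are left out of both sets.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy DAG (child term -> parent terms); not real GO identifiers.
PARENTS: Dict[str, Set[str]] = {
    "binding": set(),
    "nucleotide_binding": {"binding"},
    "atp_binding": {"nucleotide_binding"},
    "catalytic_activity": set(),
}

# Assumption: only these GO evidence codes are treated as reliable.
TRUSTED_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP", "TAS"}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every term reachable by following parent links."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def descendants(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Collect every term that has `term` among its ancestors."""
    return {t for t in parents if term in ancestors(t, parents)}


def build_training_sets(
    term: str,
    annotations: Iterable[Tuple[str, str, str]],  # (protein, GO term, evidence code)
    parents: Dict[str, Set[str]],
) -> Tuple[Set[str], Set[str]]:
    """Return (positives, negatives) for one target term under the assumed rules."""
    below = descendants(term, parents) | {term}
    related = below | ancestors(term, parents)

    trusted_terms: Dict[str, Set[str]] = {}
    all_terms: Dict[str, Set[str]] = {}
    for protein, go_term, code in annotations:
        all_terms.setdefault(protein, set()).add(go_term)
        if code in TRUSTED_CODES:
            trusted_terms.setdefault(protein, set()).add(go_term)

    positives: Set[str] = set()
    negatives: Set[str] = set()
    for protein, terms in all_terms.items():
        trusted = trusted_terms.get(protein, set())
        if trusted & below:
            # Reliable annotation with the term or one of its descendants.
            positives.add(protein)
        elif trusted and not (terms & related):
            # Reliably annotated elsewhere, with no link to the target's lineage.
            negatives.add(protein)
        # Everything else is ambiguous and excluded from both sets.
    return positives, negatives


example_annotations = [
    ("P1", "atp_binding", "IDA"),
    ("P2", "catalytic_activity", "IMP"),
    ("P3", "binding", "IDA"),
    ("P4", "nucleotide_binding", "IEA"),
]
pos, neg = build_training_sets("nucleotide_binding", example_annotations, PARENTS)
```

In this toy example, P3 is excluded because its annotation to an ancestor term does not rule out the more specific function, and P4 is excluded because its only annotation carries an electronically inferred code.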


Summary

Introduction

Due to advances in genome sequencing techniques during the last decade, the number of proteins being identified is increasing exponentially. Functional annotation of proteins has become one of the central problems in molecular biology. Annotations of the highest-scoring hits, according to a similarity calculation, are transferred onto the target protein. This track can be called the transfer approach. Despite known drawbacks such as excessive transfer of annotations, low sensitivity, low specificity, and propagation of database errors, this track is the most widely used among biologists: it was historically the first successful method, although it was developed when the number of protein sequences in the databases was much lower than today’s [1,2,3,4,5,6], and it is well understood by experimentalists.
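The transfer approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the k-mer Jaccard score stands in for a real sequence similarity search, and the function names, the `top_n` and `min_score` parameters, and the toy sequences are hypothetical.

```python
from typing import Dict, Set, Tuple


def kmers(sequence: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of k-mer sets (a stand-in for an alignment score)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def transfer_annotations(
    query: str,
    database: Dict[str, Tuple[str, Set[str]]],  # name -> (sequence, annotations)
    top_n: int = 1,
    min_score: float = 0.3,
) -> Set[str]:
    """Copy the annotations of the highest-scoring hits onto the query."""
    scored = sorted(
        ((similarity(query, seq), terms) for seq, terms in database.values()),
        key=lambda hit: hit[0],
        reverse=True,
    )
    transferred: Set[str] = set()
    for score, terms in scored[:top_n]:
        if score >= min_score:
            transferred |= terms
    return transferred


# Toy database with made-up sequences and annotation labels.
toy_database = {
    "protA": ("MKTAYIAKQR", {"atp_binding"}),
    "protB": ("GGGGSGGGGS", {"structural_molecule_activity"}),
}
predicted = transfer_annotations("MKTAYIAKQL", toy_database)
```

The sketch also makes the drawbacks visible: any error in the annotations of `protA` is copied to the query unchanged, and a query with no sufficiently similar hit receives no annotation at all.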
