Abstract

The aim of this work is to use a supervised learning approach to identify sets of motif-based sequence characteristics, combinations of which can give the most accurate annotation of new proteins. We assess several of InterPro Consortium member databases for their informativeness for the annotation of full-length protein sequences. Thus, our study addresses the problem of integrating biological information from various resources. Decision-rule algorithms are used to cross-map different biological classification systems in order to optimise the process of functional annotation of protein sequences. Various features (e.g., keywords, GO terms, structural complex names) may be assigned to a sequence via its characteristics (e.g., motifs built by various protein sequence analysis methods) with the developed approach. We chose SwissProt keywords as the set of features on which to perform our analysis. From the presented results one can quickly obtain the best combinations of methods appropriate for the description of a given class of proteins.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call