Abstract

Background

The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods which compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily.

Results

Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average, over all superfamilies, using a truncated receiver operator characteristic (ROC5), we find that multiple domain models outperform single superfamily models, except at low error rates, where the two models behave in a similar way. However, there is a wide range of performance depending on the superfamily. For 12% of all superfamilies, the ROC5 value for the superfamily models exceeds that of the domain models by more than 0.2, and for 10% of superfamilies the domain models show a similar improvement in performance over the superfamily models.

Conclusion

Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members. We find that, overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy.
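
The truncated ROC measure used in the Results can be illustrated with a short sketch. The abstract does not define ROC5, so the code below assumes the commonly used ROC_n definition: for each of the first n false positives in a ranked hit list, count the true positives ranked above it, sum these counts, and divide by n times the total number of true positives. The function name `roc_n`, its parameters, and the example ranking are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence


def roc_n(ranked_labels: Sequence[bool], n: int = 5,
          total_positives: Optional[int] = None) -> float:
    """Truncated ROC score (assumed ROC_n definition, not the paper's code).

    ranked_labels: hits sorted best score first; True marks a true homolog,
                   False marks a false positive.
    n:             number of false positives at which the curve is truncated.
    total_positives: number of true homologs that could be found; defaults
                   to the number of True entries in ranked_labels.
    """
    if total_positives is None:
        total_positives = sum(1 for label in ranked_labels if label)
    if n <= 0 or total_positives <= 0:
        raise ValueError("n and total_positives must both be positive")

    true_seen = 0   # true positives ranked so far
    false_seen = 0  # false positives ranked so far
    area = 0        # sum of true positives above each false positive

    for is_true in ranked_labels:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumption: if the list holds fewer than n false positives, the
    # missing ones are treated as ranking below every listed hit.
    area += (n - false_seen) * true_seen

    return area / (n * total_positives)


# Hypothetical ranking for one superfamily: three true homologs, five errors.
example_ranking = [True, True, False, True, False, False, False, False]
example_score = roc_n(example_ranking, n=5)
```

Under this definition a score of 1.0 means every true homolog is ranked above the first false positive, and a score of 0.0 means none is found before the n-th false positive. The difference of 0.2 quoted above is a difference between two such scores for the same superfamily.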

Highlights

  • The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation

  • The area under the ROC curve for the domain models is much larger and, in addition, more remote homologs are detected overall. This indicates that domain models are better at detecting remote homologs

  • Does a single profile of a protein superfamily built from structure-based alignments perform better at recognition than multiple domain models? The comparison is not straightforward and this analysis identifies some of the factors that are important in a comparison of single and multiple models using profile-profile methods


Summary

Introduction

The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. Methods which compare sequence profiles have been shown to improve the detection of these remote sequence relationships. We consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily. Annotation of gene products for newly sequenced genomes is usually done electronically by transfer of functional information from proteins that have very similar amino acid sequences. Accurate identification of remote homologs would enable a larger proportion of the currently sequenced genomes to have putative functional annotation.

Figure: ROC curves showing the number of true positives against false positives for both types of models on the test dataset.

