Abstract

Background: The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions. Various methods, such as Evolutionary Trace, have been developed based on this strategy. In such approaches, the conserved residues are identified through comparisons of homologous amino acid sequences. Therefore, the selection of homologous sequences is a critical step. It is empirically known that a certain degree of sequence divergence in the set of homologous sequences is required for the identification of conserved residues. However, the development of a method to select homologous sequences appropriate for the identification of conserved residues has not been sufficiently addressed. An objective and general method to select appropriate homologous sequences is desired for the efficient prediction of functional regions.

Results: We have developed a novel index to select the sequences appropriate for the identification of conserved residues, and implemented the index within our method to predict the functional regions of a protein. The implementation of the index improved the performance of the functional region prediction. The index represents the degree of conserved residue clustering on the tertiary structure of the protein. For this purpose, the structure and sequence information were integrated within the index by the application of spatial statistics. Spatial statistics is a field of statistics in which not only the attributes but also the geometrical coordinates of the data are considered simultaneously. Higher degrees of clustering generate larger index scores. We adopted the set of homologous sequences with the highest index score, under the assumption that the best prediction accuracy is obtained when the degree of clustering is the maximum. The set of sequences selected by the index led to higher functional region prediction performance than the sets of sequences selected by other sequence-based methods.

Conclusions: Appropriate homologous sequences are selected automatically and objectively by the index. Such sequence selection improved the performance of functional region prediction. As far as we know, this is the first approach in which spatial statistics have been applied to protein analyses. Such integration of structure and sequence information would be useful for other bioinformatics problems.
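To make the idea of a clustering index concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the formula of the index, so the sketch substitutes global Moran's I, a standard spatial autocorrelation statistic, computed over residue coordinates and per-residue conservation scores. The function names, the binary distance-cutoff weighting, and the `cutoff` value are all assumptions made for illustration.

```python
import math


def morans_i(coords, scores, cutoff=10.0):
    """Global Moran's I with binary distance-cutoff weights.

    coords: list of (x, y, z) residue positions, e.g. C-alpha atoms.
    scores: per-residue conservation scores, in the same order as coords.
    cutoff: hypothetical neighbourhood radius (not taken from the source).
    """
    n = len(scores)
    mean = sum(scores) / n
    dev = [s - mean for s in scores]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            # Weight is 1 for distinct residues within the cutoff, else 0.
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if denom == 0.0 or w_sum == 0.0:
        return 0.0  # no score variation or no neighbours: index undefined
    return (n / w_sum) * (num / denom)


def pick_best_set(coords, candidate_scores, cutoff=10.0):
    """Return (name, index) for the candidate sequence set with the largest index.

    candidate_scores maps a label for each candidate set of homologous
    sequences to the conservation scores derived from that set.
    """
    return max(
        ((name, morans_i(coords, s, cutoff)) for name, s in candidate_scores.items()),
        key=lambda pair: pair[1],
    )


# Toy example: six residues on a line, 5 units apart.
coords = [(5.0 * k, 0.0, 0.0) for k in range(6)]
candidates = {
    "clustered": [1, 1, 1, 0, 0, 0],    # conserved residues sit together
    "alternating": [1, 0, 1, 0, 1, 0],  # conserved residues are dispersed
}
best_name, best_index = pick_best_set(coords, candidates, cutoff=6.0)
```

In this toy case the candidate whose conserved residues are spatially contiguous receives the larger index and is selected, which mirrors the selection rule described in the Results paragraph.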

Highlights

  • The detection of conserved residue clusters on a protein structure is one of the effective strategies for the prediction of functional protein regions

  • Our functional region prediction method consists of two steps, the selection of appropriate homologous sequences and the detection of conserved residue clusters on a structure

  • To evaluate the degree of spatial autocorrelation (DSPAC)-based method, we compared the average F-score of predictions made from the sequence set it selected with the average F-scores obtained from two sequence selection methods that use no structure information, Naïve approaches and SDPfox [40]
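The comparison by average F-score can be sketched as follows. This assumes the standard definition of the F-score as the harmonic mean of precision and recall over sets of residue positions; the source text here does not spell out its exact definition, and the function names are hypothetical.

```python
def f_score(predicted, actual):
    """F-score of predicted functional residues against known ones.

    predicted, actual: iterables of residue identifiers (e.g. positions).
    """
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


def average_f_score(pairs):
    """Mean F-score over (predicted, actual) pairs, one pair per protein."""
    scores = [f_score(p, a) for p, a in pairs]
    return sum(scores) / len(scores)
```

Computing `average_f_score` once per sequence selection method, over the same benchmark proteins, would give the per-method averages that such a comparison relies on.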


Summary

Introduction

One of the most effective strategies for the prediction of functional protein regions is the detection of conserved residue clusters on the tertiary structure of the protein [2,3,4,5,6,7,8,9]. Various methods, such as Evolutionary Trace [2], PatchFinder [10,11] and ConSurf [5], have been developed based on this strategy. The clusters of conserved residues on the structure are predicted as the functional regions. In such approaches, one problem has remained: how to select the appropriate homologous sequences for the identification of conserved residues. An objective criterion for the divergence is required to achieve high functional region prediction performance.
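As an illustration of how conserved residues might be identified from a set of homologous sequences, here is a minimal sketch that scores each alignment column by one minus its normalized Shannon entropy. The conservation measure is an assumption chosen for simplicity; the methods cited above each use their own scoring schemes, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Per-column conservation in [0, 1] for a list of aligned sequences.

    alignment: equal-length strings, with '-' marking gaps.
    A score of 1.0 means the column holds a single residue type.
    """
    n_cols = len(alignment[0])
    max_entropy = math.log2(20)  # 20 standard amino acids
    scores = []
    for col in range(n_cols):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no signal
            continue
        total = len(residues)
        counts = Counter(residues)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


# Toy alignment: column 0 is invariant, column 2 is fully variable.
toy_scores = column_conservation(["ACD", "ACE", "AGF", "AGH"])
```

The sketch also shows why the choice of sequences matters: with a set of nearly identical sequences every column would score close to 1.0, so some divergence is needed before conserved positions stand out.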

