Abstract

Background: Conserved protein sequence regions are extremely useful for identifying and studying functionally and structurally important regions. By means of an integrated analysis of large-scale protein structure and sequence data, structural features of conserved protein sequence regions were identified.

Results: Helices and turns were found to be underrepresented in conserved regions, while strands were found to be overrepresented. Similar numbers of loops were found in conserved and random regions.

Conclusion: These results can be understood in light of the structural constraints on different secondary structure elements, and their role in protein structural stabilization and topology. Strands can tolerate fewer sequence changes and nonetheless keep their specific shape and function. They thus tend to be more conserved than helices, which can keep their shape and function with more changes. Loop behavior can be explained by the presence of both constrained and freely changing loops in proteins. Our detailed statistical analysis of diverse proteins links protein evolution to the biophysics of protein thermodynamic stability and folding. The basic structural features of conserved sequence regions are also important determinants of protein structure motifs and their function.

Highlights

  • Conserved protein sequence regions are extremely useful for identifying and studying functionally and structurally important regions

  • Conserved regions include more beta strands than expected and fewer alpha helices and turns

  • The relation between structure and conserved sequence features was examined by establishing the secondary structure element (SSE) distribution in either conserved or random sequence regions

  • Conserved regions are marked as a cross, and random regions are marked as a square with a prediction interval

Summary

Introduction

Conserved protein sequence regions are extremely useful for identifying and studying functionally and structurally important regions. Natural choices are generically defined protein families [1], ungapped protein sequence motifs (blocks) that separate proteins into either conserved or random regions [2], and the four basic secondary structure elements (SSEs), namely, alpha helices, beta strands, structured turns, and loops [3]. The unit counted was the appearance of each SSE, regardless of its length. This unit was chosen in order to avoid inconsistencies in defining the ends of SSEs, and to decrease the effect of the different lengths of the analyzed conserved regions. Each SSE type was tested separately in order to analyze its contribution to the observed difference in SSEs between conserved and random regions. The differences are not very large, between 6.7% and 7.5%, but they are significant.
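The counting unit described above (one count per SSE appearance, regardless of its length) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the per-residue string encoding, the one-letter codes (H, E, T, L), and the function names `count_sse_appearances` and `sse_fractions` are all assumptions made for illustration, and no statistical test is included because the text above does not name one.

```python
from collections import Counter
from itertools import groupby

# Hypothetical one-letter codes for the four SSE types named in the text.
SSE_TYPES = {"H": "helix", "E": "strand", "T": "turn", "L": "loop"}


def count_sse_appearances(ss_string):
    """Count each SSE once per contiguous run, ignoring the run's length."""
    counts = Counter()
    for code, _run in groupby(ss_string):
        if code in SSE_TYPES:
            counts[SSE_TYPES[code]] += 1
    return counts


def sse_fractions(regions):
    """Pool appearance counts over many regions; return each type's share."""
    total = Counter()
    for region in regions:
        total.update(count_sse_appearances(region))
    n = sum(total.values())
    if n == 0:
        return {}
    return {name: total[name] / n for name in SSE_TYPES.values()}


# Toy inputs (made up, not data from the study).
conserved = ["EEEELLEEEE", "HHHHHHLLEEE"]
random_regions = ["HHHHLLTTHHHH", "LLLEEEELL"]
print(sse_fractions(conserved))
print(sse_fractions(random_regions))
```

Because a run of any length counts once, a long helix and a short helix contribute equally, which is what makes the comparison insensitive to where exactly an SSE is judged to start and end. Comparing the per-type fractions from the two calls, one SSE type at a time, mirrors the separate per-type testing described above.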

