Abstract

BackgroundHydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. However, its efficiency and application are currently limited by the need of user expertise. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. This analysis has been extended to loop-forming clusters, using an appropriate loop alphabet.ResultsThe structural behavior of hydrophobic cluster species, which are typical of protein globular domains, was investigated within banks of experimental structures, considered at different levels of sequence redundancy. The 294 more frequent hydrophobic cluster species were analyzed with regard to their association with the different secondary structures (frequencies of association with secondary structures and secondary structure propensities). Hydrophobic cluster species are predominantly associated with regular secondary structures, and a large part (60 %) reveals preferences for α-helices or β-strands. Moreover, the analysis of the hydrophobic cluster amino acid composition generally allows for finer prediction of the regular secondary structure associated with the considered cluster within a cluster species. We also investigated the behavior of loop forming clusters, using a "PGDNS" alphabet. These loop clusters do not overlap with hydrophobic clusters and are highly associated with coils. Finally, the structural information contained in the hydrophobic structural words, as deduced from experimental structures, was compared to the PSI-PRED predictions, revealing that β-strands and especially α-helices are generally over-predicted within the limits of typical β and α hydrophobic clusters.ConclusionThe dictionary of hydrophobic clusters described here can help the HCA user to interpret and compare the HCA plots of globular protein sequences, as well as provides an original fundamental insight into the structural bricks of protein folds. Moreover, the novel loop cluster analysis brings additional information for secondary structure prediction on the whole sequence through a generalized cluster analysis (GCA), and not only on regular secondary structures. Such information lays the foundations for developing a new and original tool for secondary structure prediction.

Highlights

  • Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters

  • A substantial improvement in secondary structure prediction has been made by taking into account the evolutionary information provided by the divergence of protein sequences belonging to a same structural family (e.g. [2,3], reviewed in [4])

  • Hydrophobic clusters definition depends on three parameters: i) a representative hydrophobic alphabet, ii) an optimal helical pitch to curve the 1D amino acid sequence space in a constant way, iii) a connectivity distance, depending on the considered helical pitch and corresponding to the number of non-hydrophobic residues separating two hydrophobic clusters defined as distinct

Read more

Summary

Introduction

Hydrophobic Cluster Analysis (HCA) is an efficient way to compare highly divergent sequences through the implicit secondary structure information directly derived from hydrophobic clusters. In order to help the analysis of HCA plots, we report here the structural preferences of hydrophobic cluster species, which are frequently encountered in globular domains of proteins. These species are characterized only by their hydrophobic/non-hydrophobic dichotomy. Prediction of secondary structures is a fundamental basis for protein structure prediction This provides constraints for finding remote homologues with low sequence similarity for comparative modeling Current prediction methods generally extract information from known experimental structures and use it for predicting secondary structures in unknown sequences. Accuracy limitation may come from inconstancies between the different secondary structure assignment methods [5], and from long-range interactions, which are not considered in current predictive tools [6,7]

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call