Hydropathy plots, that is, window averages of residue hydrophobicity over local stretches of the sequence, have revealed patterns related to various features of protein tertiary structure. They have enabled identification of regions of the sequence that lie at the surface or within the interior of globular soluble proteins, regions located within the lipid bilayer of transmembrane proteins, portions of the sequence that characterize repeating motifs, and motifs that usefully characterize different protein structural families. This provides one example of the commonly expressed maxim that “sequence determines structure”. On the other hand, a number of previous investigations have shown that the rapidly varying values of residue hydrophobicity along the sequence are distributed approximately randomly. One might therefore question just how much of the sequence actually determines structure. It is thus of interest to extract the part of this rapidly varying distribution that is responsible for the longer-wavelength variations correlating with protein tertiary structural features, and to determine its prevalence within the entire distribution. This is accomplished by a finite Fourier analysis of the sequence of residue hydrophobicity and of a new measure of residue distance from the protein interior. Calculations are performed on a number of globins, immunoglobulins, cupredoxins, and papain-like structures. The spectral power of the Fourier amplitudes of the extracted frequencies, whose inverse transforms underlie the windowed values of residue hydrophobicity, is shown to be a small fraction of the total power of the hydrophobicity distribution and is thereby consistent with a distribution that might appear to be predominantly random.
The wide range of sequence identity between proteins having the same fold, all exhibiting similar small fractions of power amplitude that correlate with the longer wavelength inside-to-outside excursions of the amino acid residues, supports the general contention that close sequence identity is an expression of a close evolutionary relationship rather than an expression of structural similarity. Practical implications of the present analysis for protein structure prediction and engineering are also described.
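
The central quantity, the fraction of spectral power carried by the long-wavelength Fourier components of a hydrophobicity profile, can be illustrated with a minimal sketch. It is not the procedure used in this work. The Kyte-Doolittle scale, the function names, the cutoff `max_k`, and the toy sequence are all assumptions introduced for illustration, and the mean of the profile is removed before the power is computed.

```python
import cmath

# Kyte-Doolittle hydropathy values. ASSUMPTION: the text does not name the
# hydrophobicity scale it uses; this scale is chosen only for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(seq):
    """Map a one-letter amino acid sequence to per-residue hydropathy values."""
    return [KD[aa] for aa in seq.upper()]


def dft(values):
    """Direct finite (discrete) Fourier transform; O(n^2), fine for short chains."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def low_frequency_power_fraction(values, max_k):
    """Fraction of mean-removed spectral power in harmonics 1..max_k.

    Each harmonic k is counted together with its conjugate partner n - k,
    so the result is a fraction of the total power of the real-valued signal.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    power = [abs(c) ** 2 for c in dft(centered)]
    total = sum(power)
    if total == 0:
        return 0.0
    low = sum(power[k] for k in range(1, n) if min(k, n - k) <= max_k)
    return low / total


if __name__ == "__main__":
    toy_sequence = "MKTAYIAKQRLVEGSWFDNP"  # hypothetical sequence, not from the study
    profile = hydrophobicity_profile(toy_sequence)
    print(low_frequency_power_fraction(profile, max_k=2))
```

A small value of this fraction for a real protein would correspond to the observation above that the long-wavelength components are a minor part of an otherwise nearly random-looking distribution. The toy sequence carries no such meaning.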