Abstract

Patterns of alternation of hydrophobic and polar residues are a profound aspect of amino acid sequences, but a feature not easily interpreted for soluble proteins. Here we report statistics of hydrophobicity patterns in proteins of known structure in a current protein database as compared with results from earlier, more limited structure sets. Previous studies indicated that long hydrophobic runs, common in membrane proteins, are underrepresented in soluble proteins. Long runs of hydrophobic residues remain significantly underrepresented in soluble proteins, with none longer than 16 residues observed. These long runs most commonly occur as buried alpha helices, with extended hydrophobic strands less common. Avoiding aggregation of partially folded intermediates during intracellular folding remains a viable explanation for the rarity of long hydrophobic runs in soluble proteins. Comparison between database editions reveals robustness of statistics on aqueous proteins despite an approximately twofold increase in nonredundant sequences. The expanded database does now allow us to explain several deviations of hydrophobicity statistics from models of random sequence in terms of requirements of specific secondary structure elements. Comparison to prior membrane-bound protein sequences, however, shows significant qualitative changes, with the average hydrophobicity and frequency of long runs of hydrophobic residues noticeably increasing between the database editions. These results suggest that the aqueous proteins of solved structure may represent an essentially complete sample of the universe of aqueous sequences, while the membrane proteins of known structure are not yet representative of the universe of membrane-associated proteins, even by relatively simple measures of hydrophobic patterns.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call