The hydrophobic interaction is the main driving force for protein folding. Here, we address the question of what is the optimal fraction, f, of hydrophobic (H) residues required to ensure protein collapse. For very small f (say f < 0.1), the protein chain is expected to behave as a random coil, where the H residues are "wrapped" locally by polar (P) residues. However, for large enough f this local coverage cannot be achieved, and the thermodynamic alternative to avoid contact with water is to bury the H residues in the interior of a compact chain structure. The interior also contains P residues that are known to be clustered to optimize their electrostatic interactions. This means that the H residues are clustered as well, i.e. they effectively attract each other like the H monomers in Dill's HP lattice model. Previously, we asked the question: assuming that the H monomers in the HP model are distributed randomly along the chain, what fraction of them is required to ensure a compact ground state? We claimed there that f is approximately p(c), where p(c) is the site percolation threshold of the lattice. (In a percolation experiment, each site of an initially empty lattice is visited and a particle is placed there with probability p. The interest is in the critical (minimal) value, p(c), at which percolation occurs, i.e. a cluster connecting opposite sides of the lattice is created.) Owing to the above correspondence between the HP model and real proteins (and assuming that the H residues are distributed at random), we suggest that the experimental f should lead to percolating clusters of H residues over the highly dense protein core, i.e. clusters of the core size. To check this theory, we treat a simplified model consisting of H and P residues represented by their alpha-carbon atoms only. The structure is defined by the C(alpha)-C(alpha) virtual bond lengths, angles and dihedral angles, and the X-ray structure is best-fitted onto a face-centered cubic lattice.
Percolation experiments are carried out for 103 single-chain proteins using six different hydrophobic sets of residues. Indeed, on average, percolating clusters are generated, which supports our theory; however, some sets lead to a better core coverage than others. We also calculate the largest actual hydrophobic cluster of each protein and show that, on average, these clusters span the core, again in accord with our theory. We discuss the effect of protein size, deviations from the average picture, and implications of this study for defining reliable simplified models of proteins.
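The percolation experiment described above can be illustrated with a minimal sketch. It is not the procedure applied to the 103 proteins. It assumes an idealized cubic block of face-centered cubic sites (integer points with even coordinate sum), random occupation with probability p, and a spanning test between the two opposite faces along x. The names `occupy_sites`, `percolates` and `spanning_fraction` are hypothetical and chosen for illustration only.

```python
import random
from collections import deque
from itertools import product

# The 12 nearest-neighbour offsets of the FCC lattice, written for integer
# points whose coordinate sum is even: all permutations of (+-1, +-1, 0).
FCC_NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3)
                  if sum(abs(c) for c in d) == 2]


def occupy_sites(size, p, rng):
    """Visit every FCC site in a size^3 block and occupy it with probability p."""
    return {s for s in product(range(size), repeat=3)
            if sum(s) % 2 == 0 and rng.random() < p}


def percolates(occupied, size):
    """Return True if an occupied cluster connects the x = 0 and x = size - 1 faces."""
    start = [s for s in occupied if s[0] == 0]
    seen = set(start)
    queue = deque(start)
    while queue:
        x, y, z = queue.popleft()
        if x == size - 1:
            return True
        for dx, dy, dz in FCC_NEIGHBOURS:
            nxt = (x + dx, y + dy, z + dz)
            if nxt in occupied and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def spanning_fraction(size, p, trials, seed=0):
    """Fraction of random trials in which a spanning cluster appears."""
    rng = random.Random(seed)
    hits = sum(percolates(occupy_sites(size, p, rng), size)
               for _ in range(trials))
    return hits / trials
```

Scanning p with `spanning_fraction` on a finite block gives only a smeared estimate of where spanning clusters begin to appear. In the study itself the occupied sites are not random: they are the lattice positions of the hydrophobic residues of each fitted protein structure.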