Since the pioneering work of Plaxco, Simons, and Baker, it has been well known that protein folding rates correlate strongly with the average sequence separation of native contacts, the absolute contact order (ACO). Despite a multitude of papers, the basis of the relation between folding speed and ACO is still poorly understood. We model the transition state as a Gaussian polymer chain decorated with weak springs between native contacts, while the unfolded state is modeled as a Gaussian chain only. Using these Hamiltonians, our perturbative calculation shows explicitly that folding speed and ACO are linearly related when only the first-order term in the series is considered. At second order, however, two new topological metrics appear, termed COC(1) and COC(2) (COC stands for contact order correction). These correction terms are needed to properly account for the entropy loss due to overlapping (nested or linked) loops, which is not well described by the simple addition of entropies in ACO. COC(1) and COC(2) are related to fluctuations and correlations among different sequence separations. The new metric combining ACO, COC(1), and COC(2) improves the dependence of folding speed on native topology when applied to three different databases: (i) two-state proteins with only α/β and β proteins, (ii) two-state proteins (α/β, β, and purely helical proteins all combined), and (iii) a master set of multi-state and two-state folding proteins. Furthermore, the first-principles calculation gives direct physical insight into the meaning of the fit parameters. The coefficient of ACO, for example, is related to the average strength of the contacts, while the constant term is related to the protein folding speed limit. With the new scaling law, our estimate of the folding speed limit is in close agreement with the widely accepted value of 1 μs observed in proteins and RNA.
Analyzing an exhaustive set (7367) of monomeric proteins from the Protein Data Bank, we find that our new topology-based metric (combining ACO, COC(1), and COC(2)) scales as N^0.54, N being the number of amino acids in a protein. This is in remarkable agreement with a previous argument based on random systems that predicts that protein folding speed depends on exp(-N^0.5). The first-principles calculation presented here provides deeper insight into the role of topology in protein folding and unifies many parallel, seemingly disconnected arguments, demonstrating the existence of a universal mechanism in protein folding kinetics that can be understood from simple principles of polymer physics.
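As a minimal illustration of the quantity at the center of this work, the sketch below computes ACO as the mean sequence separation |i - j| over a list of native contacts, following the definition given above. The function name `absolute_contact_order` and the toy contact list are hypothetical and chosen only for illustration; they are not taken from the paper or from any real protein, and the sketch does not attempt the COC(1) and COC(2) corrections, whose exact form is not given here.

```python
from typing import Iterable, Tuple


def absolute_contact_order(contacts: Iterable[Tuple[int, int]]) -> float:
    """Return the mean sequence separation |i - j| over native contacts.

    `contacts` is an iterable of (i, j) residue-index pairs that are in
    contact in the native structure. How contacts are identified (distance
    cutoff, minimum separation) is left to the caller and is an assumption
    outside this sketch.
    """
    separations = [abs(i - j) for i, j in contacts]
    if not separations:
        raise ValueError("at least one native contact is required")
    return sum(separations) / len(separations)


# Hypothetical toy contact list (residue index pairs), for illustration only.
toy_contacts = [(1, 5), (2, 10), (3, 20), (8, 12)]
aco = absolute_contact_order(toy_contacts)
print(aco)
```

In practice the contact list would be derived from a native structure, for example with a distance cutoff between residues; that choice affects the numerical value of ACO and is not specified in this abstract.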