The protein folding problem has been studied in molecular biophysics and biochemistry for many years. Even small changes in folding patterns may lead to serious diseases, such as Alzheimer's or Parkinson's, in which proteins fold either too quickly or too slowly. Molecular dynamics (MD) is one of the tools used to understand how proteins fold into their native conformations. Although MD captures the sequence of conformations that leads over time to the folded state, the timescales accessible to simulation remain a limitation. Many approaches have been proposed to speed up simulation through rapid changes in temperature or pressure. We instead propose a rational approach, Greedy-proximal A* (GPA*), derived from path-finding algorithms, to explore the presumed shortest folding pathway from the unfolded state to a given folded conformation. We introduce several new protein structure comparison metrics based on the contact map distance to help mitigate the challenges faced by "standard" metrics. We test our approach on three proteins that cover the two main types of secondary structure: (a) the Trp-cage miniprotein construct TC5b (1L2Y), a short, fast-folding protein with an α-helical secondary structure formed because of a locked tryptophan in the middle; (b) the immunoglobulin-binding domain of streptococcal protein G (1GB1), which contains an α-helix and several β-sheets; and (c) the chicken villin subdomain HP-35, N68H protein (1YRF), one of the fastest-folding proteins, which forms three α-helices. We compare our algorithm to replica-exchange MD and steered MD, which represent the main methods used to accelerate protein folding in MD. We find that GPA* not only reduces the computational time needed to obtain the folded conformation, without adding an artificial energy bias, but also makes it possible to generate trajectories that contain the minimal motions needed for the folding transition.
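
To make the notion of a contact map distance concrete, the following is a minimal sketch of one simple variant, not the specific metrics introduced in this work. It assumes one representative coordinate per residue (for example the Cα atom), a hypothetical contact cutoff of 8 Å, and a distance defined as the number of residue pairs whose contact state differs between two conformations. The function names are illustrative only.

```python
import math


def contact_map(coords, cutoff=8.0):
    """Binary contact map: entry [i][j] is 1 when residues i and j are
    distinct and lie within `cutoff` of each other (cutoff is an assumption)."""
    n = len(coords)
    return [
        [
            1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def contact_map_distance(coords_a, coords_b, cutoff=8.0):
    """Count residue pairs whose contact state differs between two
    conformations of the same chain (an illustrative definition)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must have the same number of residues")
    map_a = contact_map(coords_a, cutoff)
    map_b = contact_map(coords_b, cutoff)
    n = len(coords_a)
    # Each unordered pair (i, j) with i < j is counted once.
    return sum(
        map_a[i][j] != map_b[i][j]
        for i in range(n)
        for j in range(i + 1, n)
    )


# Toy four-residue chain with 3.8 Å spacing: fully extended vs. a compact square.
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

print(contact_map_distance(extended, compact))
```

In this toy case the two conformations differ only in the contact between the first and last residues. A measure of this kind depends only on which residue pairs are in contact, so it needs no structural superposition; a path-finding search could use such a quantity to judge how far a candidate conformation is from the target.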