Protein folding prediction

Binguang Ma

doi:10.1360/n972016-00658

Abstract

Protein folding is the process by which a protein molecule transforms from a linear polypeptide chain into a three-dimensional native structure with a specific biological function. The protein folding problem has been studied for more than 50 years and has become a broad and active research field. To address the 58th question raised by Science in 2005, this article briefly reviews the background and research history of the protein folding problem and introduces progress in protein folding prediction from four aspects: prediction of the folding process (protein folding simulation), prediction of parameters related to the folding process, prediction of the folding result (protein structure prediction), and prediction of parameters related to the folding result. Studies of the protein folding problem began in the 1960s with efforts to resolve the paradox that a protein can form its native 3D structure in only a few seconds, whereas the time scale estimated under a thermodynamic ergodic hypothesis would be longer than the age of the universe. Computer simulation is an important approach to studying protein folding. Protein models can be classified into three categories: lattice models, off-lattice models and all-atom models. Current knowledge of the folding mechanism is based on the concept of a folding funnel on a free-energy landscape, and the current view is that no single folding mechanism applies to the whole protein universe; instead, there may be a continuum between the two extremes of hierarchical folding and nucleation folding. Hardware for protein folding simulation has become more powerful: distributed systems (e.g., Folding@home), special-purpose machines (e.g., ANTON) and GPU-based platforms have been developed for this purpose. Meanwhile, folding simulation software has been continuously enhanced.
An important issue in protein folding simulation is overcoming local energy barriers to find the global energy minimum. Several approaches, such as replica exchange, multi-scale modeling and Modeling Employing Limited Data (MELD), have been developed to tackle this issue; the involvement of human intelligence (e.g., the “Foldit” game) is another interesting effort. Over the past two decades, the capability of protein folding simulation has risen continuously. At present, folding simulations of proteins with dozens of amino acids can reach the millisecond time scale, while the largest proteins that can be simulated effectively contain about 100 amino acids. The targets of folding simulation have expanded greatly and now include both in vitro and in vivo folding, such as co-translational folding, chaperone-assisted folding, small-molecule-induced folding and metal-coupled folding. Folding rate and folding type are two important parameters of the folding process; they can now be predicted by statistical and machine-learning approaches based on structural features at different levels, such as the topological properties of the tertiary structure, the secondary structure content and the amino acid frequencies of the primary structure. The result of a protein folding process is the formation of a protein structure. According to the hierarchy of structural organization, the protein structure prediction problem comprises secondary, tertiary and quaternary structure prediction. Secondary structure prediction algorithms have gone through five generations, and the current accuracy is about 80% for three-class prediction. Tertiary structure prediction approaches fall mainly into two categories, template-based modeling and free modeling; the former is more accurate and the latter is more widely applicable.
Quaternary structure prediction includes prediction of the structure of complexes and prediction of the likelihood of protein-protein interaction; both can be performed from the protein 3D structure or from the amino acid sequence alone. Prediction of structure-related parameters has also attracted research interest, including the prediction of protein structural classes, secondary structure content, disordered regions, solvent-accessible surface regions and contacting amino acid pairs at protein-protein interaction interfaces. Finally, some noteworthy directions for future protein folding research are suggested: the coupling between protein folding and binding, the integration of protein folding research with systems biology, and the application of deep-learning techniques to protein folding prediction.
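
The paradox mentioned in the abstract (folding in seconds versus an exhaustive search longer than the age of the universe) can be illustrated with a back-of-the-envelope calculation. The sketch below is not taken from the article: the function name, the choice of 3 conformational states per residue, and the sampling rate of 10^13 conformations per second are illustrative assumptions commonly used in textbook versions of this estimate.

```python
import math

# Approximate age of the universe in seconds (about 13.8 billion years).
AGE_OF_UNIVERSE_S = 4.35e17


def exhaustive_search_time_seconds(n_residues,
                                   states_per_residue=3,
                                   sampling_rate_per_s=1e13):
    """Estimate the time needed to visit every conformation once.

    All parameter defaults are assumed values for illustration only:
    each residue independently adopts `states_per_residue` states, and
    the chain samples `sampling_rate_per_s` conformations per second.
    """
    # Work in log10 space so very long chains do not overflow a float.
    log10_conformations = n_residues * math.log10(states_per_residue)
    log10_seconds = log10_conformations - math.log10(sampling_rate_per_s)
    return 10.0 ** log10_seconds


if __name__ == "__main__":
    t = exhaustive_search_time_seconds(100)
    print(f"Exhaustive search for 100 residues: {t:.2e} s")
    print(f"Ratio to the age of the universe: {t / AGE_OF_UNIVERSE_S:.2e}")
```

Under these assumptions a 100-residue chain would need many orders of magnitude longer than the age of the universe to enumerate its conformations, which is why folding cannot proceed by random exhaustive search and why the funnel-shaped free-energy landscape is invoked instead.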
