Abstract
Circular permutation (CP) refers to situations in which the termini of a protein are relocated to other positions in the structure. CP occurs naturally and has been created artificially to study protein function, stability, and folding. Recently, CP has been increasingly applied to engineer enzyme structure and function and to create bifunctional fusion proteins unachievable by tandem fusion. CP is a complicated and expensive technique. An intrinsic difficulty in its application is that not every position in a protein is amenable to creating a viable permutant. To examine the preferences of CP and develop CP viability prediction methods, we carried out comprehensive analyses of the sequence, structural, and dynamical properties of known CP sites using a variety of statistical and simulation methods, such as bootstrap aggregating, permutation tests, and molecular dynamics simulations. CP particularly favors Gly, Pro, Asp, and Asn. Positions preferred by CP lie within coils, loops, and turns, and at residues that are exposed to solvent, weakly hydrogen-bonded, environmentally unpacked, or flexible. Disfavored positions include Cys, bulky hydrophobic residues, and residues located within helices or near the protein's core. These results fostered the development of an effective system for predicting viable CP sites, which combines four machine learning methods: artificial neural networks, the support vector machine, a random forest, and a hierarchical feature integration procedure developed in this work. Assessed with the dihydrofolate reductase dataset as the independent evaluation dataset, this prediction system achieved an AUC of 0.9. Large-scale predictions were performed for nine thousand representative protein structures, and several new potential applications of CP were thus identified. This study reveals many previously unreported preferences of CP. The developed system is the best CP viability prediction method currently available. This work will facilitate the application of CP in research and biotechnology.
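As a reading aid for the AUC figure quoted in the abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from per-residue viability scores. It is not the authors' evaluation code. The function name `auc`, the scores, and the labels are all hypothetical toy values chosen for illustration.

```python
def auc(scores, labels):
    """Rank-based AUC: the probability that a randomly chosen viable site
    (label 1) scores higher than a randomly chosen non-viable site (label 0).
    Ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both viable and non-viable sites are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): predicted viability scores for six CP sites
# and whether each permutant was experimentally viable (1) or not (0).
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
print(round(auc(toy_scores, toy_labels), 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of viable from non-viable sites.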
Highlights
Circular permutation of a protein is a structural rearrangement whereby the N- and C-termini of structural homologs are located at different positions
Most naturally occurring circular permutation (CP) cases are the result of complex genetic events, such as those mentioned in the Introduction or summarized in [33]
Using the Ramachandran codes to describe protein backbone conformations, we found that CP favored the codes that corresponded to the Ramachandran plot regions with high populations of isolated β-strands, random coils, turns, and Pro residues, the last of which was consistent with our sequence-based results
Summary
Circular permutation of a protein is a structural rearrangement whereby the N- and C-termini of structural homologs are located at different positions. As long as a CP site, i.e., the position for creating the new termini, is not a residue essential for protein folding or function, the artificial circular permutant (CPM) generally retains its native function(s) [4,15,16], although its folding pathways and/or structural stability might be changed [17,18,19]. Owing to these discoveries, CP has become a new method beyond traditional mutagenesis for studying proteins [20,21,22]. CP allows the covalent linkage of two proteins at positions other than their native termini and has made possible the creation of several useful protein switches, molecular biosensors, and novel fusion proteins [26,27,28].
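At the sequence level, the rearrangement described above can be sketched as a rotation of the residue string around the chosen CP site. The example below is a minimal illustration under stated assumptions, not a method from the paper: the function name `circular_permute`, the optional `linker` parameter, and the toy sequence are all hypothetical.

```python
def circular_permute(sequence: str, cp_site: int, linker: str = "") -> str:
    """Return the permutant whose new N-terminus is residue `cp_site` (1-based).

    The original C-terminus is joined to the original N-terminus, optionally
    through a linker peptide (hypothetical parameter, e.g. "GGS").
    """
    if not 1 <= cp_site <= len(sequence):
        raise ValueError("cp_site must lie within 1..len(sequence)")
    cut = cp_site - 1  # convert to a 0-based index
    return sequence[cut:] + linker + sequence[:cut]


native = "MKTAYIAKQR"  # toy sequence, not a real protein
print(circular_permute(native, 4))         # AYIAKQRMKT
print(circular_permute(native, 4, "GGS"))  # AYIAKQRGGSMKT
```

Choosing `cp_site = 1` returns the native sequence unchanged (apart from any linker), which is a convenient sanity check. Whether a given permutant is viable is a separate question that this rotation does not address.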