Abstract
The arrangement of amino acids in a protein sequence encodes its native fold. However, the same arrangement in aggregation-prone regions may cause misfolding under local environmental stress. Under normal physiological conditions, such regions cluster in the protein’s interior, which avoids aggregation and allows the native fold to form. We have used the solvent accessibility of aggregation patches (SAAPp) to measure how well aggregation-prone residues are packed. Our results show that SAAPp has low values for native crystal structures, consistent with protein folding acting as a mechanism that minimizes the solvent accessibility of aggregation-prone residues. SAAPp also shows an average correlation of 0.76 with the global distance test (GDT) score on CASP12 template-based protein models. Using SAAPp scores and five structural features, we built SAAP-QA, a random forest machine learning quality assessment tool. On independent CASP test data, SAAP-QA showed an average GDT loss of 2.32 between the model it predicted to be best and the actual best model by GDT score, and it discriminated native-like folds with an AUC of 0.94. Overall, the Pearson correlation coefficient (PCC) between true and predicted GDT scores on independent CASP data was 0.86, while on the external CAMEO dataset, comprising high-quality protein structures, the PCC and average GDT loss were 0.71 and 4.46, respectively. SAAP-QA can be used to assess the quality of models and to iteratively refine them toward native or near-native structures.
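The evaluation measures quoted above (GDT loss, PCC and AUC) can be sketched in plain Python. This is a minimal illustration, not the authors' evaluation code. It assumes GDT loss for one target is the best true GDT among that target's models minus the true GDT of the model the predictor ranks first, and it computes AUC in its rank-probability (Mann-Whitney) form. The function names, the "native-like" threshold of GDT ≥ 50 and all numbers are hypothetical, not CASP or CAMEO data.

```python
import math
from typing import Sequence


def gdt_loss(true_gdt: Sequence[float], predicted_gdt: Sequence[float]) -> float:
    """Best true GDT among a target's models minus the true GDT of the model
    the predictor ranked first (0 means the best model was picked)."""
    top_pick = max(range(len(predicted_gdt)), key=lambda i: predicted_gdt[i])
    return max(true_gdt) - true_gdt[top_pick]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical models for one target (made-up numbers, not CASP data).
    true_gdt = [72.0, 65.5, 80.25, 41.0]
    predicted = [70.0, 81.0, 78.5, 39.0]
    # Hypothetical "native-like" label: true GDT of at least 50.
    native_like = [1 if g >= 50 else 0 for g in true_gdt]

    print("GDT loss:", gdt_loss(true_gdt, predicted))
    print("PCC:", round(pearson(true_gdt, predicted), 3))
    print("AUC:", roc_auc(predicted, native_like))
```

In this toy target the predictor ranks the second model first although the third is best, so the GDT loss is 80.25 - 65.5 = 14.75. Averaging `gdt_loss` over many targets gives a summary figure of the kind reported in the abstract.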
Highlights
The folding of a protein is a self-assembly process in which the information for its three-dimensional (3D) structure is cryptically encoded in the primary sequence[1]
We developed a hypothesis on solvent accessibility of aggregation patch (SAAP), based on the first CASP12 target[20], which
Our hypothesis that SAAPp is a measure of protein folding was tested on CASP12 template-based model (TBM) predictions
Summary
The folding of a protein is a self-assembly process in which the information for its three-dimensional (3D) structure is cryptically encoded in the primary sequence[1]. Clustering of hydrophobic groups in a polar solvent is an “entropy-driven” process that leads to the collapse of side chains into functional native conformations. This “hydrophobic collapse” is considered the most popular protein folding model[10,11,12]. In contrast to protein folding, which leads to the native state, protein misfolding is a self-assembly process that results in an aggregated form. Under physiological conditions, aggregation-prone patches self-assemble in the core of globular structures[8,14], preventing misfolding or aggregation and leading to the native structures. The scoring function developed using SAAPp, SAAP-QA, showed excellent results (a PCC of 0.86 between true and predicted GDT scores on independent CASP data), comparable to state-of-the-art methods in this field.
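The summary's central idea, that aggregation-prone patches should be buried in a native fold, can be illustrated with a toy score. This is not the authors' SAAPp formula, which the summary does not give. It is a hypothetical proxy that assumes per-residue relative solvent accessibility (RSA, 0 = buried, 1 = exposed) and a per-residue aggregation-prone flag are already available from other tools, treats runs of at least `min_len` flagged residues as patches, and averages RSA over them. The names `find_patches`, `saap_proxy` and `min_len` are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical proxy for illustration, not the published SAAPp definition.


def find_patches(prone: Sequence[bool], min_len: int = 5) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of contiguous runs of
    aggregation-prone residues at least `min_len` long (assumed rule)."""
    patches: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(list(prone) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                patches.append((start, i))
            start = None
    return patches


def saap_proxy(rsa: Sequence[float], prone: Sequence[bool], min_len: int = 5) -> float:
    """Mean relative solvent accessibility over all residues that fall in
    aggregation-prone patches; lower means the patches are more buried."""
    values = [rsa[i] for s, e in find_patches(prone, min_len) for i in range(s, e)]
    if not values:
        return 0.0  # no patches found: nothing exposed (an arbitrary choice)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Made-up 10-residue example with one five-residue patch at positions 2-6.
    prone = [False, False] + [True] * 5 + [False] * 3
    buried = [0.8, 0.7, 0.05, 0.1, 0.0, 0.05, 0.1, 0.6, 0.9, 0.7]
    exposed = [0.1, 0.2, 0.8, 0.9, 0.7, 0.85, 0.9, 0.1, 0.0, 0.2]
    print("patch buried: ", round(saap_proxy(buried, prone), 3))
    print("patch exposed:", round(saap_proxy(exposed, prone), 3))
```

The buried arrangement scores lower than the exposed one, mirroring the observation that SAAPp is low for native crystal structures.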