Abstract

Background

The ability to design thermostable proteins is theoretically important and practically useful. Robust and accurate algorithms, however, remain elusive. One critical problem is the lack of reliable methods to estimate the relative thermostability of possible mutants.

Results

We report a novel scoring function for discriminating hyperthermophilic and mesophilic proteins with application to predicting the relative thermostability of protein mutants. The scoring function was developed based on an elaborate analysis of a set of features calculated or predicted from 540 pairs of hyperthermophilic and mesophilic protein ortholog sequences. It was constructed by a linear combination of ten important features identified by a feature ranking procedure based on the random forest classification algorithm. The weights of these features in the scoring function were fitted by a hill-climbing algorithm. This scoring function has shown an excellent ability to discriminate hyperthermophilic from mesophilic sequences. The prediction accuracies reached 98.9% and 97.3% in discriminating orthologous pairs in the training and holdout testing datasets, respectively. Moreover, the scoring function can distinguish non-homologous sequences with an accuracy of 88.4%. Additional blind tests using two datasets of experimentally investigated mutations demonstrated that the scoring function can be used to predict the relative thermostability of proteins and their mutants at very high accuracies (92.9% and 94.4%). We also developed an amino acid substitution preference matrix between mesophilic and hyperthermophilic proteins, which may be useful in designing more thermostable proteins.

Conclusions

We have presented a novel scoring function which can distinguish not only HP/MP ortholog pairs, but also non-homologous pairs at high accuracies. Most importantly, it can be used to accurately predict the relative stability of proteins and their mutants, as demonstrated in two blind tests. In addition, the residue substitution preference matrix assembled in this study may reflect the thermal adaptation induced substitution biases. A web server implementing the scoring function and the dataset used in this study are freely available at http://www.abl.ku.edu/thermorank/.
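
The abstract describes the scoring function as a linear combination of features whose weights are fitted by hill climbing. The sketch below illustrates that general idea only and is not the authors' implementation: the toy data generator, the three synthetic features, the step size, the number of steps, and the objective (the fraction of pairs in which the hyperthermophilic sequence scores higher) are all assumptions made for illustration. The paper itself uses ten ranked features and 540 ortholog pairs.

```python
import random


def score(weights, features):
    """Linear scoring function: weighted sum of feature values."""
    return sum(w * x for w, x in zip(weights, features))


def pair_accuracy(weights, pairs):
    """Fraction of (HP, MP) pairs where the HP member scores strictly higher."""
    correct = sum(1 for hp, mp in pairs if score(weights, hp) > score(weights, mp))
    return correct / len(pairs)


def hill_climb(pairs, n_features, steps=2000, step_size=0.1, seed=0):
    """Simple hill climbing: nudge one weight at a time, keep strict improvements."""
    rng = random.Random(seed)
    weights = [0.0] * n_features
    best = pair_accuracy(weights, pairs)
    for _ in range(steps):
        i = rng.randrange(n_features)
        candidate = list(weights)
        candidate[i] += rng.choice((-step_size, step_size))
        acc = pair_accuracy(candidate, pairs)
        if acc > best:  # accept only moves that raise the pairwise accuracy
            weights, best = candidate, acc
    return weights, best


def make_toy_pairs(n_pairs=200, n_features=3, seed=1):
    """Synthetic stand-in for ortholog pairs (hypothetical, not the paper's data)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        mp = [rng.gauss(0, 1) for _ in range(n_features)]
        hp = [x + rng.gauss(0, 0.3) for x in mp]
        hp[0] += 1.0  # assumed signal: feature 0 tends to be higher in HPs
        hp[1] -= 1.0  # assumed signal: feature 1 tends to be lower in HPs
        pairs.append((hp, mp))
    return pairs


pairs = make_toy_pairs()
weights, best = hill_climb(pairs, n_features=3)
print("fitted weights:", weights)
print("pairwise accuracy on toy data:", best)
```

In a setting like the one the abstract describes, each feature vector would hold the selected sequence-derived features for one protein, and the fitted weights would be evaluated on a holdout set of pairs rather than on the training pairs alone.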

Highlights

  • The ability to design thermostable proteins is theoretically important and practically useful

  • Results and Discussion: We first report a mesophilic protein (MP)/hyperthermophilic protein (HP) residue substitution preference matrix generated from the BLAST pairwise alignments of MP and HP orthologs

  • Amino acid composition: The overall differences in amino acid composition between HPs and MPs are consistent with previous reports (Table 5) [7,11,22,23,24,45]


Summary

Introduction

The ability to design thermostable proteins is theoretically important and practically useful. Computational protein design methods have attracted much attention because of their potential cost and time savings over conventional directed evolution approaches [3,5,6]. These approaches use information extracted from protein sequences and/or 3D structures to predict favorable mutations that may enhance protein thermostability. A key step is the development of reliable methods for estimating the relative stability of possible mutants in order to identify favorable mutations. Such methods may also improve understanding of the protein folding problem, since the ultimate outcome of protein folding is a native structure with the lowest free energy among the many possible structures of a protein. Gromiha and Suresh applied 12 different classification algorithms, and the best accuracy achieved was 89% [21].
