Abstract

Enzymes that catalyze chemical reactions at high temperatures are used for industrial biocatalysis, applications in molecular biology, and as highly evolvable starting points for protein engineering. The optimal growth temperature (OGT) of organisms is commonly used to estimate the stability of enzymes encoded in their genomes, but the number of experimentally determined OGT values is limited, particularly for thermophilic organisms. Here, we report on the development of a machine learning model that can accurately predict OGT for bacteria, archaea, and microbial eukaryotes directly from their proteome-wide 2-mer amino acid composition. The trained model is made freely available for reuse. In a subsequent step we use OGT data in combination with the amino acid composition of individual enzymes to develop a second machine learning model for prediction of enzyme catalytic temperature optima (Topt). The resulting model generates enzyme Topt estimates that are far superior to those obtained using OGT alone. Finally, we predict Topt for 6.5 million enzymes, covering 4447 enzyme classes, and make the resulting data set available to researchers. This work enables simple and rapid identification of enzymes that are potentially functional at extreme temperatures.

Highlights

  • Enzymes that remain active at high temperatures, sometimes referred to as thermozymes, are used to catalyze chemical reactions in industrial processes (1–8), for applications in molecular biology (9–14), and for providing highly evolvable starting points for protein engineering (15–19)

  • When testing new enzymes for these applications, the optimal growth temperature (OGT) of microorganisms is commonly used to estimate protein stability: enzymes derived from thermophilic organisms are expected to be both stable and active at high temperatures

  • Protein amino acid composition is strongly correlated with OGT (30, 32)


Summary

Introduction

Enzymes that remain active at high temperatures, sometimes referred to as thermozymes, are used to catalyze chemical reactions in industrial processes (1–8), for applications in molecular biology (9–14), and for providing highly evolvable starting points for protein engineering (15–19). In practice this means that enzymes from thermophilic organisms may be optimally active at significantly lower temperatures than expected. Due to these challenges, a simple way to computationally estimate (1) the OGT of microbes and (2) the Topt of enzymes is in demand. Computational prediction of protein stability is challenging since it usually requires an accurate calculation of the Gibbs free energy change of the protein unfolding process (41, 42), which relies mainly on high-quality protein structures. Such structures are limited in number, thereby reducing the applicability of these methods for identifying thermostable enzymes for industrial applications. We build a machine learning model to accurately predict OGT using features extracted from all proteins encoded by an organism’s genome. This model is used to assign OGT values for organisms without experimental data. The OGT model and enzyme Topt estimates are made freely available for reuse (https://github.com/EngqvistLab/Tome).
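To make the "proteome-wide 2-mer amino acid composition" mentioned in the abstract concrete, the following is a minimal sketch of how such a feature vector could be computed. It is not taken from the Tome repository: the function name `dipeptide_composition`, the normalization to relative frequencies, and the choice to ignore 2-mers containing non-standard residues are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; any other letter (e.g. X, B, Z) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All 400 possible 2-mers in a fixed order, so every proteome maps to a
# vector with the same layout.
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(proteome):
    """Return relative 2-mer frequencies pooled over all sequences.

    `proteome` is an iterable of protein sequences (strings). This
    hypothetical helper counts overlapping 2-mers within each sequence,
    never across sequence boundaries, and normalizes by the total number
    of counted 2-mers (an assumption, not the authors' documented choice).
    """
    counts = Counter()
    for seq in proteome:
        seq = seq.upper()
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1

    # Only 2-mers made of standard residues contribute to the total.
    total = sum(counts[d] for d in DIPEPTIDES)
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


if __name__ == "__main__":
    toy_proteome = ["MKVLA", "MKKL"]  # made-up sequences, not real data
    features = dipeptide_composition(toy_proteome)
    print(len(features), features[DIPEPTIDES.index("MK")])
```

A 400-element vector of this kind, one per organism, could then serve as the input to a regression model trained against known OGT values; the section titles listed below indicate that support vector regression was used, but the exact preprocessing and hyperparameters are not given in this text.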

Software
Proteome dataset
Estimation of threshold
Machine learning workflow for OGT model
OGT Model validation
Machine learning workflow for Topt model
BRENDA annotation
Collection of optimal growth temperature and proteomes of microorganisms
OGT can be accurately predicted from amino acid composition of the proteome
Validation of the SVR model for growth temperature prediction
Improved estimation of enzyme temperature optima using machine learning
Annotating enzymes in BRENDA using OGT and predicted
Author contributions