Sequence-function Relationships Research Articles

Cellulases from glycoside hydrolase family 5 (GH5) are key endoglucanase enzymes in the degradation of diverse polysaccharide substrates and are used in industrial enzyme cocktails to break down biomass. The GH5 family shares a canonical (βα)8-barrel structure, where each (βα) module is essential for the enzyme's stability and activity. Despite their shared topology, the thermostability of GH5 endoglucanase enzymes can vary significantly, and highly thermostable variants are often sought for industrial applications. Based on the previously characterized thermophilic GH5 endoglucanase Egl5A from Talaromyces emersonii (TeEgl5A), which has an optimal temperature of 90°C, we created 10 hybrid enzymes with elements of the mesophilic endoglucanase Cel5 from Stegonsporium opalus (SoCel5) to determine which elements are responsible for enhanced thermostability. Five of the expressed hybrid enzymes exhibit enzyme activity. Relative to the mesophilic parent enzyme, two of these hybrids exhibited pronounced increases in the temperature optimum (10 and 20°C), the temperature at which the protein lost 50% of its activity (T50) (15 and 19°C), and the melting temperature (Tm) (16.5 and 22.9°C), as well as extended half-lives (t1/2) (∼240- and 650-fold at 55°C), and demonstrated improved catalytic efficiency on selected substrates. The successful hybridization strategies were validated experimentally in another GH5 endoglucanase, Cel5 from Aspergillus niger (AnCel5), which demonstrated a similar increase in thermostability. Based on molecular dynamics (MD) simulations of both the SoCel5 and TeEgl5A parent enzymes and their hybrids, we hypothesize that improved hydrophobic packing of the interface between α2 and α3 is the primary mechanism by which the hybrid enzymes increase their thermostability relative to that of the mesophilic parent SoCel5.

IMPORTANCE: Thermal stability is an essential property of enzymes in many industrial biotechnological applications, as high temperatures improve bioreactor throughput. Many protein engineering approaches, such as rational design and directed evolution, have been employed to improve the thermal properties of mesophilic enzymes. Structure-based recombination has also been used to fuse TIM barrel fragments, and even fragments from unrelated folds, to generate new structures. However, little research has been done on GH5 endoglucanases. In this study, two GH5 endoglucanases exhibiting TIM barrel structure, SoCel5 and TeEgl5A, with different thermal properties, were hybridized to study the roles of different (βα) motifs. This work illustrates the role that structure-guided recombination can play in helping to identify sequence-function relationships within GH5 enzymes by supplementing natural diversity with synthetic diversity.

Background: Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences, particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities, we have developed an automated in silico pipeline entitled Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets.

Results: SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed that it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43, as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM. Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate “CAZome fingerprints”, which differentiate between the saccharolytic potential of two related strains in silico.

Conclusions: Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.

Articles published on Sequence-function Relationships

Leveraging implicit knowledge in neural networks for functional dissection and engineering of proteins

Activity and Thermostability of GH5 Endoglucanase Chimeras from Mesophilic and Thermophilic Parents.

Applications of high-throughput sequencing to analyze and engineer ribozymes.

Improved mutant function prediction via PACT: Protein Analysis and Classifier Toolkit

DeepT3: deep convolutional neural networks accurately identify Gram-negative bacterial type III secreted effectors using the N-terminal sequence.

High activity chimeric snake gamma-type phospholipase A2 inhibitor created by DNA shuffling

Codeinone reductase isoforms with differential stability, efficiency and product selectivity in opium poppy.

Fungal-type carbohydrate binding modules from the coccolithophore Emiliania huxleyi show binding affinity to cellulose and chitin.

Affinity maturation of an TpoR targeting antibody in full-length IgG form for enhanced agonist activity.

SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets

Control of Cell-Selective Activity of Membrane-Active Polyleucine-Based Peptides using Database-Guided High-throughput Screening

Multiplexed gene synthesis in emulsions for exploring protein functional landscapes.

Characterizing Protein-Protein Interactions Using Deep Sequencing Coupled to Yeast Surface Display.

Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs

Insights into substrate binding of ferulic acid esterases by arabinose and methyl hydroxycinnamate esters and molecular docking

Pichia pastoris Alcohol Oxidase 1 (AOX1) Core Promoter Engineering by High Resolution Systematic Mutagenesis.

Directed Evolution of Proteins Based on Mutational Scanning.

Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry.

Systematic optimization of L-tryptophan riboswitches for efficient monitoring of the metabolite in Escherichia coli.

Sort-Seq Approach to Engineering a Formaldehyde-Inducible Promoter for Dynamically Regulated Escherichia coli Growth on Methanol.
