Computational Bioinformatics Research Articles

Synthetic biology is a fast-evolving research field that combines biology and engineering principles to develop new biological systems for medical, pharmacological, and industrial applications. Synthetic biologists use iterative "design, build, test, and learn" cycles to efficiently engineer genetic systems that are reliable, reproducible, and predictable. Protein engineering by directed evolution can benefit from such a systematic engineering approach for various reasons. Learning can be carried out before starting, throughout, or after finalizing a directed evolution project. Computational tools, bioinformatics, and scanning mutagenesis methods can be excellent starting points, while molecular dynamics simulations and other strategies can guide engineering efforts. Similarly, studying protein intermediates along evolutionary pathways offers fascinating insights into the molecular mechanisms shaped by evolution. The learning step of the cycle is crucial for proteins or enzymes that are not suitable for high-throughput screening or selection systems, and it is also valuable for any platform that generates large amounts of data whose analysis can be aided by machine learning algorithms. The main challenge in protein engineering is to predict the effect of a single mutation on one functional parameter, to say nothing of several mutations on multiple parameters. This is largely due to nonadditive mutational interactions, known as epistatic effects: a mutation that is beneficial in one genetic background may not be beneficial in another. In this work, we provide an overview of experimental and computational strategies that can guide the user to learn protein function at different stages in a directed evolution project. We also discuss how epistatic effects can influence the success of directed evolution projects. Since machine learning is gaining momentum in protein engineering and the field is becoming more interdisciplinary thanks to collaboration between mathematicians, computational scientists, engineers, molecular biologists, and chemists, we provide a general workflow that familiarizes nonexperts with the basic concepts, dataset requirements, learning approaches, model capabilities, and performance metrics of this intriguing area. Finally, we also provide some practical recommendations on how machine learning can harness epistatic effects for engineering proteins in an "outside-the-box" way.
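
The nonadditivity described in this abstract can be made concrete with a small sketch. It is not taken from the article: the function name `pairwise_epistasis` and the numeric values are hypothetical, and it assumes the measured property is expressed on a scale where independent mutational effects would simply add (for example, a log-transformed activity).

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of a double mutant from the additive expectation.

    f_wt, f_a, f_b, f_ab are measurements for the wild type, the two
    single mutants, and the double mutant (hypothetical inputs, assumed
    to be on an additive scale).
    Returns 0 when the two mutations combine additively.
    """
    observed_gain = f_ab - f_wt
    expected_gain = (f_a - f_wt) + (f_b - f_wt)
    return observed_gain - expected_gain


if __name__ == "__main__":
    # Illustrative numbers only: each single mutation improves the
    # wild type, yet the double mutant is barely better than it.
    eps = pairwise_epistasis(f_wt=1.0, f_a=1.5, f_b=1.2, f_ab=1.1)
    print(f"epistasis = {eps:+.2f}")
```

A negative value here means the two mutations together perform worse than their individual effects would predict, which is the situation the abstract describes when a mutation is beneficial in one background but not in another.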


For decades, the structural analysis of proteins has received considerable attention, from their sequencing to the determination of their 3D structures, either in the free state (e.g., no host-guest system, apoproteins) or as (non)covalently bound complexes. The elucidation of the 3D structures and the mapping of intra- and intermolecular interactions are valuable sources of information for understanding the physicochemical properties of such systems. X-ray crystallography and nuclear magnetic resonance are the methods of choice for obtaining structures at the atomic level. Nonetheless, they have drawbacks that limit their use to highly purified systems available in relatively large amounts. In contrast, mass spectrometry (MS) has become a powerful tool thanks to its selectivity, its sensitivity, and the development of structural methods at both the global shape and the residue level. Several MS-based methods must be combined with computational chemistry and bioinformatics to fully assign a putative structure. In that context, we propose a strategy that complements the existing methods of structural studies (e.g., circular dichroism, hydrogen/deuterium exchange and cross-linking experiments, nuclear magnetic resonance). The workflow collects structural information on proteins from the appearance rates and the times of appearance of peptides released by a protease under controlled experimental conditions, with online detection by electrospray high-resolution mass spectrometry. Nondenatured, partially denatured, and fully denatured proteins (β-lactoglobulin, cytochrome c, and β-casein) were digested in the enzymatic reactor. The collected data are interpreted with regard to previously established kinetic schemes of the enzymatic digestion with time-dependent rates, considering kinetic parameters in the Michaelis-Menten formalism, including kcat (the turnover number), k1 (formation of the enzyme-substrate complex), k-1 (dissociation of the enzyme-substrate complex), koff (local refolding of the protein around the cleavage site), and kon (local unfolding of the protein around the cleavage site). Solvent-accessible surface analysis through digestion kinetics was also investigated. The initial appearance rates of released peptides varied according to the protein state (folded vs denatured) and inform on the koff/kon ratio around the cleavage site. On the other hand, the time of appearance of a given peptide is related to its solvent accessibility and to the resilience of the residual protein structure in solution. Temperature-dependent digestion experiments allowed estimation of the type of secondary structures around the cleavage site.
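
To illustrate how the koff/kon ratio could shape peptide release, the following is a minimal sketch and not the authors' kinetic scheme. It assumes a simplified three-step model with constant rate constants (the abstract mentions time-dependent rates, which are not modeled here): a closed cleavage site opens with kon and recloses with koff, the open site binds the protease with k1 and dissociates with k-1 (written `km1`), and the complex releases peptide with kcat. The function name, default concentrations, and all rate values are hypothetical.

```python
def simulate_digestion(kon, koff, k1, km1, kcat,
                       s0=1.0, e0=0.1, dt=1e-3, t_end=5.0):
    """Integrate the assumed scheme with explicit Euler steps.

    closed <-> open      (kon: local unfolding, koff: local refolding)
    open + E <-> ES      (k1: binding, km1: dissociation)
    ES -> E + peptide    (kcat: turnover)

    Returns a list of (time, released_peptide) pairs.
    """
    closed, opened, es, peptide = s0, 0.0, 0.0, 0.0
    enzyme = e0
    trace = []
    for step in range(int(round(t_end / dt))):
        # Fluxes evaluated from the state at the start of the step.
        v_unfold = kon * closed
        v_refold = koff * opened
        v_bind = k1 * enzyme * opened
        v_unbind = km1 * es
        v_cat = kcat * es
        closed += dt * (v_refold - v_unfold)
        opened += dt * (v_unfold - v_refold - v_bind + v_unbind)
        es += dt * (v_bind - v_unbind - v_cat)
        enzyme += dt * (v_unbind + v_cat - v_bind)
        peptide += dt * v_cat
        trace.append(((step + 1) * dt, peptide))
    return trace


if __name__ == "__main__":
    # Hypothetical rate constants: an easily opened site versus a site
    # that refolds quickly (high koff/kon ratio).
    flexible = simulate_digestion(kon=1.0, koff=0.1, k1=10.0, km1=1.0, kcat=1.0)
    rigid = simulate_digestion(kon=0.1, koff=1.0, k1=10.0, km1=1.0, kcat=1.0)
    print("peptide released, low koff/kon :", round(flexible[-1][1], 3))
    print("peptide released, high koff/kon:", round(rigid[-1][1], 3))
```

Under these assumptions, a site with a high koff/kon ratio spends less time open and releases peptide more slowly, which is consistent with the abstract's statement that initial appearance rates differ between folded and denatured states.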


Articles published on Computational Bioinformatics

Teaching computational genomics and bioinformatics on a high performance computing cluster-a primer.

Multi-omics integration and interactomics reveals molecular networks and regulators of the beneficial effect of yoga and exercise.

A summary of feature selection techniques for gene chip in bioinformatics based on RSA algorithm

MOVIS: A multi-omics software solution for multi-modal time-series clustering, embedding, and visualizing tasks

Efficient subhypergraph matching based on hyperedge features

Learning Strategies in Protein Directed Evolution.

Label-Free Higher Order Structure and Dynamic Investigation Method of Proteins in Solution Using an Enzymatic Reactor Coupled to Electrospray High-Resolution Mass Spectrometry Detection.

Bioinformatics Research Center of the University of Caxias do Sul: 12 Years of History [Núcleo de Pesquisa em Bioinformática da Universidade de Caxias do Sul: 12 anos de história]

Identification of Possible Inhibitor Molecule against NS5 MTase and RdRp Protein of Dengue Virus in Saudi Arabia

Editorial: From the New Editor-in-Chief

A Bioinformatics-Based Approach to the Design of Primer Sets for Determining Meat Specificity [Et Özgüllüğünün Belirlenmesinde Primer Setlerinin Tasarımına Yönelik Biyoinformatik Tabanlı Bir Yaklaşım]

Application of Computational Methods in Understanding Mutations in Mycobacterium tuberculosis Drug Resistance.

An Anecdote on Prospective Protein Targets for Developing Novel Plant Growth Regulators.

GALAXY Workflow for Bacterial Next‐Generation Sequencing De Novo Assembly and Annotation

How the Biochemical Society and Portland Press are engaging with and supporting early career researchers

Genetic variants of APEX1 p.Asp148Glu and XRCC1 p.Gln399Arg with the susceptibility of hepatocellular carcinoma.

Whole-Genome Differentially Hydroxymethylated DNA Regions among Twins Discordant for Cardiovascular Death.

Optimized SQE atomic charges for peptides accessible via a web application

Learning Deep Attention Network from Incremental and Decremental Features for Evolving Features

Progress and Challenge in Computational Identification of Influenza Virus Reassortment.
