An Evolution-Based Approach to De Novo Protein Design and Case Study on Mycobacterium tuberculosis

Pralay Mitra, David Shultis, David Marsh, Jeffrey R Brender, Jeff Czajka, Yang Zhang, Tomasz Cierpicki, Felicia Gray, Eugene I Shakhnovich

doi:10.1371/journal.pcbi.1003298

Open Access

Abstract

Computational protein design is the reverse of protein folding and structure prediction, a field in which constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. In this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins with a similar fold is identified from the PDB library by structural alignment. A structural profile is then constructed from these protein templates and used to guide the search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested in a computational folding experiment on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square deviation of 2.1 Å to the target. As a case study, the method was extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all five designed proteins are soluble with distinct secondary structure, and three have well-ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.
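The profile-guided search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it scores sequences with the profile term only (the solvation, torsion angle, secondary structure, and packing terms are omitted), it assumes the template sequences have already been aligned to the target by a structural alignment tool, and it assumes a Metropolis-style Monte Carlo search, which the abstract does not specify. The names `build_profile`, `profile_score`, and `design_sequence`, along with the pseudocount and temperature values, are hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_templates, pseudocount=1.0):
    """Build per-position log-odds scores from aligned template sequences.

    Assumption: all sequences have the target's length; gap characters
    such as '-' are simply ignored when counting.
    """
    length = len(aligned_templates[0])
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (an assumption)
    profile = []
    for pos in range(length):
        counts = Counter(
            seq[pos] for seq in aligned_templates if seq[pos] in AMINO_ACIDS
        )
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def profile_score(sequence, profile):
    """Sum the profile log-odds of each residue; higher is better."""
    return sum(profile[i][aa] for i, aa in enumerate(sequence))

def design_sequence(profile, steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over sequence space, maximizing the profile score."""
    rng = random.Random(seed)
    current = [rng.choice(AMINO_ACIDS) for _ in profile]
    current_score = profile_score(current, profile)
    best, best_score = current[:], current_score
    for _ in range(steps):
        pos = rng.randrange(len(current))
        new_aa = rng.choice(AMINO_ACIDS)
        # Only one position changes, so the score change is local.
        delta = profile[pos][new_aa] - profile[pos][current[pos]]
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current[pos] = new_aa
            current_score += delta
            if current_score > best_score:
                best, best_score = current[:], current_score
    return "".join(best), best_score

# Toy usage with three made-up, pre-aligned template fragments.
templates = ["ACDK", "ACDR", "ACEK"]
profile = build_profile(templates)
designed, score = design_sequence(profile, steps=2000)
print(designed, round(score, 3))
```

In a fuller treatment, the score would combine this evolutionary term with physics-based and single-sequence prediction terms, so that the search is pulled toward the fold family while still favoring well-packed sequences.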

Highlights

Computational protein design aims to identify new amino acid sequences that have a desirable 3-dimensional (3D) structure and biological function.
Most protein design methods search for the sequences with the lowest free energy under a physics-based force field, following Anfinsen’s thermodynamic hypothesis.
Because sequence profiles are generally more accurate than physics-based potentials in protein fold recognition, a unique advantage of the evolution-based approach is that it targets the design procedure to a family of protein sequence profiles, which enhances the robustness of the designed sequences.

Summary

Introduction

Computational protein design aims to identify new amino acid sequences that have a desirable 3-dimensional (3D) structure and biological function. Zhang et al. [12] and Mirjalili et al. [13] further showed that spatial restraints from structural templates can help improve the energy funnel of the physics-based force field and guide molecular dynamics simulations for structure refinement. These refinements, however, are limited to fine-tuning local structural details and fall far short of topology-level improvements.

Methods

Results

Discussion

Conclusion