Abstract

The success of protein evolution campaigns depends strongly on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein’s amino acids (‘intra-gene epistasis’). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies the mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh-throughput droplet screening. Uniquely, we are able to identify lineages and their founding variants, as well as non-additive interactions between mutations within a full gene that show sign epistasis. Access to deep and accurate long reads will facilitate the prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.

Highlights

  • The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein’s amino acids (‘intra-gene epistasis’)

  • Accurate consensus sequences are commonly generated for genome assemblies, where every sequencing read represents a unique fragment that overlaps with many others, facilitating stacking for accurate consensus generation[27,28]

  • We leverage unique molecular identifier (UMI)-tags to assign erroneous nanopore reads to their molecule of origin, facilitating clustering for accurate consensus formation even when starting with a pool of highly similar sequences


Summary

Introduction

The success of protein evolution campaigns depends strongly on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein’s amino acids (‘intra-gene epistasis’). We leverage unique molecular identifier (UMI) tags to assign erroneous nanopore reads to their molecule of origin, facilitating clustering for accurate consensus formation even when starting with a pool of highly similar sequences (e.g. a library of gene variants generated by error-prone PCR for protein evolution). Such sequences typically differ by only a few point mutations and cannot currently be distinguished reliably in ordinary nanopore sequencing output. We apply our workflow to protein engineering and demonstrate the analysis of high-quality full-length sequence outputs across rounds of ultrahigh-throughput directed evolution of an amine dehydrogenase (AmDH), tracking the emerging phylogeny (the “walk through sequence space”[2]) towards higher activity.
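To illustrate why UMI tags help, the following is a minimal Python sketch of the general idea (group reads by UMI, then take a per-position majority vote), not the UMIC-seq implementation itself. It rests on strong simplifying assumptions that real nanopore data do not satisfy: UMIs are matched exactly rather than clustered with error tolerance, and reads within a group are already aligned to equal length with substitution errors only. The function names, the `min_reads` threshold, and the toy sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs by exact UMI match.

    Assumption: UMIs are error-free. Real nanopore reads carry errors
    in the UMI as well, so a real workflow needs tolerant clustering.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)
    return dict(groups)


def majority_consensus(seqs):
    """Column-wise majority vote over pre-aligned, equal-length reads.

    Assumption: no insertions or deletions. Ties are resolved in favour
    of the base seen first in that column.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def umi_consensus(reads, min_reads=3):
    """Return one consensus per UMI that has at least `min_reads` reads."""
    return {
        umi: majority_consensus(seqs)
        for umi, seqs in group_by_umi(reads).items()
        if len(seqs) >= min_reads
    }


# Toy data: two variants differing by a single point mutation (A -> G at
# index 7), each read carrying at most one independent sequencing error.
toy_reads = [
    ("AACCGT", "ATGGCTAAAGGT"),
    ("AACCGT", "ATGGTTAAAGGT"),
    ("AACCGT", "ATGGCTAAAGCT"),
    ("GGTTAC", "ATGGCTAGAGGT"),
    ("GGTTAC", "ATGGCTAGAGGA"),
    ("GGTTAC", "TTGGCTAGAGGT"),
    ("TTTTTT", "ATGGCTAAAGGT"),  # single read: too shallow for a consensus
]

consensus = umi_consensus(toy_reads)
for umi, seq in consensus.items():
    print(umi, seq)
```

Without the UMI, the six noisy reads would be hard to separate into two variants, because the sequencing errors are as frequent as the true point mutation. Grouping by molecule of origin first means errors are averaged out within each group while the genuine mutation is preserved between groups.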

