UMI-linked consensus sequencing enables phylogenetic analysis of directed evolution

Paul Jannis Zurek, Philipp Knyphausen, Florian Hollfelder, Katharina Neufeld, Ahir Pushpanath

doi:10.1038/s41467-020-19687-9

Open Access

https://doi.org/10.1038/s41467-020-19687-9

Abstract

The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein’s amino acids (‘intra-gene epistasis’). Our limited understanding of such epistasis hinders the correct prediction of the functional contributions and adaptive potential of mutations. Here we present a straightforward unique molecular identifier (UMI)-linked consensus sequencing workflow (UMIC-seq) that simplifies mapping of evolutionary trajectories based on full-length sequences. Attaching UMIs to gene variants allows accurate consensus generation for closely related genes with nanopore sequencing. We exemplify the utility of this approach by reconstructing the artificial phylogeny emerging in three rounds of directed evolution of an amine dehydrogenase biocatalyst via ultrahigh throughput droplet screening. Uniquely, we are able to identify lineages and their founding variant, as well as non-additive interactions between mutations within a full gene showing sign epistasis. Access to deep and accurate long reads will facilitate prediction of key beneficial mutations and adaptive potential based on in silico analysis of large sequence datasets.
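
To make the notion of sign epistasis concrete, the following is a minimal sketch using the standard textbook definition for two mutations, A and B. It is not taken from the paper: the function names and the toy values are hypothetical, and the values are assumed to be on an additive scale (for example, log activity).

```python
def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Deviation of the double mutant from the additive expectation.

    All inputs are assumed to be on an additive scale (e.g. log activity).
    """
    return f_ab - (f_a + f_b - f_wt)


def has_sign_epistasis(f_wt, f_a, f_b, f_ab):
    """True if either mutation changes the sign of its effect
    depending on whether the other mutation is present."""
    effect_a_alone = f_a - f_wt      # A on the wild-type background
    effect_a_with_b = f_ab - f_b     # A on the B background
    effect_b_alone = f_b - f_wt      # B on the wild-type background
    effect_b_with_a = f_ab - f_a     # B on the A background
    return (effect_a_alone * effect_a_with_b < 0
            or effect_b_alone * effect_b_with_a < 0)


# Hypothetical values: A is deleterious alone but beneficial once B is present.
print(pairwise_epistasis(0.0, -0.5, 1.0, 2.0))
print(has_sign_epistasis(0.0, -0.5, 1.0, 2.0))
```

In this toy case a single-mutant screen would discard A, although it improves the variant that already carries B. Detecting such cases requires knowing which mutations co-occur on the same molecule, which is what full-length reads provide.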

Highlights

The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein’s amino acids (‘intra-gene epistasis’).
Accurate consensus sequences are commonly generated for genome assemblies, where every sequencing read represents a unique fragment that overlaps with many others, facilitating stacking for accurate consensus generation [27,28].
We leverage unique molecular identifier (UMI)-tags to assign erroneous nanopore reads to their molecule of origin, facilitating clustering for accurate consensus formation even when starting with a pool of highly similar sequences.

Summary

Introduction

The success of protein evolution campaigns is strongly dependent on the sequence context in which mutations are introduced, stemming from pervasive non-additive interactions between a protein’s amino acids (‘intra-gene epistasis’). We leverage UMI-tags to assign erroneous nanopore reads to their molecule of origin, facilitating clustering for accurate consensus formation even when starting with a pool of highly similar sequences (e.g. a library of gene variants generated by error-prone PCR for protein evolution). Such sequences typically differ by only a few point mutations and cannot currently be distinguished reliably in ordinary nanopore sequencing output. We apply our workflow to protein engineering and demonstrate the analysis of high-quality full-length sequence outputs through rounds of ultrahigh throughput directed evolution of an amine dehydrogenase (AmDH), tracking the emerging phylogeny (the “walk through sequence space” [2]) towards higher activity.
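
The idea of assigning reads to their molecule of origin and then building a consensus can be sketched in a few lines. This is a toy illustration only, not the UMIC-seq implementation: the function names, the `min_reads` threshold, and the example sequences are hypothetical. It assumes UMIs match exactly and that reads are already aligned to equal length, whereas a real nanopore pipeline would have to tolerate errors in the UMI itself and align reads before voting.

```python
from collections import Counter, defaultdict


def group_by_umi(reads):
    """Group (umi, sequence) pairs by exact UMI match (simplifying assumption)."""
    clusters = defaultdict(list)
    for umi, seq in reads:
        clusters[umi].append(seq)
    return dict(clusters)


def majority_consensus(seqs):
    """Column-wise majority vote over pre-aligned, equal-length sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be pre-aligned to equal length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def umi_consensus(reads, min_reads=3):
    """One consensus per UMI cluster; clusters below min_reads are dropped."""
    return {
        umi: majority_consensus(seqs)
        for umi, seqs in group_by_umi(reads).items()
        if len(seqs) >= min_reads
    }


# Two hypothetical variants differing by one point mutation (position 7, A vs C),
# each read carrying its own random sequencing error.
reads = [
    ("AAAC", "ATGGCTAAAGTT"),
    ("AAAC", "ATGGCTAAAGTA"),  # error at the last base
    ("AAAC", "ATCGCTAAAGTT"),  # error at the third base
    ("GGTA", "ATGGCTCAAGTT"),
    ("GGTA", "ATGGCTCAAGTT"),
    ("GGTA", "TTGGCTCAAGTT"),  # error at the first base
    ("CCCC", "ATGGCTAAAGTT"),  # singleton cluster, too shallow to trust
]
print(umi_consensus(reads))
```

Because sequencing errors fall at different positions in different reads while the true mutation is shared by every read in a cluster, the vote removes the errors and keeps the mutation. Without the UMI, reads from the two variants would be pooled and the single real difference would be indistinguishable from noise.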

Journal: Nature Communications
Publication Date: Nov 26, 2020
Citations: 36
License type: open-access

Similar Papers

The Use of Unique Molecular Identifiers (UMIs) Strongly Improves Sequencing Detection Limits Allowing Earlier Detection of Small TP53 Mutated Clones in Leukemias
Constance Regina Baer ... Torsten Haferlach
Blood | VOL. 128 | 02 Dec 2016

High-accuracy long-read amplicon sequences using unique molecular identifiers with Nanopore or PacBio sequencing.
Søren M Karst ... Emil A Sørensen
Nature Methods | VOL. 18 | 11 Jan 2021

Editor's evaluation: Improved T cell receptor antigen pairing through data-driven filtering of sequencing information from single cells
K Christopher Garcia
11 Oct 2022

LUCS: a high-resolution nucleic acid sequencing tool for accurate long-read analysis of individual DNA molecules.
Sofia Annis ... Melissa Franco
Aging | VOL. 12 | 28 Apr 2020
