Full-Length Envelope Analyzer (FLEA): A tool for longitudinal analysis of viral amplicons

Kemal Eren, Steven Weaver, Morné Valentyn, Sergei L Kosakovsky Pond, Venkatesh Kumar, Robert Ketteringham, Melissa Laird Smith, Sanjay Mohan, Ben Murrell, Timothée Poisot

doi:10.1371/journal.pcbi.1006498

Abstract

Next generation sequencing of viral populations has advanced our understanding of viral population dynamics, the development of drug resistance, and escape from host immune responses. Many applications require complete gene sequences, which can be impossible to reconstruct from short reads. HIV env, the protein of interest for HIV vaccine studies, is exceptionally challenging for long-read sequencing and analysis due to its length, high substitution rate, and extensive indel variation. While long-read sequencing is attractive in this setting, existing methods do not handle the analysis of such data well. To address this, we introduce FLEA (Full-Length Envelope Analyzer), which performs end-to-end analysis and visualization of long-read sequencing data. FLEA consists of both a pipeline (optionally run on a high-performance cluster) and a client-side web application that provides interactive results. The pipeline transforms FASTQ reads into high-quality consensus sequences (HQCSs) and uses them to build a codon-aware multiple sequence alignment. The resulting alignment is then used to infer phylogenies, selection pressure, and evolutionary dynamics. The web application provides publication-quality plots and interactive visualizations, including an annotated viral alignment browser, time series plots of evolutionary dynamics, visualizations of gene-wide selective pressures (such as dN/dS) across time and across protein structure, and a phylogenetic tree browser. We demonstrate how FLEA may be used to process Pacific Biosciences HIV env data and describe recent examples of its use. Simulations show that FLEA dramatically reduces the error rate of this sequencing platform, providing an accurate portrait of complex and variable HIV env populations. A public instance of FLEA is hosted at http://flea.datamonkey.org. The Python source code for the FLEA pipeline can be found at https://github.com/veg/flea-pipeline. The client-side application is available at https://github.com/veg/flea-web-app. A live demo of the P018 results can be found at http://flea.murrell.group/view/P018.
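To give a rough sense of why building consensus sequences from many long, noisy reads can lower the error rate, the Python sketch below computes an upper bound on the per-site error of a simple majority vote. It is an illustration only, not FLEA's HQCS procedure. It assumes that every read miscalls a given site independently and with the same probability, whereas real PacBio errors (notably indels) are neither independent nor uniform. The function name `majority_error_bound` and the 10% per-read error rate used below are hypothetical choices made for this example.

```python
from math import comb


def majority_error_bound(n_reads: int, error_rate: float) -> float:
    """Upper bound on the chance that a per-site majority vote is wrong or tied.

    Hypothetical helper. Assumes each of n_reads independently miscalls the
    site with probability error_rate. The correct base wins outright whenever
    more than half of the reads are correct, so a wrong (or tied) call needs at
    least ceil(n_reads / 2) wrong reads; we sum that binomial tail.
    """
    if n_reads < 1:
        raise ValueError("need at least one read")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    k_min = (n_reads + 1) // 2  # integer form of ceil(n_reads / 2)
    return sum(
        comb(n_reads, k) * error_rate**k * (1 - error_rate) ** (n_reads - k)
        for k in range(k_min, n_reads + 1)
    )


if __name__ == "__main__":
    # Illustrative per-read error rate; not a value taken from the paper.
    for n in (1, 5, 15):
        print(n, majority_error_bound(n, 0.1))
```

Under these assumptions, going from 1 read to 5 reads at a 10% per-read error rate lowers the bound from 0.1 to about 0.0086, and it keeps shrinking as more reads are added.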

Highlights

Next generation sequencing (NGS) has become an invaluable tool for studying HIV and other rapidly evolving viruses by providing direct, high-resolution measurements of viral genetic diversity within the host.
To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time.
The entire pipeline was run on HIV env reads from donor P018, which are available from the NCBI Sequence Read Archive under BioProject PRJNA320111 and were sequenced as part of [36] on the RS-II instrument using the older-generation P5/C3 PacBio sequencing chemistry.

Summary

Author summary

FLEA processes data from sequencing platforms that generate long but error-prone reads. To study the evolutionary dynamics of entire genes during viral infection, data is collected via long-read sequencing at discrete time points, allowing us to understand how the virus changes over time. Because the experimental and sequencing process is imperfect, the resulting data contain both real evolutionary changes and mutations and other genetic artifacts caused by sequencing errors. FLEA separates the two by building high-quality consensus sequences from the raw reads. These high-quality sequences are then used for further analysis, such as building an evolutionary tree that tracks and interprets the genetic changes in the viral population over time. FLEA is open source and freely available online. This is a PLOS Computational Biology Software paper.
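The idea of separating real variation from sequencing artifacts can be sketched as a column-wise majority vote over reads that have already been clustered and aligned to one another. This is a simplified stand-in, not the consensus algorithm FLEA actually uses. The function `majority_consensus`, the `-` gap symbol, and the toy reads are hypothetical, and the sketch assumes the reads are already aligned to equal length.

```python
from collections import Counter


def majority_consensus(aligned_reads: list[str], gap: str = "-") -> str:
    """Return the column-wise majority consensus of pre-aligned reads.

    Hypothetical helper. Columns whose most common symbol is the gap
    character are dropped, so an insertion seen in only a minority of
    reads does not appear in the consensus.
    """
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to equal length")
    consensus = []
    for column in zip(*aligned_reads):
        # Counter.most_common orders equal counts by first occurrence,
        # so ties go to the symbol seen in the earliest read.
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    # Toy reads: read 2 carries a spurious T insertion, read 3 a G->C error.
    reads = ["ACGT-A", "ACGTTA", "ACCT-A", "ACGT-A"]
    print(majority_consensus(reads))  # prints ACGTA
```

In the toy example, the substitution in the third read and the one-base insertion in the second read are both outvoted, so neither reaches the consensus. A plain vote like this would not, on its own, distinguish genuine minority variants that happen to share a cluster.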

Introduction

Design and implementation

Results

Results on simulated data

Journal: PLoS Computational Biology | Publication Date: Dec 13, 2018
Citations: 5 | License type: CC BY 4.0

Similar Papers

Long-read sequencing in ecology and evolution: Understanding how complex genetic and epigenetic variants shape biodiversity.
Dan G Bock ... Polina Novikova
Molecular Ecology | VOL. 32 | 01 Mar 2023

Long-read RNA sequencing analysis of the lytic human cytomegalovirus transcriptome
Zsolt Balázs
05 Sep 2019

Behavior model construction for client side of modern web applications
Weiwei Wang ... Junxia Guo
Tsinghua Science & Technology | VOL. 26 | 20 Jul 2020

A comparison of short-read, HiFi long-read, and hybrid strategies for genome-resolved metagenomics.
Raphael Eisenhofer ... Wei-Hua Chen
Microbiology spectrum | VOL. 12 | 07 Mar 2024
