An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes

Thomas K F Wong, Steven H Wu, Teng Li, Allen G Rodrigo, Jeet Sukumaran, Louis Ranjard

doi:10.1371/journal.pcbi.1008949

Abstract

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy in recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.
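To make the first step of the abstract concrete, the following is a minimal sketch of how candidate SNP sites and their minor-allele frequencies could be read off a pile of reads aligned to a reference. It is not AFPhyloMix's implementation, and it does not perform the Bayesian inference of the tree or the abundances. The function names, the `(start, sequence)` read representation, the gap-free alignment, and the `min_minor_freq` threshold are all assumptions made for illustration.

```python
from collections import Counter


def site_base_counts(reference, aligned_reads):
    """Count the bases observed at each reference position.

    aligned_reads is an iterable of (start, sequence) pairs; a gap-free
    alignment is assumed here purely to keep the sketch short.
    """
    counts = [Counter() for _ in reference]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1
    return counts


def candidate_snp_sites(reference, aligned_reads, min_minor_freq=0.1):
    """Return {position: minor-allele frequency} for sites where the
    second most common base reaches the (hypothetical) threshold."""
    sites = {}
    for pos, counter in enumerate(site_base_counts(reference, aligned_reads)):
        total = sum(counter.values())
        if total == 0 or len(counter) < 2:
            continue  # uncovered site, or only one base observed
        second_count = counter.most_common(2)[1][1]
        freq = second_count / total
        if freq >= min_minor_freq:
            sites[pos] = freq
    return sites


# Toy example: two of the six reads carry a T instead of a G at position 2.
reference = "ACGTACGT"
reads = [
    (0, "ACGTAC"), (0, "ACGTAC"), (0, "ACTTAC"),
    (2, "GTACGT"), (2, "TTACGT"), (2, "GTACGT"),
]
snps = candidate_snp_sites(reference, reads)
```

In an unbarcoded mixture, the frequency of an allele at a site reflects the combined relative abundance of the haplotypes that carry it, which is why frequencies of this kind can carry information about both the tree and the abundances.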

Highlights

Molecular phylogenetic reconstruction is the mainstay of modern evolutionary biology [1, 2].
This research demonstrates the feasibility of reconstructing a phylogenetic tree directly from the short-read sequences obtained from a mixture of closely related amplified sequences, without barcoding.
AFPhyloMix is designed to estimate the concentration of haplotypes and reconstruct the phylogenetic tree directly from the short-read sequences in a mixture of haplotypes without barcodes, given that the number of haplotypes is known.

Summary

Introduction

Molecular phylogenetic reconstruction is the mainstay of modern evolutionary biology [1, 2]. Because modern sequencing technologies can produce several gigabases of nucleotide sequences in a single day, one of the challenges for the molecular phylogeneticist is to deal with this quantity of data in a timely manner while still reconstructing accurate phylogenies. To this end, phylogeneticists have developed rapid alignment and tree reconstruction algorithms [4, 5] that use pre-processed and curated sequences. Pre-processing and sequence curation can be laborious, but they are necessary because a great deal of sequence data is generated using next-generation short-read sequencing technologies. Such sequences are often barcoded using unique DNA identifier tags, then pooled and sequenced in a single run. The unique barcode allows sequences belonging to different samples to be separated computationally before additional error correction and subsequent downstream analyses are performed.

Journal: PLOS Computational Biology	Publication Date: Sep 13, 2021
License type: CC BY 4.0
