PhyBWT2: phylogeny reconstruction via eBWT positional clustering

Veronica Guerrini, Alessio Conte, Roberto Grossi, Gianni Liti, Giovanna Rosone, Lorenzo Tattini

doi:10.1186/s13015-023-00232-4

Abstract

Background: Molecular phylogenetics studies the evolutionary relationships among the individuals of a population through their biological sequences. It may provide insights into the origin and evolution of viral diseases, or highlight complex evolutionary trajectories. A key task is inferring phylogenetic trees from any type of sequencing data, including raw short reads. Yet several tools require pre-processed input data, e.g. from complex computational pipelines based on de novo assembly or from mappings against a reference genome. As sequencing technologies keep becoming cheaper, there is increasing pressure to design methods that perform analysis directly on their outputs. From this viewpoint, there is growing interest in alignment-, assembly-, and reference-free methods that can work on several types of data, including raw reads.

Results: We present phyBWT2, an improved version of phyBWT (Guerrini et al. in 22nd International Workshop on Algorithms in Bioinformatics (WABI) 242:23:1–23:19, 2022). Both reconstruct phylogenetic trees directly, bypassing both alignment against a reference genome and de novo assembly. They exploit the combinatorial properties of the extended Burrows-Wheeler Transform (eBWT) and the corresponding eBWT positional clustering framework to detect relevant blocks of the longest shared substrings of varying length (unlike k-mer-based approaches, which need to fix the length k a priori). As a result, they provide novel alignment-, assembly-, and reference-free methods that build partition trees without relying on the pairwise comparison of sequences, thus avoiding the use of a distance matrix to infer phylogeny. In addition, phyBWT2 outperforms phyBWT in running time: it reconstructs phylogenetic trees step by step by considering multiple partitions at a time, whereas phyBWT considered just one partition at a time.

Conclusions: Based on the results of the experiments on sequencing data, we conclude that our method can produce trees of quality comparable to the benchmark phylogeny while handling datasets of different types (short reads, contigs, or entire genomes). Overall, the experiments confirm the effectiveness of phyBWT2, which improves on the performance of its previous version phyBWT while preserving the accuracy of the results.
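To make the idea of detecting shared substrings of varying length through the eBWT more concrete, the following is a minimal, naive sketch and not the authors' implementation. It assumes the common end-marker variant of the eBWT (one distinct end-marker per sequence, sorting suffixes rather than rotations), and it simplifies positional clustering to maximal runs of the LCP array with values at least `k_min`. The function names `ebwt_with_lcp` and `clusters`, the parameter `k_min`, and the toy sequences are all hypothetical.

```python
def ebwt_with_lcp(seqs):
    """Naive eBWT of a collection, plus document and LCP arrays.

    Assumption: each sequence gets its own end-marker; markers sort before
    every ordinary character and among themselves by sequence index.
    """
    # Encode each symbol as a sortable key: (0, i) for the end-marker of
    # sequence i, (1, c) for an ordinary character c.
    suffixes = []
    for i, s in enumerate(seqs):
        keyed = [(1, c) for c in s] + [(0, i)]
        for start in range(len(keyed)):
            suffixes.append((keyed[start:], i, start))
    suffixes.sort(key=lambda t: t[0])

    bwt, doc, lcp = [], [], []
    prev = None
    for keyed, i, start in suffixes:
        # Symbol preceding the suffix in its own sequence ("$" if none).
        bwt.append(seqs[i][start - 1] if start > 0 else "$")
        doc.append(i)
        # Longest common prefix with the previous suffix in sorted order.
        n = 0
        if prev is not None:
            while n < len(keyed) and n < len(prev) and keyed[n] == prev[n]:
                n += 1
        lcp.append(n)
        prev = keyed
    return "".join(bwt), doc, lcp


def clusters(lcp, doc, k_min):
    """Group eBWT positions whose suffixes share a prefix of length >= k_min.

    Simplification: a cluster is a maximal run of LCP values >= k_min.
    Returns (first position, last position, sorted sequence indices).
    """
    out, start = [], None
    for i in range(1, len(lcp) + 1):
        inside = i < len(lcp) and lcp[i] >= k_min
        if inside and start is None:
            start = i - 1  # the run begins at the previous suffix
        elif not inside and start is not None:
            out.append((start, i - 1, sorted(set(doc[start:i]))))
            start = None
    return out


# Toy collection: sequences 0 and 1 share "ATTA"; sequence 2 does not.
seqs = ["GATTA", "CATTA", "GACCA"]
bwt, doc, lcp = ebwt_with_lcp(seqs)
print([members for _, _, members in clusters(lcp, doc, 3)])
```

In this toy run every cluster found with `k_min = 3` contains only sequences 0 and 1, which hints at how clusters can suggest a grouping of sequences without any pairwise distance computation. No fixed k is needed to find the shared block: the LCP values adapt to however long the common context is.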

Journal: Algorithms for molecular biology : AMB
Publication Date: Aug 3, 2023
License type: CC BY 4.0

