Alignment-free Phylogenetic Placement and its Applications

Matthias Blanke

doi:10.53846/goediss-9762

Abstract

The study of the evolutionary interrelations of living organisms has been at the heart of biological sciences all along. A revolution in sequencing techniques in the past decades has caused a massive increase in molecular sequence data. As a result, contemporary methods assess evolutionary relationships between organisms by quantifying the degree of similarity between their biological sequence data. The discovered relationships of phylogenetic studies are commonly represented and visualized by phylogenetic trees or networks. Traditionally, sequences have been extracted from single organisms; however, recent technological progress has enabled the retrieval of sequence data directly from environmental samples. In doing so, large numbers of short sequencing reads arise that may originate from all organisms present in the respective environment. One major subsequent objective is the taxonomic or phylogenetic identification of those sequencing reads. However, longstanding maximum-likelihood-based de-novo phylogeny reconstruction methods are limited in their applicability by their computational demands; typically, they cannot be applied when the available molecular sequences are present in great numbers or are of great length. Fortunately, phylogenetic placement offers a unique approach to identify large sets of query reads within their phylogenetic context by inserting them into an existing phylogenetic tree comprising a set of reference sequences. Here, we present a new alignment- and assembly-free approach to phylogenetic placement, the Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM). App-SpaM extracts short, non-contiguous subwords to detect homologies between the query and reference sequences, a method known as the spaced-word matches approach. It counts the number of such words and utilizes them to infer the average number of nucleotide substitutions between each read and each reference sequence. 
Then, it uses fast heuristics to infer a suitable placement position within the reference tree. In a comprehensive evaluation, we assessed how App-SpaM compares to existing algorithms for phylogenetic placement with respect to accuracy and computation speed. We demonstrate that App-SpaM is on par with maximum-likelihood-based algorithms on metataxonomic data sets. In addition, App-SpaM is two to three orders of magnitude faster than the next fastest programs while its memory demands stay low. We extensively discuss App-SpaM's advantages and drawbacks and propose several additional features to improve upon its original version: we evaluate a set of novel placement heuristics, the use of sampling techniques to improve scalability with the length of the reference sequences, and a measure for the uncertainty of proposed placement positions. Subsequently, we present a variety of novel use cases of phylogenetic placement that are made uniquely possible by App-SpaM's versatility with respect to its potential input data. These applications include, in particular, the iterative augmentation of existing species trees by means of phylogenetic placement and the screening for outlier genes or species prior to phylogeny reconstruction.
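The spaced-word idea described in the abstract can be illustrated with a minimal sketch. It is not the App-SpaM implementation: the function names, the binary pattern notation ('1' for match positions, '0' for don't-care positions), and the use of the Jukes-Cantor correction to turn a mismatch fraction into substitutions per site are assumptions made here for illustration. The sketch also keeps every spaced-word match, whereas a practical tool would need some way to discard matches that arise by chance.

```python
import math
from collections import defaultdict


def spaced_words(seq, pattern):
    """Yield (key, start) for each window of seq.

    The key consists of the characters at the match positions ('1')
    of the binary pattern; don't-care positions ('0') are skipped.
    """
    match_pos = [i for i, c in enumerate(pattern) if c == "1"]
    for start in range(len(seq) - len(pattern) + 1):
        yield "".join(seq[start + i] for i in match_pos), start


def estimate_distance(query, reference, pattern):
    """Estimate substitutions per site between query and reference.

    Assumption of this sketch: mismatches are counted at the don't-care
    positions of all spaced-word matches and corrected with Jukes-Cantor.
    Returns None if the sequences share no spaced word.
    """
    dont_care = [i for i, c in enumerate(pattern) if c == "0"]

    # Index all spaced words of the reference by their key.
    index = defaultdict(list)
    for key, start in spaced_words(reference, pattern):
        index[key].append(start)

    compared = 0
    mismatches = 0
    for key, q_start in spaced_words(query, pattern):
        for r_start in index.get(key, ()):
            for i in dont_care:
                compared += 1
                mismatches += query[q_start + i] != reference[r_start + i]

    if compared == 0:
        return None
    p = mismatches / compared
    if p >= 0.75:
        # Jukes-Cantor is undefined at or beyond saturation.
        return math.inf
    return -0.75 * math.log(1 - 4 * p / 3)
```

Calling `estimate_distance(read, ref, "1001")` for every pair of read and reference sequence would give the kind of distance values that a placement heuristic could then use to pick a position in the reference tree.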

Similar Papers

Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade.
Lucas Czech ... Micah Dunthorn
Frontiers in Bioinformatics | VOL. 2
26 May 2022

App-SpaM: phylogenetic placement of short reads without sequence alignment.
Matthias Blanke ... Burkhard Morgenstern
Bioinformatics advances | VOL. 1
09 Jun 2021

Ciliate SSU-rDNA reference alignments and trees for phylogenetic placements of metabarcoding data
Ľubomír Rajter ... Micah Dunthorn
Metabarcoding and Metagenomics | VOL. 5
30 Aug 2021

Pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.
Frederick A Matsen ... Robin B Kodner
BMC Bioinformatics | VOL. 11
30 Oct 2010
