Abstract

Background: Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences.

Results: We present ProPIP, a new progressive multiple sequence alignment tool. Insertions and deletions are described by an explicit evolutionary model, the Poisson Indel Process (PIP). The method is based on dynamic programming and is implemented in a frequentist framework. The algorithm is implemented in C++ as a standalone program, and the source code can be compiled on Linux, macOS and Microsoft Windows. The source code is freely available on GitHub at https://github.com/acg-team/ProPIP and is distributed under the terms of the GNU GPL v3 license.

Conclusions: Using an explicit indel evolution model makes it possible to avoid over-alignment, to infer gaps in a phylogenetically consistent way, and to make inferences about the rates of insertions and deletions. Instead of arbitrary gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are defined within a probabilistic framework. As a result, indel rate settings can be optimised to infer phylogenetically meaningful gap patterns.

Highlights

  • Current alignment tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments due to inconsistencies between the indel history and the phylogeny relating the input sequences

  • Here we present the ProPIP software, which implements our previously published progressive multiple sequence alignment (MSA) inference method based on the Poisson Indel Process (PIP) [7], and introduces new features such as stochastic backtracking and parallelisation

  • Instead of traditional gap penalties, the parameters used by ProPIP are the insertion and deletion rates, which have a biological interpretation and are defined within a probabilistic framework



Introduction

Multiple sequence alignment (MSA) is a fundamental task required by most genomic analyses, and a multitude of alignment tools is already available. However, current tools typically lack an explicit model of indel evolution, leading to artificially short inferred alignments (i.e., over-alignment) due to inconsistencies between the indel history and the phylogeny relating the input sequences. The progressive approach is one of the most popular strategies: it aligns pairs of sequences or alignments from the tips towards the root along the tree that represents the evolutionary relationships of the input sequences (i.e., the tree leaves). The partial MSA at each inner node, and the final MSA at the tree root, are local optima: each is obtained by maximizing the score only over the smaller space spanned by the pairwise alignment of the two child solutions, not over all possible MSAs.
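
The tips-to-root traversal described above can be illustrated with a minimal sketch. It is not ProPIP's implementation: the scoring used here is a plain average match/mismatch score with a fixed linear gap penalty, standing in for the PIP likelihood, and the nested-tuple tree encoding and all function names (`column_score`, `align_profiles`, `progressive_align`) are assumptions made for illustration only.

```python
from typing import List, Tuple, Union

# Assumed encoding: a leaf is a sequence string, an inner node is (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]
GAP = "-"


def column_score(col_a, col_b, match=1.0, mismatch=-1.0):
    """Average match/mismatch score over all non-gap character pairs."""
    pairs = [(x, y) for x in col_a for y in col_b if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return sum(match if x == y else mismatch for x, y in pairs) / len(pairs)


def align_profiles(a: List[str], b: List[str], gap=-2.0) -> List[str]:
    """Globally align two alignments column by column (Needleman-Wunsch)."""
    cols_a, cols_b = list(zip(*a)), list(zip(*b))
    n, m = len(cols_a), len(cols_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                score[i - 1][j] + gap,  # column of a against a gap column
                score[i][j - 1] + gap,  # column of b against a gap column
            )
    # Trace back one optimal path from the bottom-right corner.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1])
        ):
            out_a.append(cols_a[i - 1])
            out_b.append(cols_b[j - 1])
            i, j = i - 1, j - 1
        elif j == 0 or (i > 0 and score[i][j] == score[i - 1][j] + gap):
            out_a.append(cols_a[i - 1])
            out_b.append((GAP,) * len(b))
            i -= 1
        else:
            out_a.append((GAP,) * len(a))
            out_b.append(cols_b[j - 1])
            j -= 1
    out_a.reverse()
    out_b.reverse()
    rows_a = ["".join(col[r] for col in out_a) for r in range(len(a))]
    rows_b = ["".join(col[r] for col in out_b) for r in range(len(b))]
    return rows_a + rows_b


def progressive_align(tree: Tree) -> List[str]:
    """Post-order traversal: align the children's partial MSAs at each inner node."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return align_profiles(progressive_align(left), progressive_align(right))


if __name__ == "__main__":
    for row in progressive_align((("ACGT", "ACT"), "AGT")):
        print(row)
```

Each inner node only sees the two partial alignments produced by its children, so columns fixed lower in the tree are never revisited. This is why the result at the root is a local optimum. In this sketch the gap penalty is an arbitrary number, which is precisely the kind of parameter that ProPIP replaces with insertion and deletion rates.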
