Abstract

Abstract High‐throughput sequencing has become commonplace in evolutionary studies. Large, rapidly collected genomic datasets are used to capture biodiversity and for monitoring global and national scale disease transmission patterns, among many other applications. Updating homologous sequence datasets with new samples is cumbersome, requiring excessive program runtimes and data processing. We describe Extensiphy, a bioinformatics tool to efficiently update multiple sequence alignments with whole‐genome short‐read data. Extensiphy performs reference based sequence assembly and alignment in one process while maintaining the alignment length of the original alignment. Input data‐types for Extensiphy are any multiple sequence alignment in fasta format and whole‐genome, short‐read fastq sequences. To validate Extensiphy, we compared its results to those produced by two other methods that construct whole‐genome scale multiple sequence alignments. We measured our comparisons by analysing program runtimes, base‐call accuracy, dataset retention in the presence of missing data and phylogenetic accuracy. We found that Extensiphy rapidly produces high‐quality updated sequence alignments while preventing alignment shrinkage due to missing data. Phylogenies estimated from alignments produced by Extensiphy show similar accuracy to other commonly used alignment construction methods. Extensiphy is suitable for updating large sequence alignments and is ideal for studies of biodiversity, ecology and epidemiological monitoring efforts.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.