Abstract

BackgroundThe quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis.ResultsWe illustrate that cognac is able to rapidly identify phylogenetic marker genes using a data driven approach and efficiently generate concatenated gene alignments for very large genomic datasets. To benchmark our tool, we generated core gene alignments for eight unique genera of bacteria, including a dataset of over 11,000 genomes from the genus Escherichia producing an alignment with 1353 genes, which was constructed in less than 17 h.ConclusionsWe demonstrate that cognac presents an efficient method for generating concatenated gene alignments for phylogenetic analysis. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.

Highlights

  • The quantity of genomic data is expanding at an increasing rate

  • multiple sequence alignment (MSA) is a foundational tool in many disciplines of biology, which aims to capture the relationships between residues of related biological sequences, and facilitate insights into the evolutionary or structural relationships between the sequences in the alignment

  • We present cognac, a novel datadriven method and rapid algorithm for identifying phylogenetic marker genes from whole genome sequences and generating concatenated gene alignments, which scales to extremely large datasets of greater than 11,000 bacterial genomes

Read more

Summary

Introduction

Tools for phylogenetic analysis which scale to the quantity of available data are required. To address this need, we present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. It was quickly observed that individual gene trees are often inaccurate estimations of the species tree [3]. These incongruencies can arise from errors while building the tree, or from biological processes such as incomplete lineage sorting, hidden parology, and horizontal gene transfer [4]

Results
Discussion
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.