Abstract
The recent widespread application of whole-genome sequencing (WGS) for microbial disease investigations has spurred the development of new bioinformatics tools, including a notable proliferation of phylogenomics pipelines designed for infectious disease surveillance and outbreak investigation. Moving WGS data out of the research laboratory and onto the front lines of surveillance and outbreak response requires user-friendly, reproducible and scalable pipelines that have been well validated. Single Nucleotide Variant Phylogenomics (SNVPhyl) is a bioinformatics pipeline for identifying high-quality single-nucleotide variants (SNVs) and constructing a whole-genome phylogeny from a collection of WGS reads and a reference genome. Individual pipeline components are integrated into the Galaxy bioinformatics framework, enabling data analysis in a user-friendly, reproducible and scalable environment. We show that SNVPhyl can detect SNVs with high sensitivity and specificity, and can identify and remove regions of high SNV density (indicative of recombination). SNVPhyl correctly distinguishes outbreak from non-outbreak isolates across a range of variant-calling settings and sequencing-coverage thresholds, and in the presence of contamination. SNVPhyl is available as a Galaxy workflow, as Docker and virtual machine images, and as a Unix-based command-line application. SNVPhyl is released under the Apache 2.0 license and is available at http://snvphyl.readthedocs.io/ or at https://github.com/phac-nml/snvphyl-galaxy.
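This section does not state how SNVPhyl decides that a region has "high SNV density", so the sketch below only illustrates the general idea under an assumed sliding-window rule: any window of `window_size` bp that contains at least `threshold` SNVs counts as high density, and every SNV inside such a window is removed. The function names and both parameters are hypothetical and are not SNVPhyl's actual interface or defaults.

```python
from bisect import bisect_right


def high_density_positions(positions, window_size, threshold):
    """Return SNV positions lying in any window of `window_size` bp that
    contains at least `threshold` SNVs (hypothetical density rule)."""
    pos = sorted(set(positions))
    flagged = set()
    for i, start in enumerate(pos):
        # Anchor a window at each SNV: [start, start + window_size - 1].
        # Any qualifying window is covered by the window anchored at its
        # leftmost SNV, so checking these anchors is sufficient.
        end_idx = bisect_right(pos, start + window_size - 1)
        if end_idx - i >= threshold:
            flagged.update(pos[i:end_idx])
    return flagged


def remove_high_density(positions, window_size, threshold):
    """Keep only SNVs outside every high-density window, sorted by position."""
    flagged = high_density_positions(positions, window_size, threshold)
    return [p for p in sorted(set(positions)) if p not in flagged]


if __name__ == "__main__":
    snvs = [100, 150, 180, 5000, 9000]
    # Three SNVs within 500 bp are treated as a putative recombinant block.
    print(remove_high_density(snvs, window_size=500, threshold=3))
```

With these illustrative settings the cluster at positions 100 to 180 is dropped and the isolated SNVs at 5000 and 9000 are kept. A real pipeline would also need to apply the filter per reference contig and across all isolates; that bookkeeping is omitted here.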
Highlights
The high efficiency and cost-effectiveness of whole-genome sequencing (WGS) using next-generation sequencing technologies are transforming the biomedical landscape
We have developed Single Nucleotide Variant Phylogenomics (SNVPhyl) as a single-nucleotide variant (SNV)-based phylogenomics pipeline that is integrated within the Galaxy platform, providing a locally installable environment for phylogenomics analysis within a larger-scale bioinformatics system
We measured SNVPhyl’s sensitivity and specificity by introducing random mutations along the E. coli Sakai reference genome and comparing these mutations with those detected by SNVPhyl (Table 2)
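The evaluation described above can be illustrated with a small sketch: mutate random positions of a reference sequence, then score a set of called positions against the known mutations. The helper names are hypothetical, and treating every unmutated, uncalled reference position as a true negative is an assumption made here for illustration; this section does not say exactly how the figures in Table 2 were computed.

```python
import random


def introduce_mutations(genome, n, seed=0):
    """Substitute a different base at `n` random positions of `genome`.

    Returns the mutated sequence and the set of 0-based mutated positions.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    positions = set(rng.sample(range(len(genome)), n))
    seq = list(genome)
    for p in sorted(positions):
        # Pick any nucleotide other than the reference base at this position.
        seq[p] = rng.choice([b for b in "ACGT" if b != seq[p]])
    return "".join(seq), positions


def sensitivity_specificity(true_positions, called_positions, genome_length):
    """Score called SNV positions against the introduced mutations.

    Assumption: every reference position that was neither mutated nor
    called is counted as a true negative.
    """
    true_set, called_set = set(true_positions), set(called_positions)
    tp = len(true_set & called_set)   # introduced and detected
    fp = len(called_set - true_set)   # detected but never introduced
    fn = len(true_set - called_set)   # introduced but missed
    tn = genome_length - tp - fp - fn
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    reference = "ACGT" * 25
    mutated, truth = introduce_mutations(reference, 10, seed=1)
    # Pretend a caller found all but one true SNV plus one spurious call.
    calls = set(sorted(truth)[1:]) | {next(i for i in range(100) if i not in truth)}
    print(sensitivity_specificity(truth, calls, len(reference)))
```

Because the true-negative count is dominated by the genome length, specificity stays close to 1 even with several false positives, which is why sensitivity and the raw false-positive count are usually reported alongside it.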
Summary
The high efficiency and cost-effectiveness of whole-genome sequencing (WGS) using next-generation sequencing technologies are transforming the biomedical landscape. Entire microbial genomes can be rapidly sequenced and subsequently queried with nucleotide-level resolution, an exciting new ability that far outstrips traditional microbial typing methods. This ability has the potential to advance many fields, in particular infectious disease genomic epidemiology. One notable study is the investigation into the 2010 Haiti cholera outbreak [1, 2, 3], where WGS and epidemiological data were used to support the hypothesis that cholera was introduced to Haiti by United Nations peacekeepers originally infected in Nepal. WGS has supported the investigation of outbreaks of organisms as diverse as Mycobacterium tuberculosis [4, 5], …

Received 6 February 2017; Accepted 12 April 2017

Author affiliations:
1. National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB R3E 3R2, Canada
2. University of Manitoba, Winnipeg, MB R3T 2N2, Canada
3. Health Canada – Bureau of Microbial Hazards, Ottawa, ON K1A 0K9, Canada
4. Lethbridge Research and Development Centre, Lethbridge, AB T1J 4B1, Canada
5. Centers for Disease Control and Prevention, Atlanta, GA 30333, USA
6. Dalhousie University, Halifax, NS B3H 4R2, Canada
7. BC Public Health Microbiology and Reference Laboratory, Vancouver, BC V5Z 4R4, Canada
8. Simon Fraser University, Burnaby, BC V5A 1S6, Canada