The large-scale blast score ratio (LS-BSR) pipeline: a method to rapidly compare genetic content between bacterial genomes.

Jason W Sahl, J Gregory Caporaso, David A Rasko, Paul Keim

doi:10.7717/peerj.332

Abstract

Background. As whole genome sequence data from bacterial isolates becomes cheaper to generate, computational methods are needed to correlate sequence data with biological observations. Here we present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all coding sequences (CDSs) in all genomes surveyed. This matrix can be easily parsed in order to identify genetic relationships between bacterial genomes. Although pipelines have been published that group peptides by sequence similarity, no other software performs the rapid, large-scale, full-genome comparative analyses carried out by LS-BSR.

Results. To demonstrate the utility of the method, the LS-BSR pipeline was tested on 96 Escherichia coli and Shigella genomes; the pipeline ran in 163 min using 16 processors, which is a greater than 7-fold speedup compared to using a single processor. The BSR values for each CDS, which indicate a relative level of relatedness, were then mapped to each genome on an independent core genome single nucleotide polymorphism (SNP) based phylogeny. Comparisons were then used to identify clade-specific CDS markers and validate the LS-BSR pipeline based on molecular markers that delineate between classical E. coli pathogenic variant (pathovar) designations. Scalability tests demonstrated that the LS-BSR pipeline can process 1,000 E. coli genomes in 27–57 h, depending upon the alignment method, using 16 processors.

Conclusions. LS-BSR is an open-source, parallel implementation of the BSR algorithm, enabling rapid comparison of the genetic content of large numbers of genomes. The results of the pipeline can be used to identify specific markers between user-defined phylogenetic groups, and to identify the loss and/or acquisition of genetic information between bacterial isolates. Taxa-specific genetic markers can then be translated into clinical diagnostics, or can be used to identify broadly conserved putative therapeutic candidates.
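To make the idea of parsing a BSR matrix for group-specific markers concrete, here is a minimal sketch. It is not the LS-BSR code itself. It assumes the usual BSR definition (the bit score of a CDS aligned against a genome, divided by the bit score of the CDS aligned against itself). The genome names, bit scores, function names, and the 0.8 and 0.4 thresholds are all hypothetical values chosen for illustration.

```python
def bsr(query_bits, self_bits):
    """BLAST score ratio: query bit score divided by the self-alignment bit score."""
    if self_bits <= 0:
        raise ValueError("self-alignment bit score must be positive")
    return query_bits / self_bits


def group_specific_markers(matrix, group, upper=0.8, lower=0.4):
    """Return CDS ids that are conserved in `group` and absent elsewhere.

    A CDS qualifies when its BSR is >= `upper` in every genome of `group`
    and < `lower` in every other genome. Both thresholds are assumptions.
    """
    markers = []
    for cds, row in matrix.items():
        inside = [v for g, v in row.items() if g in group]
        outside = [v for g, v in row.items() if g not in group]
        if inside and all(v >= upper for v in inside) and all(v < lower for v in outside):
            markers.append(cds)
    return sorted(markers)


# Hypothetical bit scores: each CDS against itself, then against three genomes.
self_bits = {"cds_1": 500.0, "cds_2": 400.0, "cds_3": 300.0}
query_bits = {
    "cds_1": {"genomeA": 500.0, "genomeB": 490.0, "genomeC": 480.0},
    "cds_2": {"genomeA": 400.0, "genomeB": 380.0, "genomeC": 40.0},
    "cds_3": {"genomeA": 0.0, "genomeB": 30.0, "genomeC": 300.0},
}

# Build the CDS-by-genome BSR matrix.
matrix = {
    cds: {genome: bsr(bits, self_bits[cds]) for genome, bits in row.items()}
    for cds, row in query_bits.items()
}

# CDSs conserved in genomeA and genomeB but absent from genomeC.
markers = group_specific_markers(matrix, {"genomeA", "genomeB"})
print(markers)
```

In this toy matrix only cds_2 is conserved in both group members and essentially absent from the remaining genome, so it is the only candidate marker reported.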

Highlights

Whole genome sequence (WGS) data has changed our view of bacterial relatedness and evolution.
LS-BSR algorithm speed and scalability: To determine the scalability of the LS-BSR method, 1,000 Escherichia coli and Shigella genomes were downloaded from GenBank (Benson et al., 2012); E. coli was used as a test case because of the large number of genomes deposited in GenBank.
Improvements on a previous BSR implementation: The LS-BSR method improves on a previous BSR implementation in both speed and ease of use.

Summary

Introduction

Whole genome sequence (WGS) data has changed our view of bacterial relatedness and evolution. Although grouping peptides into gene families is not the primary focus of LS-BSR, the output can be parsed to identify the pan-genome (Tettelin et al., 2008) structure of a species; scripts are included with LS-BSR that classify coding sequences (CDSs) into pan-genome categories based on user-defined identity thresholds. We present the large-scale BLAST score ratio (LS-BSR) pipeline, which rapidly compares the genetic content of hundreds to thousands of bacterial genomes, and returns a matrix that describes the relatedness of all CDSs in all genomes surveyed. This matrix can be parsed to identify genetic relationships between bacterial genomes. Taxa-specific genetic markers can be translated into clinical diagnostics, or can be used to identify broadly conserved putative therapeutic candidates.



Journal: PeerJ	Publication Date: Apr 1, 2014
Citations: 220	License type: cc-by


