Abstract

Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Pooled sequencing (Pool-seq) is a cost-effective approach for population-level genetic studies that require large numbers of samples, such as various strains of a microbe. To test the use of Pool-seq in identifying variation, we pooled DNA of 100 Streptococcus pyogenes strains of different emm types in two pools, each containing 50 strains. We used four variant calling tools (Freebayes, UnifiedGenotyper, SNVer, and SAMtools) and one emm1 strain, SF370, as a reference genome. In total, 63,719 SNPs and 164 INDELs were identified in the two pools concordantly by at least two of the tools. The majority of the variants (93.4%) from six individually sequenced strains used in the pools could be identified from the two pools, and 72.3% and 97.4% of the variants in the pools could be mined from the analysis of, respectively, the 44 complete Str. pyogenes genomes and the 3407 sequence runs deposited in the European Nucleotide Archive. We conclude that DNA sequencing of pooled samples of large numbers of bacterial strains is a robust, rapid and cost-efficient way to discover sequence variation.
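
The concordance criterion described above (a variant is kept when at least two of the four tools report it) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `concordant_variants`, the `(chrom, pos, ref, alt)` variant key and the toy calls are all assumptions made for illustration, and a real analysis would first parse and normalize each tool's VCF output.

```python
from collections import Counter


def concordant_variants(calls_by_tool, min_tools=2):
    """Return variants reported by at least `min_tools` callers.

    `calls_by_tool` maps a tool name to a collection of variant keys,
    here hypothetical (chrom, pos, ref, alt) tuples.
    """
    counts = Counter()
    for variants in calls_by_tool.values():
        # set() ensures each tool contributes at most one vote per variant
        counts.update(set(variants))
    return {variant for variant, n in counts.items() if n >= min_tools}


# Toy, made-up calls purely for illustration (not data from the study).
calls = {
    "Freebayes": {("chr", 100, "A", "G"), ("chr", 250, "C", "T")},
    "UnifiedGenotyper": {("chr", 100, "A", "G")},
    "SNVer": {("chr", 250, "C", "T"), ("chr", 900, "G", "A")},
    "SAMtools": {("chr", 100, "A", "G")},
}
result = concordant_variants(calls)
```

Requiring agreement between callers trades some sensitivity for fewer tool-specific false positives; raising `min_tools` makes the filter stricter.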

Highlights

  • Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development

  • We show that the Pool-seq strategy can successfully identify variants in a large number of group A Streptococcus (GAS) strains

  • We show that data from large published datasets, such as the European Nucleotide Archive (ENA), can be used to identify variations in a wide range of GAS strains


Summary

Introduction

Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Limitations of pooling strategies include loss of linkage disequilibrium information, difficulty in distinguishing sequencing errors from low-frequency alleles, and bias in allele frequency estimation resulting from inaccuracies in pooled DNA concentrations [8,14]. Given large enough pool sizes and sequencing depth, these limitations are likely to matter little when Pool-seq is used to identify the genomic or protein sequences with minimal variability in order to choose optimal candidates for diagnostic assays or vaccine development. This study is the first to utilize Pool-seq to identify polymorphisms from different strains of GAS and the first to compare the efficacy of Pool-seq against microbial sequences in a large genomic archive such as ENA. Our results confirm the robustness and cost-efficiency of the Pool-seq approach for variant discovery, especially when coupled with polyploidy-aware variant calling tools.
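
To make the role of sequencing depth concrete, the sketch below estimates how likely a low-frequency allele is to be supported by enough reads. It rests on simplifying assumptions that are not taken from the paper: every strain contributes equal DNA to the pool, reads sample alleles independently (a binomial model), and sequencing errors are ignored. The function name `prob_at_least_k_reads` and the choice of thresholds are hypothetical.

```python
from math import comb


def prob_at_least_k_reads(depth, allele_freq, k):
    """P(X >= k) for X ~ Binomial(depth, allele_freq).

    Probability that at least `k` of `depth` reads covering a site
    carry an allele present in the pool at frequency `allele_freq`.
    """
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# With 50 haploid strains per pool, a variant carried by a single strain
# has an expected frequency of 1/50 under the equal-contribution assumption.
pool_size = 50
singleton_freq = 1 / pool_size

# Chance of seeing at least 3 supporting reads at two example depths.
p_depth_100 = prob_at_least_k_reads(100, singleton_freq, 3)
p_depth_500 = prob_at_least_k_reads(500, singleton_freq, 3)
```

Under this model the detection probability rises with depth, which is why deeper sequencing helps separate rare true alleles from noise. Unequal DNA contributions would shift the effective allele frequency and change these values.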

