Abstract
Bacteria are highly diverse, even within a species; many studies have therefore classified a single species into multiple types and analyzed the genetic differences between them. Whole-genome sequencing (WGS) has recently become popular for these analyses, and identifying single-nucleotide polymorphisms (SNPs) between isolates is the most basic analysis performed after WGS. The performance of SNP-calling methods therefore has a significant effect on the accuracy of downstream analyses, such as phylogenetic tree inference. In particular, when closely related isolates are analyzed, e.g. in outbreak investigations, some SNP callers tend to detect a large number of false-positive SNPs relative to the limited number of true SNPs among isolates. However, the performance of SNP callers in such situations has not been sufficiently validated. Here, we show the results of realistic benchmarks of commonly used SNP callers, revealing that some of them exhibit markedly low accuracy when target isolates are closely related. As an alternative, we developed a novel pipeline, BactSNP, which uses both assembly and mapping information and is capable of highly accurate and sensitive SNP calling in a single step. BactSNP can also call SNPs among isolates when the reference genome is a draft or even when the user does not supply a reference genome. BactSNP is available at https://github.com/IEkAdN/BactSNP.
Highlights
Dai Yoshimura1, Rei Kajitani1, Yasuhiro Gotoh2, Katsuyuki Katahira2, Miki Okuno1, Yoshitoshi Ogura2, Tetsuya Hayashi2, Takehiko Itoh1
SNPs with QUAL > 20 were extracted with vcffilter, as described in its README [12], and written to a .tsv file using our original program, get_snp_freebayes.
Called alleles at each site with a QUAL score below 30, or supported by fewer than 75% of the reads mapped at that site, were masked as ambiguous, and the remaining SNPs were written to a .tsv file using our original program, get_snp_samtools.
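The two filtering rules above can be illustrated with a minimal Python sketch. This is not the authors' get_snp_freebayes or get_snp_samtools; the `Call` record, its field names, the use of "N" as the ambiguity symbol, and the treatment of zero-depth sites are all assumptions made for illustration. Only the thresholds (QUAL > 20; QUAL below 30 or fewer than 75% supporting reads) come from the text.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """Hypothetical per-site record; field names are illustrative only."""
    pos: int        # 1-based reference position
    ref: str        # reference allele
    alt: str        # called allele
    qual: float     # QUAL score reported by the caller
    alt_reads: int  # reads supporting the called allele
    depth: int      # reads mapped at the site


def passes_qual_filter(call: Call, min_qual: float = 20.0) -> bool:
    """First rule: keep a SNP only when QUAL is strictly above 20."""
    return call.qual > min_qual


def masked_allele(call: Call, min_qual: float = 30.0,
                  min_frac: float = 0.75) -> str:
    """Second rule: return the called allele, or 'N' when it is ambiguous.

    A call is masked when QUAL is below 30 or when fewer than 75% of the
    mapped reads support it. Masking zero-depth sites is an assumption.
    """
    if call.qual < min_qual:
        return "N"
    if call.depth == 0 or call.alt_reads / call.depth < min_frac:
        return "N"
    return call.alt
```

For example, a call with QUAL 25 supported by 9 of 10 reads would pass the first rule but be masked by the second, because its QUAL is below 30.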
Summary
We moved the correct SNP positions to random positions in regions where nucmer generated one-to-one alignments ≥ 1 kbp in length between the reference-root sequences, and edited the simulated genomes so that they carried SNPs at the new positions, using our original program, move_snp (available at https://github.com/IEkAdN/BactSNP/tree/master/benchmark). Entries marked “PASS” were extracted from the resulting .vcf file and output to a .tsv file using our original program, get_snp_cortex (available at https://github.com/IEkAdN/BactSNP/tree/master/benchmark).
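The relocation of SNP positions into long one-to-one alignment blocks might be sketched as follows. This is a hedged illustration, not the move_snp program itself: the function name `relocate_snps`, the representation of alignment blocks as 1-based inclusive `(start, end)` pairs, the assumption that blocks do not overlap, the uniform sampling over eligible bases, and the requirement that new positions be distinct are all assumptions. Only the 1 kbp minimum block length comes from the text.

```python
import random
from typing import List, Tuple


def relocate_snps(n_snps: int,
                  aligned_blocks: List[Tuple[int, int]],
                  min_len: int = 1000,
                  seed: int = 0) -> List[int]:
    """Draw n_snps distinct random positions inside blocks >= min_len bp.

    aligned_blocks holds hypothetical (start, end) reference coordinates,
    1-based and inclusive, assumed to be non-overlapping.
    """
    rng = random.Random(seed)
    # Keep only alignment blocks that meet the minimum length.
    eligible = [(s, e) for s, e in aligned_blocks if e - s + 1 >= min_len]
    total = sum(e - s + 1 for s, e in eligible)
    if n_snps > total:
        raise ValueError("more SNPs requested than eligible positions")

    # Sample offsets in the concatenated eligible space, without replacement.
    offsets = rng.sample(range(total), n_snps)

    positions = []
    for off in offsets:
        # Map each offset back to a coordinate in its block.
        for s, e in eligible:
            length = e - s + 1
            if off < length:
                positions.append(s + off)
                break
            off -= length
    return sorted(positions)
```

Calling `relocate_snps(50, [(1, 500), (1001, 2500), (5000, 6999)])` would never place a SNP in the first block, since it is shorter than 1 kbp. Fixing the seed keeps the simulated data set reproducible.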