Abstract

Single nucleotide polymorphisms (SNPs) are widely used in genome-wide association studies and population genetics analyses. Next-generation sequencing (NGS) has become convenient, and many SNP-calling pipelines have been developed for human NGS data. However, there is a gap in knowledge about how to select an appropriate SNP-calling pipeline for high-throughput NGS data. To fill this gap, we compared seven SNP-calling pipelines: 16GT, the Genome Analysis Toolkit (GATK), Bcftools-single (Bcftools single-sample mode), Bcftools-multiple (Bcftools multiple-sample mode), VarScan2-single (VarScan2 single-sample mode), VarScan2-multiple (VarScan2 multiple-sample mode) and Freebayes. The comparison used 96 NGS datasets from 16 Rhode Island Red chickens, covering depth gradients of approximately 5X, 10X, 20X, 30X, 40X and 50X. The sixteen chickens were also genotyped with a 50K SNP array, and the sensitivity and specificity of each pipeline were assessed by comparison with the SNP array results. For every pipeline except Freebayes, the number of detected SNPs increased as the input read depth increased. 16GT, followed by Bcftools-multiple, detected the most SNPs when the input coverage exceeded 10X, whereas Bcftools-multiple detected the most at 5X and 10X. The sensitivity and specificity of each pipeline increased with increasing input depth. Bcftools-multiple had the numerically highest sensitivity from 5X to 30X, and 16GT had the highest sensitivity at 40X and 50X. Bcftools-multiple also had the highest specificity, followed by GATK, at almost all input levels. For most pipelines, SNP numbers, sensitivities and specificities showed no obvious changes beyond 20X.
In conclusion, (1) if the goal is only to detect SNPs, the sequencing depth does not need to exceed 20X; (2) Bcftools-multiple may be the best choice for detecting SNPs from chicken NGS data, but for a single sample or a sequencing depth greater than 20X, 16GT is recommended. Our findings provide a reference for researchers selecting suitable pipelines to obtain SNPs from the NGS data of chickens or other nonhuman animals.
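
As a rough illustration of how a pipeline's calls could be scored against array genotypes, the sketch below computes sensitivity and specificity at array sites. It is not the authors' procedure: the exact definitions used in the study are not given in this section, so the standard confusion-matrix definitions are assumed here, and the function name, the site keys and the toy data are all hypothetical.

```python
import math


def sensitivity_specificity(array_is_variant, called_sites):
    """Score one pipeline against array genotypes (illustrative only).

    array_is_variant: dict mapping a site key, e.g. (chrom, pos), to True if
        the array genotype carries a non-reference allele, else False.
    called_sites: set of site keys at which the pipeline reported a SNP.

    Assumed definitions (not taken from the paper):
        sensitivity = TP / (TP + FN)
        specificity = TN / (TN + FP)
    """
    tp = fn = fp = tn = 0
    for site, is_variant in array_is_variant.items():
        called = site in called_sites
        if is_variant and called:
            tp += 1
        elif is_variant:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity


# Toy data: five array sites, three of them variant in this sample.
array_truth = {
    ("chr1", 100): True,
    ("chr1", 200): True,
    ("chr1", 300): False,
    ("chr2", 50): False,
    ("chr2", 75): True,
}
pipeline_calls = {("chr1", 100), ("chr1", 300), ("chr2", 75)}

sens, spec = sensitivity_specificity(array_truth, pipeline_calls)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Restricting the evaluation to array sites keeps the comparison well defined, since the array gives no truth for positions it does not probe.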

Highlights

  • In the last decade, next-generation sequencing (NGS) has been extensively used in human, livestock and plant research [1,2,3,4,5].

  • Single nucleotide polymorphisms (SNPs) might occur at nonspecific positions in the genome and have been widely used in genome-wide association studies and population genetics analyses [9].

  • Ni et al. [7] concluded that the Genome Analysis Toolkit (GATK), SAMtools and Freebayes were all suitable for processing high-throughput chicken data, but that study used low-depth sequencing data, tested relatively few pipelines, and lacked detailed implementation procedures.


Summary

Introduction

Next-generation sequencing (NGS) has been extensively used in human, livestock and plant research [1,2,3,4,5]. An increasing number of single nucleotide polymorphisms (SNPs) have been detected in NGS datasets using various calling pipelines [6,7,8]. VarScan (http://varscan.sourceforge.net/using-varscan.html) is the first tool used to detect somatic mutations and copy number alterations in exome data from tumor-normal pairs [16]. Freebayes (https://github.com/ekg/freebayes) is a Bayesian genetic variant caller designed to find SNPs, indels, multinucleotide polymorphisms, and complex events (composite insertion and substitution events) smaller than the length of a short-read sequencing alignment [18]. Chiara et al. provided a consensus variant calling system, CoVaCS (https://bioinformatics.cineca.it/covacs), for the analysis of human genome resequencing studies [20].
