Abstract

Background: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of ‘putative’ SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays.

Results: Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays.

Conclusions: Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.

Electronic supplementary material: The online version of this article (doi:10.1186/s13104-016-2209-x) contains supplementary material, which is available to authorized users.

Highlights

  • Single nucleotide polymorphism (SNP) discovery is an important goal of many studies

  • Blasting the Infinium and Affymetrix SNP sequences with an e-value threshold of 1e−12 recovered 24,247 and 22,368 hits respectively. For these SNPs, we evaluated the probability of successful validation using a predictive model incorporating minor allele frequency (MAF), depth of coverage, Assay Design Tool (ADT)/pconvert score plus values of the predictor variables generated from the genome BLAST

  • Regardless of the exact reasons, our findings suggest that under certain circumstances Affymetrix Axiom genotyping arrays might be preferable in some respects to Illumina Infinium iSelect HD arrays, when genotyping non-model organisms with SNPs that have not been experimentally validated in advance
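
The highlights describe a predictive model that combines minor allele frequency, depth of coverage and an assay design score to estimate the probability that a SNP validates. The page does not give the model's form or its fitted values, so the sketch below assumes a logistic form purely for illustration. The coefficients, the log transform of depth and the function name `validation_probability` are all hypothetical and are not taken from the study.

```python
import math

# Hypothetical coefficients, invented for illustration only.
# They are NOT fitted to any data and NOT taken from the study.
COEF = {
    "intercept": -1.0,
    "maf": 2.0,           # minor allele frequency (0 to 0.5)
    "log_depth": 0.5,     # natural log of depth of coverage
    "design_score": 1.5,  # ADT / pconvert-style score scaled to 0..1
}


def validation_probability(maf: float, depth: float, design_score: float) -> float:
    """Return an illustrative probability that a SNP validates.

    Assumes a logistic model; depth must be greater than zero.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    z = (
        COEF["intercept"]
        + COEF["maf"] * maf
        + COEF["log_depth"] * math.log(depth)
        + COEF["design_score"] * design_score
    )
    return 1.0 / (1.0 + math.exp(-z))


# Compare two made-up SNPs that differ only in their design score.
low = validation_probability(maf=0.2, depth=20, design_score=0.3)
high = validation_probability(maf=0.2, depth=20, design_score=0.9)
print(f"low design score: {low:.3f}, high design score: {high:.3f}")
```

In practice the coefficients would be estimated from SNPs with known validation outcomes, and the BLAST-derived predictors mentioned above would enter as further terms.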


Summary

Introduction

The number of ‘putative’ SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. Few systematic comparisons of the available SNP-calling programs have been carried out, and most have been based on genomic data from humans [9, 10]. These studies suggest that in some cases the concordance between different methods can be poor [11, 12], yet it is still the norm to call SNPs with a single method [13,14,15]. Less attention has been paid to non-model organisms, partly because for many species, SNPs are being discovered for the first time.
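
Concordance between discovery methods, as discussed above, can be quantified by treating each method's output as a set of SNP positions and counting how many positions appear in more than one set. The following is a minimal sketch of that idea, not the authors' pipeline. The contig names, positions and the helper names `concordance` and `cross_platform` are made up for illustration; only the four method labels echo the tools named in the abstract.

```python
from collections import Counter


def concordance(calls):
    """Summarise overlap between SNP callers.

    calls maps a method name to a set of (contig, position) tuples.
    Returns (total unique SNPs, SNPs called by more than one method, fraction shared).
    """
    counts = Counter(snp for snps in calls.values() for snp in snps)
    total = len(counts)
    shared = sum(1 for n in counts.values() if n > 1)
    return total, shared, (shared / total if total else 0.0)


def cross_platform(calls, platform_a, platform_b):
    """Return SNPs called by at least one method from each of two platforms."""
    a = set().union(*(calls[m] for m in platform_a))
    b = set().union(*(calls[m] for m in platform_b))
    return a & b


# Toy data: invented positions, not results from the study.
calls = {
    "newbler": {("c1", 10), ("c1", 55), ("c2", 7)},
    "swap454": {("c1", 10), ("c3", 99)},
    "bwa_gatk": {("c1", 10), ("c2", 7), ("c4", 3)},
    "bowtie2_gatk": {("c2", 7), ("c4", 3), ("c5", 21)},
}

total, shared, frac = concordance(calls)
both = cross_platform(calls, ["newbler", "swap454"], ["bwa_gatk", "bowtie2_gatk"])
print(f"{shared} of {total} SNPs ({frac:.1%}) were called by more than one method")
print(f"{len(both)} SNPs were called from both 454 and Illumina data")
```

Keying each SNP by contig and position assumes that all methods report coordinates against the same reference assembly; calls made against different assemblies would first need to be placed on a common coordinate system.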

