Abstract

Microarrays can be a cost-effective alternative to high-throughput sequencing for discovering novel single-nucleotide polymorphisms (SNPs). Illumina’s iScan platform dominates the market, but its commercial microarray products are designed for model organisms. Further, the platform outputs data in a proprietary format that cannot be easily converted to human-readable genotypes or merged with pre-existing data. To address this, we present and validate a novel pipeline to facilitate data analysis from cross-species application of Illumina microarrays. The pipeline generates a compatible VCF from iScan data and merges it with a second VCF comprising genotypes derived from other samples and sources. It includes a custom script, iScanVCFMerge (presented as a Python package), which we validate using iScan data from three great ape genera. We conclude that cross-species application of microarrays can be a rapid, cost-effective approach for SNP discovery in non-model organisms. Our pipeline surmounts the common challenges of integrating iScan genotypes with pre-existing data.

Highlights

  • Single-nucleotide polymorphisms (SNPs) are a powerful tool for population genetic studies

  • We show that applying microarrays to non-target species is an ideal approach for rapid and inexpensive SNP discovery

  • After removing homozygous and purely heterozygous SNPs and filtering for Minor Allele Frequency (MAF), we were left with 48,831 polymorphic SNPs for chimpanzees, 47,536 polymorphic SNPs for gorillas, and 44,389 polymorphic SNPs for orang-utans (Table 1)
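
The site-level filtering described in the last highlight can be illustrated with a minimal Python sketch. It is not the authors' code and does not use iScanVCFMerge. It assumes biallelic sites, genotypes encoded as pairs of 0/1 allele indices (None for a missing call), reads "homozygous SNPs" as sites where every called sample is homozygous for the same allele, and uses a MAF threshold of 0.05 purely as a placeholder, since the threshold is not stated here. The function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency at one biallelic site.

    genotypes: list of (allele, allele) tuples with alleles coded 0 or 1,
    or None for a missing call (assumed encoding, for illustration only).
    """
    alleles = [a for gt in genotypes if gt is not None for a in gt]
    if not alleles:
        return 0.0
    alt_freq = sum(alleles) / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)


def keep_site(genotypes, maf_threshold=0.05):
    """Return True if the site passes all three filters.

    maf_threshold is a placeholder value, not taken from the study.
    """
    called = [gt for gt in genotypes if gt is not None]
    if not called:
        return False
    # Monomorphic site: only one allele is observed across all calls.
    if len({a for gt in called for a in gt}) == 1:
        return False
    # Purely heterozygous site: every called sample is heterozygous.
    if all(gt[0] != gt[1] for gt in called):
        return False
    return minor_allele_frequency(called) >= maf_threshold


# Toy data: four sites with hypothetical names.
sites = {
    "snp_mono": [(0, 0), (0, 0), (0, 0)],
    "snp_het": [(0, 1), (0, 1), (0, 1)],
    "snp_ok": [(0, 0), (0, 1), (1, 1)],
    "snp_rare": [(0, 0)] * 19 + [(0, 1)],
}
kept = [name for name, gts in sites.items() if keep_site(gts)]
print(kept)
```

In this toy data only snp_ok survives: the first site is monomorphic, the second is heterozygous in every sample, and the last falls below the placeholder MAF threshold.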

Summary

Introduction

Single-nucleotide polymorphisms (SNPs) are a powerful tool for population genetic studies. In contrast with mainstay mitochondrial and microsatellite markers, SNPs can be generated at higher quality and with broader genome coverage and provide equivalent or greater statistical power for downstream studies (Morin et al, 2004). High-density SNP arrays are especially simple and cost-effective for the study of model organisms. In contrast with sequencing approaches, SNP arrays have built-in SNP redundancy (Oliphant et al, 2002) and call genotypes by averaging over multiple calls to increase accuracy. They uniformly genotype all individuals at exactly the same loci. Commercial arrays are widely available for association studies in humans (Ha et al, 2014), to develop breeding programs for livestock (Goddard and Hayes, 2009), and to facilitate
