Abstract

Restriction site-associated DNA Sequencing (RAD-Seq) is an economical and efficient method for SNP discovery and genotyping. As with other sequencing-by-synthesis methods, RAD-Seq produces stochastic count data and requires sensitive analysis to develop or genotype markers accurately. We show that there are several sources of bias specific to RAD-Seq that are not explicitly addressed by current genotyping tools, namely restriction fragment bias, restriction site heterozygosity and PCR GC content bias. We explore the performance of existing analysis tools given these biases and discuss approaches to limiting or handling biases in RAD-Seq data. While these biases need to be taken seriously, we believe RAD loci affected by them can be excluded or processed with relative ease in most cases and that most RAD loci will be accurately genotyped by existing tools.

Highlights

  • The use of high throughput sequencing-by-synthesis technologies for ecology and conservation depends on accurate inference of biological signal from technical noise

  • While it is possible to generate sequence data from entire genomes at relatively low cost, the sequencing-by-synthesis process introduces noise from a number of novel sources and reveals existing sources of noise that were previously undetected by less sensitive technology, making the path from raw sequence reads to biological information far from straightforward

  • Restriction site-associated DNA Sequencing (RAD-Seq) is suitable for fine-scale linkage mapping (Amores et al 2011; Chutimanitsakun et al 2011; Baxter et al 2011), phylogenetics and phylogeography (Rubin et al 2012; Nadeau et al 2012; Emerson et al 2010), genome scaffolding (Catchen et al 2011; Heliconius Genome Consortium 2012) and population genetics (Andersen et al 2012; Hohenlohe et al 2012)

Introduction

The use of high throughput sequencing-by-synthesis technologies for ecology and conservation depends on accurate inference of biological signal from technical noise. Restriction site-associated DNA sequencing (RAD-Seq; Miller et al 2007; Baird et al 2008; Davey & Blaxter 2011) is a method for SNP discovery and genotyping using sequencing-by-synthesis. It is one of a number of reduced representation methods that sample a shared set of sites across the genome in many individuals or pools, making population-scale sequencing possible at a fraction of the cost of whole genome sequencing (Davey et al 2011). RAD-Seq has been used to generate large SNP data sets for many species, most recently in salmon (Houston et al 2012), cutthroat and rainbow trout (Amish et al 2012), artichoke (Scaglione et al 2012), guppy (Willing et al 2011) and eggplant (Barchi et al 2011).
