Abstract

A growing variety of “genotype-by-sequencing” (GBS) methods use restriction enzymes and high-throughput DNA sequencing to generate data for a subset of genomic loci, allowing the simultaneous discovery and genotyping of thousands of polymorphisms in a set of multiplexed samples. We evaluated a “double-digest” restriction-site associated DNA sequencing (ddRAD-seq) protocol by 1) comparing results for a zebra finch (Taeniopygia guttata) sample with in silico predictions from the zebra finch reference genome; 2) assessing data quality for a population sample of indigobirds (Vidua spp.); and 3) testing for consistent recovery of loci across multiple samples and sequencing runs. Comparison with in silico predictions revealed that 1) over 90% of predicted, single-copy loci in our targeted size range (178–328 bp) were recovered; 2) short restriction fragments (38–178 bp) were carried through the size selection step and sequenced at appreciable depth, generating unexpected but nonetheless useful data; 3) amplification bias favored shorter, GC-rich fragments, contributing to among-locus variation in sequencing depth that was strongly correlated across samples; 4) our use of restriction enzymes with a GC-rich recognition sequence resulted in up to a four-fold overrepresentation of GC-rich portions of the genome; and 5) star activity (i.e., non-specific cutting) resulted in thousands of “extra” loci sequenced at low depth. Results for three species of indigobirds show that a common set of thousands of loci can be consistently recovered across both individual samples and sequencing runs. In a run with 46 samples, we genotyped 5,996 loci in all individuals and 9,833 loci in 42 or more individuals, resulting in <1% missing data for the larger data set. We compare our approach to similar methods and discuss the range of factors (fragment library preparation, natural genetic variation, bioinformatics) influencing the recovery of a consistent set of loci among samples.
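To make the idea of an in silico prediction concrete, the following is a minimal sketch of a double digest with size selection. It is not the pipeline used in the study. The function names (`cut_positions`, `double_digest`) are hypothetical, the recognition sequences `GAATTC` and `CCGG` are illustrative placeholders rather than the enzymes used here, and each cut is assumed to fall at the first base of the recognition site, ignoring the true cut offset and overhang.

```python
import re


def cut_positions(seq, site):
    """Return the start index of every occurrence of `site` in `seq`.

    A lookahead is used so that overlapping occurrences are also found.
    Assumption: the cut is placed at the first base of the site.
    """
    return [m.start() for m in re.finditer(f"(?={re.escape(site)})", seq)]


def double_digest(seq, site_a, site_b, min_len, max_len):
    """Predict fragments flanked by one cut from each enzyme.

    Returns (start, end, length) tuples for fragments whose two ends come
    from different enzymes and whose length lies in [min_len, max_len].
    """
    # Label every cut with the enzyme that produced it, then sort by position.
    cuts = sorted(
        [(p, "A") for p in cut_positions(seq, site_a)]
        + [(p, "B") for p in cut_positions(seq, site_b)]
    )
    fragments = []
    # Fragments of a complete digest lie between adjacent cut positions.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        length = p2 - p1
        if e1 != e2 and min_len <= length <= max_len:
            fragments.append((p1, p2, length))
    return fragments


# Toy sequence (not real genomic data): sites at positions 0, 26, 40 and 49.
toy = "GAATTC" + "A" * 20 + "CCGG" + "T" * 10 + "CCGG" + "A" * 5 + "GAATTC"
hits = double_digest(toy, "GAATTC", "CCGG", min_len=20, max_len=30)
print(hits)  # [(0, 26, 26)]
```

On a real reference genome the same logic would be applied per chromosome with the targeted size window (for example 178–328 bp), and the predicted fragments could then be compared against the loci actually recovered. A sketch like this does not model star activity or the carry-through of short fragments described above.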

Highlights

  • A variety of new “genotype by sequencing” (GBS) methods share the common feature of using one or more restriction enzymes to target a subset of genomic loci for high-throughput DNA sequencing, allowing the simultaneous discovery and genotyping of genetic polymorphisms in a set of multiplexed samples [1].

  • Using two enzymes combined with size selection further reduces the number of loci, targeting only those portions of the genome with cut sites for the selected enzymes in close proximity (e.g., [6,10,11,12]). “Double-digest, restriction-site associated DNA sequencing” [6] streamlines fragment library preparation in comparison to the original Restriction-site Associated DNA Sequencing (RAD-seq) method [8].

  • Potential pitfalls and biases associated with laboratory protocols, natural genetic variation, and computational processing of the sequence data all may affect the degree to which a common set of homologous loci is recovered across samples.


Introduction

A variety of new “genotype by sequencing” (GBS) methods share the common feature of using one or more restriction enzymes to target a subset of genomic loci for high-throughput DNA sequencing, allowing the simultaneous discovery and genotyping of genetic polymorphisms in a set of multiplexed samples [1]. Applicable in both model and non-model organisms, these methods generate massive datasets for a range of applications from genetic mapping to population genetics, phylogeography, and molecular systematics [2,3,4,5]. Other means to reduce the number of loci include selective pre-amplification [10], the use of a third enzyme leaving “sticky ends” not compatible with adapters [13], and the use of type IIB enzymes with selective adapters [14].
