Abstract

The ability to efficiently and accurately determine genotypes is a keystone technology in modern genetics, crucial to studies ranging from clinical diagnostics, to genotype-phenotype association, to reconstruction of ancestry and the detection of selection. To date, high capacity, low cost genotyping has been largely achieved via “SNP chip” microarray-based platforms which require substantial prior knowledge of both genome sequence and variability, and once designed are suitable only for those targeted variable nucleotide sites. This method introduces substantial ascertainment bias and inherently precludes detection of rare or population-specific variants, a major source of information for both population history and genotype-phenotype association. Recent developments in reduced-representation genome sequencing experiments on massively parallel sequencers (commonly referred to as RAD-tag or RADseq) have brought direct sequencing to the problem of population genotyping, but increased cost and procedural and analytical complexity have limited their widespread adoption. Here, we describe a complete laboratory protocol, including a custom combinatorial indexing method, and accompanying software tools to facilitate genotyping across large numbers (hundreds or more) of individuals for a range of markers (hundreds to hundreds of thousands). Our method requires no prior genomic knowledge and achieves per-site and per-individual costs below that of current SNP chip technology, while requiring similar hands-on time investment, comparable amounts of input DNA, and downstream analysis times on the order of hours. Finally, we provide empirical results from the application of this method to both genotyping in a laboratory cross and in wild populations. Because of its flexibility, this modified RADseq approach promises to be applicable to a diversity of biological questions in a wide range of organisms.

Highlights

  • The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical record of the demographic processes and selective forces acting in a given lineage

  • We have developed a protocol that builds on the Restriction Associated DNA sequencing (RADseq) method [19] but which differs in two principal respects (Figure 2)

  • Identification of Multiplexed Samples To efficiently ‘‘de-multiplex’’ this two-tier indexing scheme and to manage thousands of samples in a flexible, portable and lightweight Laboratory Information Management System (LIMS), we have developed a simple combination of Google Documents Spreadsheets and database tools in the Python programming language

Read more

Summary

Introduction

The genome serves simultaneously as a basic blueprint, encoding information for proper cellular and developmental processes necessary to produce an organism, and as a historical record of the demographic processes and selective forces acting in a given lineage. Studies of genetic information both encoded and recorded in genomes work with the same currency–comparison of homologous sequences across individuals–but these approaches employ very different modes of inference, and as such the details of a particular experiment dictate optimal marker resolution (Figure 1). To address the need for flexibility in marker number, we describe a next-generation sequencing-based method for determining individual sequence genotypes that can be tuned to sample a large range (from hundreds to hundreds of thousands) of randomly distributed regions genome-wide. Studies extended RADseq to species that lack a reference genome sequence, but have restricted variant discovery to only those regions that contained at most one or two polymorphic sites [10,11,12]

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.