Effective double-digest RAD sequencing and genotyping despite large genome size.

Roberta Gargiulo,Michael F Fay,Tiiu Kull

doi:10.1111/1755-0998.13314

Abstract

Obtaining informative data is the ambition of any genomic project, but in nonmodel species with very large genomes, pursuing such a goal requires surmounting a series of analytical challenges. Double-digest RAD sequencing is routinely used in nonmodel organisms and offers some control over the volume of data obtained. However, the volume of data recovered is not always an indication of the reliability of data sets, and quality checks are necessary to ensure that true and artefactual information is set apart. In the present study, we aim to fill the gap existing between the known applicability of RAD sequencing methods in plants with large genomes and the use of the retrieved loci for population genetic inference. By analysing two populations of Cypripedium calceolus, a nonmodel orchid species with a large genome size (1C ~31.6Gbp), we provide a complete workflow from library preparation to bioinformatic filtering and inference of genetic diversity and differentiation. We show how filtering strategies to dismiss potentially misleading data need to be explored and adapted to data set-specific features. Moreover, we suggest that the occurrence of organellar sequences in libraries should not be neglected when planning the experiment and analysing the results. Finally, we explain how, in the absence of prior information about the genome of the species, seeking high standards of quality during library preparation and sequencing can provide an insurance against unpredicted technical or biological constraints.

Full Text