Comparative Analysis of SNP Discovery and Genotyping in Fagus sylvatica L. and Quercus robur L. Using RADseq, GBS, and ddRAD Methods

Bartosz Ulaszewski, Joanna Meger, Jaroslaw Burczyk

doi:10.3390/f12020222

Open Access

Journal: Forests
Publication Date: Feb 15, 2021
Citations: 11
License type: CC BY 4.0

Affiliation: Kazimierz Wielki University in Bydgoszcz

Abstract

Next-generation sequencing of reduced representation genomic libraries (RRL) can provide large numbers of genetic markers for population genetic studies at relatively low cost. However, one major concern with these markers is the precision of genotyping, which is related to the common problem of missing data and appears to be particularly important in association and genomic selection studies. We evaluated three RRL approaches (GBS, RADseq, ddRAD) and different SNP identification methods (de novo or based on a reference genome) to find the best solutions for future population genomics studies in two economically and ecologically important broadleaved tree species, F. sylvatica and Q. robur. We found that the ddRAD method coupled with SNP calling based on reference genomes provided the largest numbers of markers (28 k and 36 k for beech and oak, respectively), given standard filtering criteria. Using technical replicates of samples, we demonstrated that more than 80% of SNP loci should be considered reliable markers in GBS and ddRAD, but not in RADseq data. According to the reference genomes’ annotations, more than 30% of the identified ddRAD loci appeared to be related to genes. Our findings provide solid support for using ddRAD-based SNPs in future population genomics studies in beech and oak.

Highlights

To counter the threat of false positive SNPs, which can occur with highly elevated numbers of markers, we suggest focusing on the proportion of good loci as the key indicator.
Genotyping by next-generation sequencing (NGS) of reduced-representation genomic libraries associated with restriction enzymes has become a common approach to identify large numbers of genetic markers uniformly distributed across genomes.
The number and quality of RAD-based markers obtained in particular studies depend on many factors, including the quality of DNA isolation, the choice of reduced representation genomic library (RRL), the type of restriction enzymes, the sequencing design, and the bioinformatics pipelines used for identification and calling of SNPs [8,22].
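
The idea of a "proportion of good loci" measured on technical replicates can be illustrated with a minimal sketch. It assumes, purely for illustration, that a locus counts as good when both replicates of the same sample have a non-missing genotype call and the two calls agree; the study's exact criterion is not given here. The function name, the `"A/G"` genotype encoding, and the example data are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical encoding: a genotype is a string such as "A/G";
# None marks a missing call for that locus.
Genotype = Optional[str]


def normalize(genotype: str) -> Tuple[str, ...]:
    """Sort alleles so that "A/G" and "G/A" compare as equal."""
    return tuple(sorted(genotype.split("/")))


def proportion_good_loci(
    replicate_a: Dict[str, Genotype],
    replicate_b: Dict[str, Genotype],
) -> float:
    """Share of loci called in both replicates with identical genotypes.

    Assumption: a locus is "good" only if neither call is missing and
    the two calls match. Loci present in just one replicate count as
    missing in the other.
    """
    loci = set(replicate_a) | set(replicate_b)
    if not loci:
        return 0.0
    good = 0
    for locus in loci:
        call_a = replicate_a.get(locus)
        call_b = replicate_b.get(locus)
        if call_a is None or call_b is None:
            continue  # missing data in at least one replicate
        if normalize(call_a) == normalize(call_b):
            good += 1
    return good / len(loci)


# Made-up calls for two technical replicates of one sample.
rep_a = {"loc1": "A/G", "loc2": "C/C", "loc3": None, "loc4": "T/T", "loc5": "G/A"}
rep_b = {"loc1": "G/A", "loc2": "C/T", "loc3": "A/A", "loc4": "T/T", "loc5": "A/G"}
print(proportion_good_loci(rep_a, rep_b))
```

In this toy data, loc2 is a mismatch and loc3 has a missing call, so three of the five loci qualify as good. Under this definition a method that yields many markers but a low proportion of good loci would be flagged as less reliable than one with fewer but more reproducible markers.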

Summary

Introduction

The development of next-generation sequencing (NGS) methods, combined with various newly developed genomic library preparation protocols, has provided the tools for relatively inexpensive discovery and genotyping of large numbers of loci useful in population genomics studies [1,2,3,4]. The ultimate way of obtaining genomic data from multiple samples is to apply whole-genome sequencing (WGS). This approach maximizes the quantity of information gathered and opens up the possibility of a wide variety of analyses; however, it is currently prohibitively expensive and computationally challenging [5], especially in non-model species with large genomes. In recent years, many methods based on the RAD approach have been developed, differing in the number of enzymes used or in additional steps of library preparation [8].

Methods

Results

Discussion

Conclusion