Dealing with paralogy in RADseq data: in silico detection and single nucleotide polymorphism validation in Robinia pseudoacacia L.

Cindy F Verdu, Erwan Guichoux, Annabel J Porté, Yec'Han Laizet, Philippe Lejeune, Ludivine Lassois, Frédéric Gévaudant, Adline Delcamp, Olivier De Thier, Stéphanie Mariette, Samuel Quevauvillers, Arnaud Monty

doi:10.1002/ece3.2466

Abstract

The RADseq technology allows researchers to efficiently develop thousands of polymorphic loci across multiple individuals with little or no prior information on the genome. However, many questions remain about the biases inherent to this technology. Notably, sequence misalignments arising from paralogy may affect the development of single nucleotide polymorphism (SNP) markers and the estimation of genetic diversity. We evaluated the impact of putative paralog loci on genetic diversity estimation during the development of SNPs from a RADseq dataset for the nonmodel tree species Robinia pseudoacacia L. We sequenced nine genotypes and analyzed the frequency of putative paralogous RAD loci as a function of both the depth of coverage and the mismatch threshold allowed between loci. Putative paralogy was detected in a very variable number of loci, from 1% to more than 20%, with the depth of coverage having a major influence on the result. Putative paralogy artificially increased the observed degree of polymorphism and resulting estimates of diversity. The choice of the depth of coverage also affected diversity estimation and SNP validation: A low threshold decreased the chances of detecting minor alleles while a high threshold increased allelic dropout. SNP validation was better for the low threshold (4×) than for the high threshold (18×) we tested. Using the strategy developed here, we were able to validate more than 80% of the SNPs tested by means of individual genotyping, resulting in a readily usable set of 330 SNPs, suitable for use in population genetics applications.
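A toy example can show how the depth-of-coverage threshold interacts with paralogy flags. The sketch below assumes a simple rule, not necessarily the authors' exact pipeline. A haplotype counts as an allele in an individual only if at least `min_depth` reads support it, and a locus is flagged as putatively paralogous when a diploid individual shows more than two alleles. The function names, the read data and the thresholds 2, 4 and 6 are hypothetical. The study itself compared 4× and 18×.

```python
from collections import Counter
from typing import List


def call_alleles(read_haplotypes: List[str], min_depth: int) -> List[str]:
    """Keep only haplotypes supported by at least `min_depth` reads (hypothetical rule)."""
    counts = Counter(read_haplotypes)
    return sorted(h for h, n in counts.items() if n >= min_depth)


def is_putative_paralog(read_haplotypes: List[str], min_depth: int) -> bool:
    """A diploid individual cannot carry more than two alleles at a single-copy locus."""
    return len(call_alleles(read_haplotypes, min_depth)) > 2


# Hypothetical reads at one RAD locus in one diploid individual:
# true alleles A (12 reads) and B (5 reads), plus an error haplotype C (2 reads).
reads = ["A"] * 12 + ["B"] * 5 + ["C"] * 2

for m in (2, 4, 6):
    print(m, call_alleles(reads, m), is_putative_paralog(reads, m))
# 2 ['A', 'B', 'C'] True
# 4 ['A', 'B'] False
# 6 ['A'] False
```

At the permissive threshold the error haplotype is counted as a third allele, so the locus is flagged. At the strictest threshold the true B allele is discarded and the heterozygote is miscalled as homozygous, which is one way allelic dropout arises.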

Highlights

With the extensive development of next-generation sequencing (NGS) technologies and accurate bioinformatic processing of the data, it is now feasible to obtain genomic data and develop single nucleotide polymorphism (SNP) markers for nonmodel species (Etter et al, 2011).
Putative paralogy directly influenced the level of polymorphism measured at the sequence level: RAD loci identified as paralogous were more polymorphic than nonparalogous loci (Table 1).
RADseq technology is increasingly used in population genetics studies because it provides a rapid and inexpensive means of developing thousands of polymorphic SNP loci, almost regardless of genome size and prior genomic knowledge (Mastretta-Yanes et al, 2015).
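A toy example shows why collapsed paralogs appear more polymorphic. Two hypothetical paralogous copies of a RAD locus differ at two fixed positions. If their reads are assembled into a single locus, every individual appears heterozygous at those positions. The sequences, the three-individual sample and the helper names are invented for illustration. This is not the analysis behind Table 1.

```python
from typing import List, Tuple


def variable_sites(haplotypes: List[str]) -> List[int]:
    """Positions at which the aligned haplotypes are not all identical."""
    return [i for i in range(len(haplotypes[0])) if len({h[i] for h in haplotypes}) > 1]


def observed_heterozygosity(genotypes: List[Tuple[str, str]], site: int) -> float:
    """Fraction of diploid individuals whose two haplotypes differ at `site`."""
    het = sum(1 for h1, h2 in genotypes if h1[site] != h2[site])
    return het / len(genotypes)


# Two hypothetical paralogous copies that differ at positions 3 and 8.
copy1 = "ACGTACGTAC"
copy2 = "ACGAACGTTC"

# Correct assembly: each copy is its own locus, monomorphic in all three individuals.
locus_copy1 = [(copy1, copy1)] * 3
locus_copy2 = [(copy2, copy2)] * 3
# Collapsed assembly: reads from both copies merged, so each individual appears to carry copy1/copy2.
collapsed = [(copy1, copy2)] * 3

for name, locus in (("copy 1", locus_copy1), ("copy 2", locus_copy2), ("collapsed", collapsed)):
    haplotypes = [h for pair in locus for h in pair]
    sites = variable_sites(haplotypes)
    print(name, sites, [observed_heterozygosity(locus, s) for s in sites])
# copy 1 [] []
# copy 2 [] []
# collapsed [3, 8] [1.0, 1.0]
```

Each true locus is monomorphic. The collapsed locus gains two apparent SNPs that are heterozygous in every individual. Fixed heterozygosity of this kind is a typical signature of merged paralogs.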

Summary

INTRODUCTION

With the extensive development of next-generation sequencing (NGS) technologies and accurate bioinformatic processing of the data, it is now feasible to obtain genomic data and develop single nucleotide polymorphism (SNP) markers for nonmodel species (Etter et al, 2011). Existing methods for detecting paralogy in NGS data rely on three approaches. The first removes RAD loci that contain too many SNPs or deviate from Hardy–Weinberg equilibrium (Lexer et al, 2014). The second removes RAD loci with excessively high coverage (Bianco et al, 2014). The third tests for the presence of two loci at each position, as implemented in the paralogy filtering option of the reads2snp program (Gayral et al, 2013). These methods improve the efficiency of de novo assembly of short reads and the detection of sequence misalignments, which results in more accurate SNP detection. We added a validation step based on individual genotyping to estimate how effective our in silico data cleaning was.
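The first two families of filters cited above can be sketched as simple per-locus rules. This is a minimal illustration, not the implementation used by Lexer et al (2014), Bianco et al (2014) or this study. All of the following are assumptions chosen for the example: the `LocusSummary` record and `flag_locus` helper, the maximum of 5 SNPs, the coverage cutoff of twice the median mean depth, the use of one focal SNP per locus, and a Pearson chi-square test for heterozygote excess at α = 0.05. The likelihood-based paralogy test of reads2snp is not reproduced here.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Tuple

# Chi-square critical value for 1 degree of freedom at alpha = 0.05.
CHI2_CRIT_1DF_005 = 3.841


@dataclass
class LocusSummary:
    """Hypothetical per-locus summary across individuals."""
    name: str
    n_snps: int                            # SNPs within the RAD locus
    mean_depth: float                      # mean read depth per individual
    genotype_counts: Tuple[int, int, int]  # (AA, AB, BB) at one focal SNP


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Pearson chi-square statistic for Hardy-Weinberg proportions (1 df)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero.
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def flag_locus(locus: LocusSummary, max_snps: int, depth_cutoff: float) -> List[str]:
    """Return the reasons (possibly none) for treating a locus as putatively paralogous."""
    reasons = []
    if locus.n_snps > max_snps:
        reasons.append("too many SNPs")
    if locus.mean_depth > depth_cutoff:
        reasons.append("excess coverage")
    n_aa, n_ab, n_bb = locus.genotype_counts
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    expected_het = 2 * n * p * (1 - p)
    # Only heterozygote excess is treated as a paralogy signal; deficits are ignored here.
    if n_ab > expected_het and hwe_chi2(n_aa, n_ab, n_bb) > CHI2_CRIT_1DF_005:
        reasons.append("heterozygote excess")
    return reasons


loci = [
    LocusSummary("locus1", 2, 15.0, (10, 8, 2)),
    LocusSummary("locus2", 9, 16.0, (10, 8, 2)),
    LocusSummary("locus3", 1, 80.0, (10, 8, 2)),
    LocusSummary("locus4", 2, 14.0, (0, 20, 0)),
]
depth_cutoff = 2 * median([l.mean_depth for l in loci])  # 31.0 for these data

for locus in loci:
    print(locus.name, flag_locus(locus, max_snps=5, depth_cutoff=depth_cutoff))
# locus1 []
# locus2 ['too many SNPs']
# locus3 ['excess coverage']
# locus4 ['heterozygote excess']
```

Each rule catches a different symptom. In practice the thresholds would need tuning to the species and sequencing depth.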

MATERIALS AND METHODS

DISCUSSION

Findings

CONCLUSION


Journal: Ecology and Evolution	Publication Date: Sep 22, 2016
Citations: 49	License type: CC BY 4.0


Similar Papers

Development of single nucleotide polymorphism (SNP) markers for use in commercial maize (Zea mays L.) germplasm
Elizabeth Jones ... Sutirtha Basu
Molecular Breeding | VOL. 24 | 22 Apr 2009

Discovery and validation of genic single nucleotide polymorphisms in the Pacific oyster Crassostrea gigas.
Jiafeng Wang ... Guofan Zhang
Molecular Ecology Resources | VOL. 15 | 11 Jun 2014

SNP-Discovery by RAD-Sequencing in a Germplasm Collection of Wild and Cultivated Grapevines (V. vinifera L.).
Annarita Marrano ... Giorgio Valle
PLOS ONE | VOL. 12 | 26 Jan 2017

LIG1 polymorphisms: the Indian scenario.
Amit Kumar Mitra ... Ashok Singh
Journal of Genetics | VOL. 93 | 01 Aug 2014
