Abstract

Whole-genome duplications have occurred in the recent ancestors of many plants, fish and amphibians. Signals of these whole-genome duplications still exist in the form of paralogous loci. Recent advances have allowed reliable identification of paralogs in genotyping-by-sequencing (GBS) data such as those generated by restriction-site-associated DNA sequencing (RADSeq); however, excluding paralogs from analyses is still routine because they are difficult to genotype. This exclusion may filter out a large fraction of loci, including loci that may be adaptively important or informative for population genetic analyses. We present a maximum-likelihood method for inferring allele dosage in paralogs and assess its accuracy using simulated GBS, empirical RADSeq and amplicon sequencing data from Chinook salmon. We accurately infer allele dosage for some paralogs from a RADSeq data set and show how accuracy depends on both read depth and allele frequency. The amplicon sequencing data set, using RADSeq-derived markers, achieved sufficient depth to infer allele dosage for all paralogs. This study demonstrates that RADSeq locus discovery combined with amplicon sequencing of targeted loci is an effective method for incorporating paralogs into population genetic analyses.
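
To make the idea of maximum-likelihood dosage inference concrete, the following is a minimal sketch and not the method presented in the paper. It assumes a duplicated locus with four gene copies, a simple binomial model for the number of reads carrying one allele, and a fixed per-read sequencing error rate. The function names, the `copies` and `error` parameters, and the model itself are illustrative assumptions.

```python
from math import comb, log


def dosage_log_likelihoods(alt_reads, total_reads, copies=4, error=0.01):
    """Log-likelihood of each candidate dosage (0..copies) of the alternate allele.

    Assumed model (illustrative only): reads are sampled binomially, and the
    expected alternate-read fraction is dosage / copies, adjusted for a
    symmetric per-read error rate.
    """
    log_likelihoods = []
    for dosage in range(copies + 1):
        frac = dosage / copies
        # Probability that a single read shows the alternate allele.
        p = frac * (1 - error) + (1 - frac) * error
        ll = (
            log(comb(total_reads, alt_reads))
            + alt_reads * log(p)
            + (total_reads - alt_reads) * log(1 - p)
        )
        log_likelihoods.append(ll)
    return log_likelihoods


def infer_dosage(alt_reads, total_reads, copies=4, error=0.01):
    """Return the dosage with the highest log-likelihood under the assumed model."""
    lls = dosage_log_likelihoods(alt_reads, total_reads, copies, error)
    return max(range(len(lls)), key=lls.__getitem__)


if __name__ == "__main__":
    # Hypothetical read counts: 10 alternate reads out of 40 total.
    print(infer_dosage(10, 40))
```

Under this assumed model, the gap between the best and second-best log-likelihood shrinks as `total_reads` falls, which is one way to see why dosage calls become less reliable at low read depth.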
