Abstract

Powerful approaches to inferring recent or current population structure based on nearest neighbor haplotype “coancestry” have so far been inaccessible to users without high quality genome-wide haplotype data. With a boom in nonmodel organism genomics, there is a pressing need to bring these methods to communities without access to such data. Here, we present RADpainter, a new program designed to infer the coancestry matrix from restriction-site-associated DNA sequencing (RADseq) data. We combine this program together with a previously published MCMC clustering algorithm into fineRADstructure—a complete, easy to use, and fast population inference package for RADseq data (https://github.com/millanek/fineRADstructure; last accessed February 24, 2018). Finally, with two example data sets, we illustrate its use, benefits, and robustness to missing RAD alleles in double digest RAD sequencing.

Highlights

  • Understanding of shared ancestry in genetic datasets is often key to their interpretation

  • The complicated network of relationships among these twelve populations belonging to two phylogenetically intertwined species (H. pusillum: P, H. veselskyi: V), with contrasting ecology and a post-glacial history of divergence in some of the six sampled localities (A to F; Figure 2), makes it an excellent case to study the performance of our approach

  • In this manuscript, we have described software that enables fine population structure inference based on nearest neighbour relationships between haplotypes inferred from RAD-seq data

Main Text

Introduction

Understanding of shared ancestry in genetic datasets is often key to their interpretation. The high resolution of chromoPainter/fineSTRUCTURE and related methods derives from utilizing haplotype linkage information and from focusing on the most recent coalescence (common ancestry) among the sampled individuals. This approach derives a ‘coancestry matrix’, a summary of nearest neighbour haplotype relationships in the dataset, i.e. of the cases where pairs of individuals had the most similar haplotypes to one another. The existing pipeline for coancestry matrix inference was designed for large-scale human genetic SNP datasets, where the chromosomal locations of the markers are known, haplotypes are typically assumed to be correctly phased (although it is possible to perform the analysis without this assumption), and missing data needs to have been imputed. These methods have so far been generally inaccessible for investigations beyond model organisms. We expect these features to be of particular use to the plant research community.
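To make the idea of a coancestry matrix concrete, the following is a minimal sketch of how nearest neighbour haplotype relationships could be tallied across loci. It is a simplified illustration and not RADpainter's actual implementation: it assumes one haplotype per individual per locus (haploid data), equal-length sequences within a locus, and missing data encoded as `None`. The function name `toy_coancestry` and the example sequences are hypothetical.

```python
from typing import List, Optional


def toy_coancestry(loci: List[List[Optional[str]]]) -> List[List[float]]:
    """Tally nearest-neighbour relationships across loci.

    loci[l][i] is the haplotype of individual i at locus l, or None if
    that individual has no data at the locus (assumed encoding).
    Entry [i][j] of the result counts how often individual j carried the
    haplotype most similar to that of individual i.
    """
    n = len(loci[0])
    matrix = [[0.0] * n for _ in range(n)]
    for locus in loci:
        for i, recipient in enumerate(locus):
            if recipient is None:
                continue  # recipient has no data at this locus
            # Number of mismatching sites against every other individual
            # that has data at this locus.
            dists = {
                j: sum(a != b for a, b in zip(recipient, donor))
                for j, donor in enumerate(locus)
                if j != i and donor is not None
            }
            if not dists:
                continue  # no potential donors at this locus
            best = min(dists.values())
            nearest = [j for j, d in dists.items() if d == best]
            # Split one unit of coancestry equally among tied donors.
            for j in nearest:
                matrix[i][j] += 1.0 / len(nearest)
    return matrix


# Hypothetical data: four individuals, two loci, one missing haplotype.
example_loci = [
    ["ACGT", "ACGT", "ACGA", "TCGA"],
    ["GGCC", "GGCA", None, "GGCA"],
]
coancestry = toy_coancestry(example_loci)
for row in coancestry:
    print([round(x, 2) for x in row])
```

In this sketch each row sums to the number of loci at which the recipient individual has data, so individuals with many missing loci contribute proportionally less to the matrix rather than requiring imputation.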
