Abstract

Inference of genetic clusters is a key aim of population genetics and has sparked the development of numerous analytical methods. Among these methods, there is a conceptual divide between finding de novo structure and assessing a priori groups. The recently developed Discriminant Analysis of Principal Components (DAPC) combines discriminant analysis (DA) with principal component (PC) analysis. When applying DAPC, the groups used in the DA (specified a priori or described de novo) need to be carefully assessed. Although DAPC has rapidly become a core technique, its sensitivity to misspecification of groups, and how it is being applied empirically, are unknown. To address this, we conducted a simulation study examining the influence of a priori versus de novo group designations, together with a literature review of how DAPC is being applied. We found that with a priori groupings, the distance between genetic clusters reflected the underlying FST. However, when migration rates were high and groups were described de novo, there was considerable inaccuracy, both in the number of genetic clusters suggested and in the placement of individuals into those clusters. Nearly all (90.1%) of the 224 studies surveyed used DAPC to find de novo clusters, and for the majority (62.5%) the stated goal matched the results. However, most studies (52.3%) omitted key run parameters, preventing repeatability and transparency. We therefore present recommendations for standard reporting of the parameters used in DAPC analyses. The influence of groupings on genetic clustering is not unique to DAPC, and researchers need to consider their goal and which methods will be most appropriate.

Highlights

  • Focusing on studies where the goal was finding de novo structure, we examined the percentage of studies published each year that met the following criteria: (1) authors stated they searched for the optimal number of genetic clusters in their data, (2) authors stated the method used to determine the optimal number of principal components (PCs) to retain, and (3) authors stated the final number of PCs used in the discriminant analysis (DA)

  • The distances between Discriminant Analysis of Principal Components (DAPC) clusters decreased with increasing migration rate and were positively associated with FST between groups (Fig. 1a)
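For readers less familiar with FST, the following is a minimal sketch of why it is expected to fall as migration rises. It is not the simulation code used in the study. It assumes a single biallelic locus, two equally sized groups, Wright's definition FST = (HT - HS) / HT, and the classical island-model approximation FST ≈ 1 / (4·Ne·m + 1). All function names and the parameter values below are hypothetical and chosen only for illustration.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) at a biallelic locus with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_two_groups(p1, p2):
    """Wright's FST = (HT - HS) / HT for two equally sized groups at one locus.

    p1 and p2 are the frequencies of the same allele in each group (assumed inputs).
    """
    # HS: mean expected heterozygosity within groups
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    # HT: expected heterozygosity of the pooled population
    h_t = expected_heterozygosity((p1 + p2) / 2.0)
    if h_t == 0.0:
        # Both groups are fixed for the same allele, so there is no differentiation
        return 0.0
    return (h_t - h_s) / h_t


def island_model_fst(n_e, m):
    """Equilibrium approximation FST ~ 1 / (4 * Ne * m + 1) under Wright's island model."""
    return 1.0 / (4.0 * n_e * m + 1.0)


if __name__ == "__main__":
    # Hypothetical effective size and migration rates, for illustration only
    for m in (0.0001, 0.001, 0.01, 0.1):
        print(f"m = {m:<6}  expected FST = {island_model_fst(1000, m):.4f}")
```

Under these assumptions, groups exchanging more migrants have more similar allele frequencies and therefore a lower FST, which is consistent with the smaller distances between DAPC clusters reported at higher migration rates.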

Introduction

Inference of genetic clusters and knowledge of their divergence and distribution are important for many aspects of evolutionary biology and population genetics, including studies of speciation (Sousa and Hey 2013), inference of disease spread risk (Hampton et al 2004; Cassirer et al 2018), and applications in conservation and forensics (Funk et al 2012; Coates et al 2018). Many methods have been developed for determining genetic clusters and quantifying divergence among them. These include admixture and Bayesian clustering. Within all of these methods, there is a conceptual divide between assessing a priori (predefined) populations and finding clusters de novo. The former can help visualize differentiation between hypothesized groups or jurisdictions, while the latter is a test for population structure in a dataset. Both are valid questions, but misspecification of groups can have serious consequences, especially for species of conservation concern. Misspecification may lead to over…
