A fast likelihood solution to the genetic clustering problem.

Marie-Pauline Beugin, Dominique Pontier, Thibault Gayet, Sébastien Devillard, Thibaut Jombart, Thomas Hansen

doi:10.1111/2041-210x.12968

Abstract

The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or parental groups in the context of hybridisation. Two types of methods have been developed for identifying such clusters: model-based methods, which are usually computer-intensive but yield results that can be interpreted in the light of an explicit population genetic model, and geometric approaches, which are less interpretable but remarkably faster.

Here, we introduce snapclust, a fast maximum-likelihood solution to the genetic clustering problem, which combines the advantages of both model-based and geometric approaches. Our method relies on maximising the likelihood of a fixed number of panmictic populations, combining a geometric approach with fast likelihood optimisation using the Expectation-Maximisation (EM) algorithm. It can be used for assigning genotypes to populations and, optionally, for identifying various types of hybrids between two parental populations. Several goodness-of-fit statistics can also be used to guide the choice of the retained number of clusters.

Using extensive simulations, we show that snapclust performs comparably to current gold standards for genetic clustering as well as hybrid detection, with some advantages for identifying hybrids after several backcrosses, while being orders of magnitude faster than other model-based methods. We also illustrate how snapclust can be used for identifying the optimal number of clusters, and subsequently assigning individuals to various hybrid classes simulated from an empirical microsatellite dataset.

snapclust is implemented in the package adegenet for the free software R, and is therefore easily integrated into existing pipelines for genetic data analysis. It can be applied to any kind of co-dominant markers, and can easily be extended to more complex models including, for instance, varying ploidy levels. Given its flexibility and computational efficiency, it provides a useful complement to the existing toolbox for the study of genetic diversity in natural populations.
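The core idea in the abstract, maximising the likelihood of a fixed number of panmictic populations with EM from a geometric starting point, can be illustrated with a minimal sketch. This is not the adegenet implementation. It assumes diploid biallelic loci coded as alternate-allele counts (0, 1, 2), Hardy-Weinberg proportions within each group, independent loci, and equal prior weight for every group. The function name `em_genetic_clusters` and the toy genotypes are hypothetical, and the initial grouping, which would normally come from a distance-based clustering, is simply supplied by hand with two individuals deliberately misplaced.

```python
import math


def em_genetic_clusters(genotypes, k, init_groups, max_iter=100, tol=1e-8):
    """EM sketch for k panmictic groups (assumptions listed in the lead-in)."""
    n, n_loci = len(genotypes), len(genotypes[0])
    eps = 1e-6  # keeps allele frequencies away from 0 and 1 so logs stay finite
    # Start from hard (one-hot) memberships given by the initial grouping.
    probs = [[1.0 if init_groups[i] == g else 0.0 for g in range(k)]
             for i in range(n)]
    log_lik = -math.inf
    for _ in range(max_iter):
        # M-step: membership-weighted allele frequency per group and locus.
        freqs = []
        for g in range(k):
            weight = sum(probs[i][g] for i in range(n))
            row = []
            for locus in range(n_loci):
                alt = sum(probs[i][g] * genotypes[i][locus] for i in range(n))
                p = alt / (2.0 * weight) if weight > 0 else 0.5
                row.append(min(max(p, eps), 1.0 - eps))
            freqs.append(row)
        # E-step: Hardy-Weinberg genotype log-likelihood in each group,
        # normalised into membership probabilities for each individual.
        new_log_lik = 0.0
        for i in range(n):
            logs = []
            for g in range(k):
                total = 0.0
                for locus in range(n_loci):
                    x, p = genotypes[i][locus], freqs[g][locus]
                    total += (math.log(math.comb(2, x))
                              + x * math.log(p)
                              + (2 - x) * math.log(1.0 - p))
                logs.append(total)
            top = max(logs)
            log_norm = top + math.log(sum(math.exp(v - top) for v in logs))
            probs[i] = [math.exp(v - log_norm) for v in logs]
            new_log_lik += log_norm - math.log(k)  # equal group priors
        converged = abs(new_log_lik - log_lik) < tol
        log_lik = new_log_lik
        if converged:
            break
    # Hard assignment: the group with the highest membership probability.
    groups = [row.index(max(row)) for row in probs]
    return groups, probs, log_lik


# Toy data (hypothetical): six individuals carrying mostly the reference
# allele, six carrying mostly the alternate allele, at five loci.
pop_a = [[0, 0, 0, 0, 1], [0, 0, 1, 0, 0], [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 1, 0]]
pop_b = [[2, 2, 2, 2, 1], [2, 2, 1, 2, 2], [2, 1, 2, 2, 2],
         [2, 2, 2, 2, 2], [1, 2, 2, 2, 2], [2, 2, 2, 1, 2]]
genotypes = pop_a + pop_b
# Imperfect starting partition: one individual from each group is misplaced.
init_groups = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]

groups, probs, log_lik = em_genetic_clusters(genotypes, 2, init_groups)
print(groups)
```

Unlike a purely distance-based partition, each row of `probs` gives membership probabilities derived from an explicit population genetic model. The sketch leaves out what the abstract describes beyond this core: multi-allelic co-dominant markers, hybrid classes, and goodness-of-fit statistics for choosing the number of clusters.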

Highlights

The identification of groups of genetically related individuals within a population, sensu population subdivision, is a ubiquitous problem in most fields in which genetic data analysis plays an important role, including molecular ecology, evolutionary genetics and conservation genetics.
We illustrate how snapclust can be used for identifying the optimal number of clusters, and subsequently assigning individuals to various hybrid classes simulated from an empirical microsatellite dataset. snapclust is implemented in the package adegenet for the free software R, and is integrated into existing pipelines for genetic data analysis.
The likelihood is defined as the probability that the set of genotypes under consideration was generated under a given population structure and model of evolution.

Summary

| INTRODUCTION

The identification of groups of genetically related individuals within a population, sensu population subdivision, is a ubiquitous problem in most fields in which genetic data analysis plays an important role, including molecular ecology, evolutionary genetics and conservation genetics.

The main limitation of geometric approaches lies in the fact that their results are harder to interpret biologically. These methods typically identify clusters from pairwise genetic distances, without providing group membership probabilities (Jombart et al., 2010; Legendre & Legendre, 2012), so that weak separation between clusters or admixture patterns cannot be distinguished from strong, clear-cut population structure. To some extent, this issue can be addressed using exploratory approaches such as the DAPC (Jombart et al., 2010) to visualise cluster diversity in a reduced space and even estimate group assignment probabilities, but these probabilities merely reflect genetic proximities, and cannot be interpreted as probabilities that an individual belongs to a given population.

snapclust is implemented in the package adegenet (Jombart, 2008; Jombart & Ahmed, 2011) for the R software (R Core Team, 2017), and is readily compatible with a wealth of tools for genetic data analysis in R (Goudet, 2005; Jombart et al., 2017; Kamvar, Tabima, & Grünwald, 2014; Paradis, 2010; Popescu, Huber, & Paradis, 2012).

| MATERIALS AND METHODS

| Optimisation procedure

Findings

| DISCUSSION

Journal: Methods in Ecology and Evolution	Publication Date: Jan 30, 2018
Citations: 106	License type: CC BY 4.0
