Abstract

In landscape genetics, model selection procedures based on information-theoretic and Bayesian principles have been used with multiple regression on distance matrices (MRM) to test the relationship between multiple vectors of pairwise genetic, geographic, and environmental distance. Using Monte Carlo simulations, we examined the ability of model selection criteria based on Akaike’s information criterion (AIC), its small-sample correction (AICc), and the Bayesian information criterion (BIC) to reliably rank candidate models when applied with MRM while varying the sample size. The results revealed a serious problem: all three criteria exhibited a systematic bias toward selecting unnecessarily complex models containing spurious random variables, and erroneously suggested a high level of support for the incorrectly ranked best model. These problems became worse as sample size increased. The failure of AIC, AICc, and BIC was likely driven by the inflated sample size and the different sum-of-squares partitioned by MRM, and by the resulting effect on delta values. Based on these findings, we strongly discourage the continued application of AIC, AICc, and BIC for model selection with MRM.
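For reference, the three criteria can be written in their standard least-squares forms (assuming Gaussian errors, and up to an additive constant). The symbols here are introduced for illustration and are not the authors' notation: RSS is the residual sum of squares, k the number of estimated parameters, n the number of observations, and N the number of sampling units. When N units are unfolded into pairwise distances, the regression sees N(N-1)/2 values, which is one way the sample size entering these formulas can become inflated.

```latex
\begin{align}
\mathrm{AIC}  &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k(k+1)}{n - k - 1} \\
\mathrm{BIC}  &= n \ln\!\left(\frac{\mathrm{RSS}}{n}\right) + k \ln n \\
\Delta_i &= \mathrm{AIC}_i - \min_j \mathrm{AIC}_j,
\qquad n_{\text{pairs}} = \frac{N(N-1)}{2}
\end{align}
```

Because the first term scales with n, small differences in RSS between candidate models translate into larger delta values as n grows, while the AIC penalty of 2k per model stays fixed.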

Highlights

  • A primary goal of landscape genetics is to determine the relative influence of landscape composition, configuration, and matrix quality on patterns of gene flow, genetic discontinuities and population genetic structure [1,2,3,4,5]

  • For each set of simulations, we measured how reliably model selection based on Akaike’s information criterion (AIC), AICc, and the Bayesian information criterion (BIC) identified the correct model when applied with multiple regression on distance matrices (MRM), using the proportion of 1000 replicate data sets in which the correct model was ranked as the best model

  • The results from a typical single simulation run with n = 100 (Fig 1) illustrate how the behavior of AIC changed markedly when used with MRM on distance-transformed data


Summary

Introduction

A primary goal of landscape genetics is to determine the relative influence of landscape composition (e.g., amount of habitat), configuration (spatial arrangement of habitat patches), and matrix quality (landscape between habitat patches) on patterns of gene flow, genetic discontinuities, and population genetic structure [1,2,3,4,5]. Gene flow may be restricted by geographic distance (isolation-by-distance) and by resistance of land-cover types to movement (isolation-by-resistance). Because the focus is on the landscape between patches rather than the conditions within patches (sampling locations), hypotheses are expressed in terms of pairwise distances between patches [6]. While the genetic data are collected within patches, genetic differentiation resulting from restricted gene flow is quantified in terms of pairwise genetic distances. Hypotheses concerning the association of pairwise distances between sampling units (i.e., genetic, geographic, environmental, or temporal distances) are often analyzed using Mantel tests [7] or their derivatives, such as the partial Mantel test [8] and multiple regression on distance matrices (MRM) ([9,10,11]; for examples see [12,13,14]). Various model selection approaches have been proposed for identifying the model that best explains the observed spatial genetic structure and for assessing the level of support for each competing hypothesis [8,16,17,18,19,20,21,22,23], but the accuracy and reliability of these approaches remain a topic of considerable debate in the context of spatial analysis (e.g., [24,25]).
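To make the MRM setup concrete, the following is a minimal sketch of how pairwise distance matrices can be unfolded into vectors and regressed on one another, with a permutation test that shuffles the sampling units of the response matrix. It is not the authors' code: the function names (`lower_triangle`, `pairwise_distance`, `fit_ols`, `mrm`), the toy data, and the choice of 199 permutations are all assumptions made for illustration.

```python
import numpy as np

def lower_triangle(d):
    """Unfold a square symmetric distance matrix into a vector of pairwise values."""
    i, j = np.tril_indices(d.shape[0], k=-1)
    return d[i, j]

def pairwise_distance(x):
    """Euclidean distance matrix from an (N, p) array of site coordinates or values."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fit_ols(y, predictors):
    """Ordinary least squares with an intercept; returns coefficients, RSS, and R^2."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    return beta, rss, 1.0 - rss / tss

def mrm(gen, preds, n_perm=199, seed=0):
    """Regress unfolded genetic distances on unfolded predictor distances.

    The p-value for R^2 comes from permuting rows and columns of the
    response matrix together, so whole sampling units are shuffled.
    """
    rng = np.random.default_rng(seed)
    xs = [lower_triangle(p) for p in preds]
    beta, rss, r2 = fit_ols(lower_triangle(gen), xs)
    count = 1  # the observed statistic counts as one permutation
    for _ in range(n_perm):
        perm = rng.permutation(gen.shape[0])
        gen_p = gen[np.ix_(perm, perm)]
        _, _, r2_p = fit_ols(lower_triangle(gen_p), xs)
        if r2_p >= r2:
            count += 1
    return {"beta": beta, "rss": rss, "r2": r2,
            "p_r2": count / (n_perm + 1), "n_pairs": len(xs[0])}

# Toy data (hypothetical): 20 sites, genetic distance driven by geography only.
rng = np.random.default_rng(1)
n_sites = 20
geo = pairwise_distance(rng.uniform(size=(n_sites, 2)))
env = pairwise_distance(rng.normal(size=(n_sites, 1)))  # spurious predictor
noise = rng.normal(scale=0.05, size=(n_sites, n_sites))
noise = (noise + noise.T) / 2.0  # keep the response matrix symmetric
np.fill_diagonal(noise, 0.0)
gen = geo + noise

result = mrm(gen, [geo, env])
print(result["n_pairs"], round(result["r2"], 3), result["p_r2"])
```

Note that 20 sampling units yield 190 pairwise values, and those values are not independent of one another. Treating the number of pairs as the sample size in an information criterion is the kind of inflation the abstract points to.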
