The geographic nature of biological dispersal shapes patterns of genetic variation over landscapes, making it possible to infer properties of dispersal from genetic variation data. Here, we present an inference tool that uses geographically distributed genotype data in combination with a convolutional neural network to estimate a critical population parameter: the mean per-generation dispersal distance. Using extensive simulation, we show that our deep learning approach is competitive with or outperforms state-of-the-art methods, particularly at small sample sizes. In addition, we evaluate varying nuisance parameters during training-including population density, demographic history, habitat size, and sampling area-and show that this strategy is effective for estimating dispersal distance when other model parameters are unknown. Whereas competing methods depend on information about local population density or accurate inference of identity-by-descent tracts, our method uses only single-nucleotide-polymorphism data and the spatial scale of sampling as input. Strikingly, and unlike other methods, our method does not use the geographic coordinates of the genotyped individuals. These features make our method, which we call "disperseNN," a potentially valuable new tool for estimating dispersal distance in nonmodel systems with whole genome data or reduced representation data. We apply disperseNN to 12 different species with publicly available data, yielding reasonable estimates for most species. Importantly, our method estimated consistently larger dispersal distances than mark-recapture calculations in the same species, which may be due to the limited geographic sampling area covered by some mark-recapture studies. Thus genetic tools like ours complement direct methods for improving our understanding of dispersal.
Read full abstract