Abstract

Background: FREGENE simulates sequence-level data over large genomic regions in large populations. Because, unlike coalescent simulators, it works forwards through time, it allows complex scenarios of selection, demography, and recombination to be modelled simultaneously. Detailed tracking of sites under selection is implemented in FREGENE and provides the opportunity to test theoretical predictions and gain new insights into mechanisms of selection. Here we describe the main functionalities of both FREGENE and SAMPLE, a companion program that can replicate association study datasets.

Results: We report detailed analyses of six large simulated datasets that we have made publicly available. Three demographic scenarios are modelled: one panmictic, one substructured with migration, and one complex scenario that mimics the principal features of genetic variation in major worldwide human populations. For each scenario there is one neutral simulation and one with a complex pattern of selection.

Conclusion: FREGENE and the simulated datasets will be valuable for assessing the validity of models for selection, demography and population genetic parameters, as well as the efficacy of association studies. FREGENE's principal advantages are modelling flexibility and computational efficiency. It is open source and object-oriented. As such, it can be customised and the range of models extended.
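To make the idea of working forwards through time concrete, the sketch below follows a single new derived allele generation by generation until it is lost or fixed. It is a textbook haploid Wright-Fisher model with genic selection, not FREGENE's own algorithm; the function name `simulate_site` and its parameters (`n_chrom`, `s`, `max_gens`) are hypothetical and chosen only for illustration.

```python
import random


def simulate_site(n_chrom, s, seed=None, max_gens=100_000):
    """Follow one new derived allele forwards in time.

    Assumptions (illustrative only, not FREGENE's implementation):
    a constant population of n_chrom chromosomes, non-overlapping
    generations, and relative fitness 1 + s for the derived allele.

    Returns (generations_simulated, fixed), where fixed is True only if
    the derived allele reached frequency 1.
    """
    rng = random.Random(seed)
    count = 1  # the mutation arises as a single copy
    gens = 0
    while 0 < count < n_chrom and gens < max_gens:
        p = count / n_chrom
        # Selection: shift the expected frequency of the derived allele.
        p_sel = p * (1 + s) / (p * (1 + s) + (1 - p))
        # Drift: binomial resampling of n_chrom chromosomes.
        count = sum(rng.random() < p_sel for _ in range(n_chrom))
        gens += 1
    return gens, count == n_chrom


if __name__ == "__main__":
    # Repeat the simulation to see how often a favoured allele fixes.
    runs = [simulate_site(200, 0.05, seed=i) for i in range(500)]
    fixed_runs = [g for g, fixed in runs if fixed]
    print("fixed in", len(fixed_runs), "of", len(runs), "runs")
```

Because each generation is simulated explicitly, selection, demography and recombination could in principle all be changed inside the loop, which is the flexibility that a forward simulator offers over a coalescent one.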

Highlights

  • FREGENE simulates sequence-level data over large genomic regions in large populations

  • These datasets may serve as useful standards for testing methods to infer population genetic parameters, such as recombination rates and selection coefficients, and together with SAMPLE can be used to assess genetic association methods

  • SAMPLE can simulate ascertainment bias, from a FREGENE subpopulation simulation, by allowing the user to specify the numbers of cases and controls from each subpopulation
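The last highlight can be illustrated with a small sketch of drawing a fixed number of cases and controls from each subpopulation. This is not SAMPLE's interface: the record layout (dictionaries with `subpop` and `affected` keys), the `design` mapping and the function name `sample_case_control` are all assumptions made for the example.

```python
import random


def sample_case_control(individuals, design, seed=None):
    """Draw cases and controls separately from each subpopulation.

    individuals: list of dicts with keys "subpop" and "affected"
                 (a hypothetical layout, not SAMPLE's file format).
    design:      dict mapping subpopulation -> (n_cases, n_controls).

    Unequal case/control numbers across subpopulations are one way to
    mimic ascertainment bias in an association study.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    for subpop, (n_cases, n_controls) in design.items():
        members = [ind for ind in individuals if ind["subpop"] == subpop]
        affected = [ind for ind in members if ind["affected"]]
        unaffected = [ind for ind in members if not ind["affected"]]
        # random.sample raises ValueError if too few individuals exist.
        cases += rng.sample(affected, n_cases)
        controls += rng.sample(unaffected, n_controls)
    return cases, controls
```

For instance, a design such as `{0: (100, 20), 1: (20, 100)}` would take most cases from subpopulation 0 and most controls from subpopulation 1, so any allele frequency difference between the two subpopulations would be confounded with disease status.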


Summary

Results

'Ready to use' simulated data sets

Two standard population models, each with 10.5 K individuals, have been simulated over 20 Mb genomes, and the final generations are available as test datasets: population A is panmictic, while population B is subdivided into three subpopulations, each of 3.5 K individuals. A third and more complex simulation (population C), over a 10 Mb genomic region, uses parameter values found by [5] to provide the neutral model that best fits the major features of current worldwide human genetic variation. This simulation used a per-site, per-generation mutation rate of 1.5 × 10^-5 and required seven steps (note that all population expansions are instantaneous).

Mean diversity over the final 50 k generations (number of polymorphic sites), for populations A and B under the neutral and the selection scenarios.

The impact of selection with or without subpopulation structure is summarized, which represents the time selected sites remain polymorphic as a function of the selection coefficient s. As discussed above, this time is …

Population A (panmictic). Lines indicate the life-spans of sites under selection that reached fixation for the derived allele in populations A (top) and B (bottom).
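The diversity measure mentioned above, the number of polymorphic sites, can be computed from a sample of haplotypes as sketched below. The input layout (a list of equal-length lists of 0/1 alleles, with 1 marking the derived allele) and the function names are assumptions for illustration and do not reflect FREGENE's output format.

```python
def count_polymorphic_sites(haplotypes):
    """Count sites at which both alleles are present in the sample.

    haplotypes: list of equal-length lists of 0/1 alleles
                (hypothetical layout, 1 = derived allele).
    """
    if not haplotypes:
        return 0
    n = len(haplotypes)
    # zip(*haplotypes) iterates over sites (columns) rather than chromosomes.
    return sum(0 < sum(site) < n for site in zip(*haplotypes))


def mean_diversity(snapshots):
    """Average the polymorphic-site count over a series of generations."""
    counts = [count_polymorphic_sites(h) for h in snapshots]
    return sum(counts) / len(counts)
```

A site is counted only while it is segregating, so sites at which the derived allele has been lost or has reached fixation no longer contribute to the total.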

