Abstract

Background

There are several situations in population biology research where simulating DNA sequences is useful. Simulation of biological populations under different evolutionary genetic models can be undertaken using backward or forward strategies. Backward simulations, also called coalescent-based simulations, are computationally efficient because they follow only the history of lineages with surviving offspring in the current population. In contrast, forward simulations are less efficient because the entire population is simulated from past to present. However, the coalescent framework imposes some limitations that forward simulation does not. Hence, there is increasing interest in forward population genetic simulation, and efficient new tools have been developed recently. Software tools that allow efficient simulation of large DNA fragments under complex evolutionary models will be very helpful when trying to better understand the trace left on the DNA by the different interacting evolutionary forces. Here I introduce GenomePop, a forward simulation program that fulfills the above requirements. The use of the program is demonstrated by studying the impact of intracodon recombination on global and site-specific dN/dS estimation.

Results

I have developed algorithms and written software to efficiently simulate, forward in time, different Markovian nucleotide or codon models of DNA mutation. Such models can be combined with recombination at inter- and intra-codon levels, fitness-based selection and complex demographic scenarios.

Conclusion

GenomePop has many interesting characteristics for simulating SNPs or DNA sequences under complex evolutionary and demographic models. These features, which make it unique with respect to other simulation tools, include forward simulation under General Time Reversible (GTR) mutation or GTR×MG94 codon models with intra-codon recombination; arbitrary, user-defined migration patterns; diploid or haploid models; and constant or variable population sizes. It also allows simulation of fitness-based selection under different distributions of mutational effects. Under the 2-allele model it allows the simulation of recombination hot-spots, the definition of different frequencies in different populations, etc. GenomePop can also manage large DNA fragments. In addition, it has a scaling option to save computation time when simulating large sequences and population sizes under complex demographic and evolutionary situations. These and many other features are detailed on its web page [1].
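To make the forward strategy concrete, here is a minimal sketch of a forward-in-time simulation: a haploid Wright-Fisher population of constant size, with Jukes-Cantor mutation and at most one crossover per offspring. It is an illustration only and is not GenomePop's algorithm. The function names and the parameters `mu` (per-site mutation probability) and `rec` (per-sequence crossover probability per generation) are assumptions made for this example.

```python
import random

BASES = "ACGT"

def mutate(seq, mu, rng):
    """Jukes-Cantor mutation: each site changes with probability mu
    to one of the three other bases, chosen uniformly."""
    out = list(seq)
    for i, base in enumerate(out):
        if rng.random() < mu:
            out[i] = rng.choice([b for b in BASES if b != base])
    return "".join(out)

def recombine(a, b, rec, rng):
    """With probability rec, join the left part of a to the right part
    of b at a uniformly chosen breakpoint; otherwise return a."""
    if rng.random() < rec:
        cut = rng.randrange(1, len(a))  # requires len(a) >= 2
        return a[:cut] + b[cut:]
    return a

def simulate(n, length, mu, rec, generations, seed=0):
    """Evolve n haploid sequences forward in time, one generation at a
    time. Every individual of every generation is simulated, which is
    why forward simulation costs more than coalescent simulation."""
    rng = random.Random(seed)
    ancestor = "".join(rng.choice(BASES) for _ in range(length))
    pop = [ancestor] * n
    for _ in range(generations):
        nxt = []
        for _ in range(n):
            # Hypothetical neutral model: parents are drawn uniformly.
            p1, p2 = rng.choice(pop), rng.choice(pop)
            nxt.append(mutate(recombine(p1, p2, rec, rng), mu, rng))
        pop = nxt
    return pop

if __name__ == "__main__":
    for s in simulate(n=5, length=30, mu=0.01, rec=0.1, generations=50):
        print(s)
```

Selection, migration and changing population sizes would enter this loop through the way parents are drawn and the way `n` changes between generations. The sketch leaves them out to keep the core loop visible.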

Highlights

  • There are several situations in population biology research where simulating DNA sequences is useful

  • Knowledge about patterns of linkage disequilibrium (LD) in humans is very important from a genomic point of view

  • The existence of linkage or haplotype blocks [11] or, at least, networks of SNPs in high LD [12], will facilitate the assembly of human genome haplotype maps [13,14,15] that will enormously improve, among other things, the efficiency of disease gene mapping. It seems that these blocks are mainly defined by recombination hot spots [16,17], but haplotype blocks can also be generated by genetic drift in regions of uniform recombination if rates are low enough [18]


Summary

Results

Input file

The input file should be called GenomePopInput.txt. In this file, lines beginning with '#' are comments and will be ignored.

Strong effort has been made to validate the program as thoroughly as possible. Both unscaled and scaled simulations were performed under a Jukes-Cantor model with diversity θ = 4Nμ = 0.004 over 10^4 generations, and θ was estimated using the finite-sites correction of Watterson's θ [60].

Recombination was tested by evolving datasets for 6N generations under a Jukes-Cantor 4-allele model with different values for the parameter ρ = 4NrL, where N is population size, r is recombination rate per site and L is the DNA sequence length (the corresponding parameter in GenomePop is 'Rec' = r × L).

Notably, recombination had no impact on global dN/dS estimation but had important effects on the number of sites detected under positive selection, as is evident from Table 2.

Replicates and cases, which is out of the scope of the present work
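The parameters used in the validation can be made concrete with a short sketch. It computes ρ = 4NrL and the per-sequence value r × L that the text says corresponds to 'Rec', and it estimates θ per site from a sample of aligned sequences. The estimator shown is the classical infinite-sites Watterson estimator, not the finite-sites correction cited as [60], so it will tend to underestimate θ when multiple hits are common. The function names are hypothetical and are not part of GenomePop.

```python
def rho(n, r, length):
    """Population recombination parameter rho = 4 N r L."""
    return 4 * n * r * length

def rec_per_sequence(r, length):
    """Per-sequence recombination rate r * L (the quantity the text
    relates to GenomePop's 'Rec' parameter)."""
    return r * length

def watterson_theta(sequences):
    """Per-site Watterson estimator, infinite-sites form (an assumption;
    the source uses a finite-sites correction instead):
    theta_W = S / (a_n * L), with a_n = sum_{i=1}^{n-1} 1/i,
    where S is the number of segregating sites in a sample of n
    aligned sequences of length L."""
    n = len(sequences)
    length = len(sequences[0])
    # A site is segregating if the sample shows more than one base there.
    segregating = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating / (a_n * length)

if __name__ == "__main__":
    sample = ["ACGT", "ACGA", "ACTT"]  # toy alignment, not real data
    print(watterson_theta(sample))
    print(rho(n=1000, r=1e-6, length=1000))
```

In a validation of the kind described, an estimate averaged over many replicates would be compared with the value of θ used to generate the data, here 0.004.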

Carvajal-Rodríguez A
13. International-HapMap-Consortium
23. Kruglyak L
32. Wilkinson-Herbots HM
35. Tajima F
43. Fearnhead P: Perfect simulation from nonneutral population genetic models
50. Balloux F
55. Swofford DL
