The evolution of tandem repeat sequences under partial selfing and different modes of selection.
This study models the evolution of tandem repeat sequences in partially selfing diploid populations under various selective regimes, finding that selfing increases homozygosity, enhances within- and between-individual variation, and makes selection more effective, leading to lower genetic load despite increased genetic drift.
Tandem repeat (TR) sequences occur when short DNA motifs are repeated head-to-tail along chromosomes and are a major source of genetic variation. Population genetic models of TR evolution have focused on large, randomly mating, haploid populations. Yet many organisms reproduce partially through self-fertilisation ('selfing'), which increases homozygosity and thus may alter the evolutionary processes shaping TR sequences. Here we use mathematical modelling and simulations to study the evolution of homologous TR sequences in partially selfing, diploid populations under four different selective regimes that may be relevant to TRs: (i) additive purifying selection, (ii) truncation-like purifying selection, (iii) selection against heterozygotes due to misalignment costs, and (iv) stabilising selection favouring an intermediate TR sequence length. We show that selfing influences TR evolution primarily by increasing homozygosity, with two main consequences: (1) it enhances the variation produced by unequal recombination within individuals, and (2) it increases variation between individuals. Consequently, selection on TRs becomes more effective under partial selfing across all modes of selection considered, resulting in lower genetic load, despite higher genetic drift. Overall, our results suggest that mating systems and inbreeding are important factors shaping variation in TR sequences.
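The statement that selfing increases homozygosity can be illustrated with the standard single-locus recursion for the inbreeding coefficient, F' = s(1 + F)/2, which has equilibrium F = s/(2 - s). The sketch below is only an illustration and is not the model used in the study; the selfing rate `selfing_rate` and all function names are assumptions introduced here.

```python
def next_inbreeding(f, selfing_rate):
    """One generation of the classical recursion F' = s * (1 + F) / 2."""
    return selfing_rate * (1.0 + f) / 2.0


def equilibrium_inbreeding(selfing_rate):
    """Closed-form equilibrium F = s / (2 - s) of the recursion above."""
    return selfing_rate / (2.0 - selfing_rate)


def iterate_inbreeding(selfing_rate, generations=200, f0=0.0):
    """Iterate the recursion from f0 for a fixed number of generations."""
    f = f0
    for _ in range(generations):
        f = next_inbreeding(f, selfing_rate)
    return f


def expected_heterozygosity(allele_freq, f):
    """Heterozygote frequency 2pq(1 - F) at a biallelic locus."""
    return 2.0 * allele_freq * (1.0 - allele_freq) * (1.0 - f)


if __name__ == "__main__":
    # Hypothetical selfing rates chosen only for illustration.
    for s in (0.0, 0.5, 0.9):
        f = equilibrium_inbreeding(s)
        print(s, round(f, 4), round(expected_heterozygosity(0.5, f), 4))
```

Under these assumptions, heterozygosity falls as the selfing rate rises, which is the effect the abstract identifies as the main route by which selfing alters TR evolution.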
- Preprint Article
- 10.1101/2025.07.04.663195
- Jul 7, 2025
Tandem repeat sequences (TRs) occur when short DNA motifs are repeated head-to-tail along chromosomes and are a major source of genetic variation. Population genetics models of TR evolution have focused on large, randomly mating, haploid populations. Yet many organisms reproduce partially through self-fertilisation (“selfing”), which increases homozygosity and thus may alter the evolutionary processes shaping TRs. Here we use mathematical modelling and simulations to study the evolution of homologous TRs in partially selfing, diploid populations under four different selective regimes that may be relevant to TRs: (i) additive purifying selection, (ii) truncation-like purifying selection, (iii) selection against heterozygotes due to misalignment costs, and (iv) stabilising selection favouring an intermediate TR length. We show that selfing influences TR evolution primarily by increasing homozygosity, with two main consequences: (1) it enhances the variation produced by unequal recombination within individuals, and (2) it increases variation between individuals. Consequently, selection against TR expansions becomes more effective under partial selfing across all modes of selection considered, resulting in shorter TRs and lower genetic load, despite higher genetic drift. Overall, our results suggest that mating systems and inbreeding are important factors shaping variation in TRs.
- Research Article
2
- 10.1186/s12859-025-06168-3
- Jun 4, 2025
- BMC Bioinformatics
Background: Tandem repeats (TRs) are major sources of genetic variation and important genetic markers. Their expansions are not only involved in gene expression regulation but also associated with many nervous system diseases and cancers. However, there is no efficient tandem repeat identification tool that integrates seamlessly with larger bioinformatics programs written in the popular Python language. Results: We introduce pytrf, a Python package for identifying both exact and approximate TRs in genomic sequences. It can be embedded in other Python programs or used in the interactive Python environment and Jupyter notebooks. It also provides command-line tools that help users find tandem repeats in FASTA/Q files. Compared with other tools, pytrf has the shortest running time with comparable peak memory usage. Conclusions: pytrf provides simple interfaces and command-line tools to facilitate the identification of tandem repeats in genomic sequences. It can easily be installed from PyPI (https://pypi.org/project/pytrf), and the source code is freely available at https://github.com/lmdu/pytrf.
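To make the idea of exact tandem repeat identification concrete, here is a minimal pure-Python sketch. It does not use pytrf and does not reproduce its API or algorithm; the function name, parameters, and thresholds are assumptions chosen for illustration.

```python
def find_exact_tandem_repeats(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return (start, end, motif, copies) tuples for exact tandem repeats.

    Coordinates are 0-based and half-open. At each position the longest
    repeat tract is kept; ties are resolved in favour of the shortest motif.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    i = 0
    while i < n:
        best = None
        for k in range(min_motif, max_motif + 1):
            if i + k > n:
                break
            motif = seq[i:i + k]
            if "N" in motif:
                continue  # skip ambiguous bases
            # Extend while the sequence stays periodic with period k.
            j = i + k
            while j < n and seq[j] == seq[j - k]:
                j += 1
            copies = (j - i) // k
            if copies >= min_repeats:
                end = i + copies * k
                if best is None or end - i > best[1] - best[0]:
                    best = (i, end, motif, copies)
        if best is not None:
            hits.append(best)
            i = best[1]  # continue after the reported tract
        else:
            i += 1
    return hits


if __name__ == "__main__":
    print(find_exact_tandem_repeats("GGCACACACACATT"))
```

This scan is quadratic in the worst case and ignores approximate repeats, so it is suitable only for small examples.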
- Research Article
31
- 10.1534/g3.119.400239
- Jul 16, 2019
- G3: Genes|Genomes|Genetics
Partial selfing, whereby self- and cross-fertilization occur in populations at intermediate frequencies, is generally thought to be evolutionarily unstable. Yet, it is found in natural populations. This could be explained if populations with partial selfing are able to reduce genetic loads and the possibility for inbreeding depression while keeping genetic diversity that may be important for future adaptation. To address this hypothesis, we compare the experimental evolution of Caenorhabditis elegans populations under partial selfing, exclusive selfing or predominant outcrossing, while they adapt to osmotically challenging conditions. We find that the ancestral genetic load, as measured by the risk of extinction upon inbreeding by selfing, is maintained as long as outcrossing is the main reproductive mode, but becomes reduced otherwise. Analysis of genome-wide single-nucleotide polymorphisms (SNPs) during experimental evolution and among the inbred lines that survived enforced inbreeding indicates that populations with predominant outcrossing or partial selfing maintained more genetic diversity than expected with neutrality or purifying selection. We discuss the conditions under which this could be explained by the presence of recessive deleterious alleles and/or overdominant loci. Taken together, our observations suggest that populations evolving under partial selfing can gain some of the benefits of eliminating unlinked deleterious recessive alleles and also the benefits of maintaining genetic diversity at partially dominant or overdominant loci that become associated due to variance of inbreeding levels.
- Research Article
82
- 10.1002/ece3.609
- May 23, 2013
- Ecology and Evolution
Understanding factors regulating hybrid fitness and gene exchange is a major research challenge for evolutionary biology. Genomic cline analysis has been used to evaluate alternative patterns of introgression, but only two models have been used widely and the approach has generally lacked a hypothesis testing framework for distinguishing effects of selection and drift. I propose two alternative cline models, implement multivariate outlier detection to identify markers associated with hybrid fitness, and simulate hybrid zone dynamics to evaluate the signatures of different modes of selection. Analysis of simulated data shows that previous approaches are prone to false positives (multinomial regression) or relatively insensitive to outlier loci affected by selection (Barton's concordance). The new, theory-based logit-logistic cline model is generally best at detecting loci affecting hybrid fitness. Although some generalizations can be made about different modes of selection, there is no one-to-one correspondence between pattern and process. These new methods will enhance our ability to extract important information about the genetics of reproductive isolation and hybrid fitness. However, much remains to be done to relate statistical patterns to particular evolutionary processes. The methods described here are implemented in a freely available package “HIest” for the R statistical software (CRAN; http://cran.r-project.org/).
- Research Article
24
- 10.1105/tpc.109.067017
- Oct 30, 2009
- The Plant Cell
Large-scale comparison of sequence polymorphism and divergence at numerous genomic loci within and between closely related species can reveal signatures of natural selection. Here, we present a population genomics study based on direct sequencing of 61 mitotic cell cycle genes from 30 Arabidopsis thaliana accessions and comparison of the resulting data to the close relative Arabidopsis lyrata. We found that the Arabidopsis core cell cycle (CCC) machinery is not highly constrained but is subject to different modes of selection. We found patterns of purifying selection for the cyclin-dependent kinase (CDK), CDK subunit, retinoblastoma, and WEE1 gene families. Other CCC gene families often showed a mix of one or two constrained genes and relaxed purifying selection on the other genes. We found several large effect mutations in CDKB1;2 that segregate in the species. We found a strong signature of adaptive protein evolution in the Kip-related protein KRP6 and departures from equilibrium at CDKD;1 and CYCA3;3 consistent with the operation of selection in these gene regions. Our data suggest that within Arabidopsis, the genetic robustness of cell cycle-related processes is more due to functional redundancy than high selective constraint.
- Research Article
43
- 10.1111/j.1365-294x.2010.04643.x
- Apr 30, 2010
- Molecular Ecology
An excellent model to elucidate the mechanisms and importance of evolution in the marine environment is the spectral tuning mechanism of the visual pigment in vertebrates. In the sand goby Pomatoschistus minutus (Teleostei; Gobiidae), a distribution-wide study showed that spatial variation at the rhodopsin gene (RH1) matches the characteristics of specific light environments. This match suggests that populations are locally adapted to selective light regimes targeting the RH1 gene. If so, then the direction of selection should depend on the regional spatial and temporal stability of the light conditions. We tested this prediction by comparing goby populations from two regions: the Baltic Sea, characterized by divergent, but temporally stable light conditions, and the North Sea, characterized by locally heterogeneous and temporally variable light conditions. RH1 sequences of 491 Pomatoschistus minutus individuals from 15 locations were analysed. We found that variation at the RH1 gene in the Baltic populations showed signatures of diversifying selection, whereas the RH1 gene in the North Sea showed signatures of stabilizing selection. These different modes of selection are consistent with the regional light conditions and hence support our predictions, but may also be influenced by migration between the open sea and more turbid estuarine environments. An interesting observation is that within one gene, synonymous and non-synonymous SNPs show a totally different pattern between populations. Population differentiation based on non-synonymous SNPs of the RH1 gene correlated with spectral variation of the local environment of the sand goby populations. In contrast, the differentiation based on synonymous SNPs of RH1 reflects more the neutral historical pattern of the species.
- Research Article
75
- 10.1074/jbc.272.14.9517
- Apr 1, 1997
- Journal of Biological Chemistry
Tandem repeats are ubiquitous in nature and constitute a major source of genetic variability in populations. This variability is associated with a number of genetic disorders in humans including triplet expansion diseases such as Fragile X syndrome and Huntington's disease. The mechanism responsible for the variability/instability of these tandem arrays remains contentious. We show here that formation of secondary structures, in particular intrastrand tetraplexes, is an intrinsic property of some of the more unstable arrays. Tetraplexes block DNA polymerase progression and may promote instability of tandem arrays by increasing the likelihood of reiterative strand slippage. In the course of doing this work we have shown that some of these tetraplexes involve unusual base interactions. These interactions not only generate tetraplexes with novel properties but also lead us to conclude that the number of sequences that can form stable tetraplexes might be much larger than previously thought.
- Research Article
7
- 10.1007/s00251-022-01255-8
- Mar 1, 2022
- Immunogenetics
Duplicates of genes for major histocompatibility complex (MHC) molecules can be subjected to selection independently and vary markedly in their evolutionary rates, sequence polymorphism, and functional roles. Therefore, without a thorough understanding of their copy number variation (CNV) in the genome, the MHC-dependent fitness consequences within a species could be misinterpreted. Studying the intra-specific CNV of this highly polymorphic gene, however, has long been hindered by the difficulties in assigning alleles to loci and the lack of high-quality genomic data. Here, using the high-quality genome of the Siamese fighting fish (Betta splendens), a model for mate choice studies, and the whole-genome sequencing (WGS) data of 17 Betta species, we achieved locus-specific amplification of their three classical MHC class II genes - DAB1, DAB2, and DAB3. By performing quantitative PCR and depth-of-coverage analysis using the WGS data, we revealed intra-specific CNV at the DAB3 locus. We identified individuals that had two allelic copies (i.e., heterozygous or homozygous) or one allele (i.e., hemizygous) and individuals without this gene. The CNV was due to the deletion of a 20-kb-long genomic region harboring both the DAA3 and DAB3 genes. We further showed that the three DAB genes were under different modes of selection, which also applies to their corresponding DAA genes that share a similar pattern of polymorphism. Our study demonstrates a combined approach to study CNV within a species, which is crucial for the understanding of multigene family evolution and the fitness consequences of CNV.
- Research Article
6
- 10.3390/genes8120351
- Nov 28, 2017
- Genes
The availability of the genome sequence of the unisexual (male-female) Caenorhabditis nigoni offers an opportunity to compare its non-coding features with those of the related hermaphroditic species Caenorhabditis briggsae, and to understand the evolutionary dynamics of their tandem repeat sequences (satellites) as a result of evolution from the unisexual ancestor. We take advantage of the previously developed SATFIND program to build satellite families defined by a consensus sequence. The relative number of satellites (satellites/Mb) in C. nigoni is 24.6% larger than in C. briggsae. Some satellites in C. nigoni have developed from a proto-repeat present in the ancestor species and are conserved as an isolated sequence in C. briggsae. We also identify unique satellites which occur only once and joint satellite families with a related sequence in both species. Some of these families are only found in C. nigoni, which indicates a recent appearance; they contain conserved adjacent 5′ and 3′ regions, which may favor transposition. Our results show that the number, length and turnover of satellites are restricted in the hermaphrodite C. briggsae when compared with the unisexual C. nigoni. We hypothesize that this results from differences in unequal recombination during meiotic chromosome pairing, which limits satellite turnover in hermaphrodites.
- Research Article
148
- 10.3390/genes3030461
- Jul 26, 2012
- Genes
Copy Number Variations (CNVs) and Single Nucleotide Polymorphisms (SNPs) have been the major focus of most large-scale comparative genomics studies to date. Here, we discuss a third, largely ignored, type of genetic variation, namely changes in tandem repeat number. Historically, tandem repeats have been designated as non-functional “junk” DNA, mostly as a result of their highly unstable nature. With the exception of tandem repeats involved in human neurodegenerative diseases, repeat variation was often believed to be neutral with no phenotypic consequences. Recent studies, however, have shown that as many as 10% to 20% of coding and regulatory sequences in eukaryotes contain an unstable repeat tract. Contrary to initial suggestions, tandem repeat variation can have useful phenotypic consequences. Examples include rapid variation in microbial cell surface, tuning of internal molecular clocks in flies and the dynamic morphological plasticity in mammals. As such, tandem repeats can be useful functional elements that facilitate evolvability and rapid adaptation.
- Research Article
21
- 10.3389/fpls.2020.607893
- Jan 12, 2021
- Frontiers in plant science
The unigeneric tribe Heliophileae encompassing more than 100 Heliophila species is morphologically the most diverse Brassicaceae lineage. The tribe is endemic to southern Africa, confined chiefly to southwestern South Africa, home of two biodiversity hotspots (Cape Floristic Region and Succulent Karoo). The monospecific Chamira (C. circaeoides), the only crucifer species with persistent cotyledons, is traditionally retrieved as the closest relative of Heliophileae. Our transcriptome analysis revealed a whole-genome duplication (WGD) ∼26.15–29.20 million years ago, presumably preceding the Chamira/Heliophila split. The WGD was then followed by genome-wide diploidization, species radiations, and cladogenesis in Heliophila. The expanded phylogeny based on nuclear ribosomal DNA internal transcribed spacer (ITS) uncovered four major infrageneric clades (A–D) in Heliophila and corroborated the sister relationship between Chamira and Heliophila. Herein, we analyzed how the diploidization process impacted the evolution of repetitive sequences through low-coverage whole-genome sequencing of 15 Heliophila species, representing the four clades, and Chamira. Despite the firmly established infrageneric cladogenesis and different ecological life histories (four perennials vs. 11 annual species), repeatome analysis showed overall comparable evolution of genome sizes (288–484 Mb) and repeat content (25.04–38.90%) across Heliophila species and clades. Among Heliophila species, long terminal repeat (LTR) retrotransposons were the predominant components of the analyzed genomes (11.51–22.42%), whereas tandem repeats had lower abundances (1.03–12.10%). In Chamira, the tandem repeat content (17.92%, 16 diverse tandem repeats) equals the abundance of LTR retrotransposons (16.69%). Among the 108 tandem repeats identified in Heliophila, only 16 repeats were found to be shared among two or more species; no tandem repeats were shared by Chamira and Heliophila genomes. Six “relic” tandem repeats were shared between any two different Heliophila clades by common descent. Four and six clade-specific repeats shared among clade A and C species, respectively, support the monophyly of these two clades. Three repeats shared by all clade A species corroborate the recent diversification of this clade revealed by plastome-based molecular dating. Phylogenetic analysis based on repeat sequence similarities separated the Heliophila species into three clades [A, C, and (B+D)], mirroring the post-polyploid cladogenesis in Heliophila inferred from rDNA ITS and plastome sequences.
- Research Article
6
- 10.1007/bf00355642
- Jun 1, 1995
- Mammalian Genome
A polymorphism of the variable number of tandem repeat (VNTR) type is located 97 bp downstream of exon VI of the parathyroid hormone-related peptide (PTHrP) gene in humans. The repeat unit has the general sequence G(TA)nC, where n equals 4-11. In order to characterize the evolutionary history of this VNTR, we initially tested for its presence in 13 different species representing four main groups of living primates. The sequence is present in the human, great apes, and Old World monkeys, but not in New World monkeys; and this region failed to PCR amplify in the Loris group. Thus, the evolution of the sequence as part of the PTHrP gene started at least 25-35 million years ago, after divergence of the Old World and New World monkeys, but before divergence of Old World monkeys and great apes and humans. The structural changes occurring during evolution are characterized by a relatively high degree of sequence divergence. In general, the tandem repeat region tends to be longer and more complex in higher primates with the repeat unit motifs all being based on a TA-dinucleotide repeat sequence. Intra-species variability of the locus was demonstrated only in humans and gorilla. The divergence of the TA-dinucleotide repeat sequence and the variable mutation rates observed in different primate species are in contrast to the relative conservation of the flanking sequences during primate evolution. This suggests that the nature of the TA-dinucleotide repeat sequence, rather than its flanking sequences, is responsible for generating variability. (ABSTRACT TRUNCATED AT 250 WORDS)
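The repeat unit G(TA)nC with n between 4 and 11 maps directly onto a regular expression. The sketch below is a hypothetical illustration of how such units could be located and their n values read off; the function name and the example sequence are invented here and are not taken from the study.

```python
import re

# One repeat unit: G, then 4 to 11 copies of TA, then C.
UNIT_PATTERN = re.compile(r"G((?:TA){4,11})C")


def ta_counts_per_unit(seq):
    """Return n for each non-overlapping G(TA)nC unit found in seq."""
    return [len(m.group(1)) // 2 for m in UNIT_PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence with two adjacent units (n = 4 and n = 5).
    example = "AA" + "GTATATATAC" + "GTATATATATAC" + "TT"
    print(ta_counts_per_unit(example))
```

Units with fewer than four or more than eleven TA copies are deliberately not matched, mirroring the range stated in the abstract.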
- Research Article
112
- 10.1016/j.cub.2009.02.039
- Mar 12, 2009
- Current Biology
Reduced Effectiveness of Selection Caused by a Lack of Recombination
- Research Article
34
- 10.1098/rspb.2014.1984
- Apr 22, 2015
- Proceedings of the Royal Society B: Biological Sciences
Geographical heterogeneity in the composition of biotic interactions can create a mosaic of selection regimes that may drive the differentiation of phenotypes that operate at the interface of these interactions. Nonetheless, little is known about effects of these geographical mosaics on the evolution of genes encoding traits associated with species interactions. Predatory marine snails of the family Conidae use venom, a cocktail of conotoxins, to capture prey. We characterized patterns of geographical variation at five conotoxin genes of a vermivorous species, Conus ebraeus, at Hawaii, Guam and American Samoa, and evaluated how these patterns of variation are associated with geographical heterogeneity in prey utilization. All populations show distinct patterns of prey utilization. Three 'highly polymorphic' conotoxin genes showed significant geographical differences in allelic frequency, and appear to be affected by different modes of selection among populations. Two genes exhibited low levels of diversity and a general lack of differentiation among populations. Levels of diversity of 'highly polymorphic' genes exhibit a positive relationship with dietary breadth. The different patterns of evolution exhibited by conotoxin genes suggest that these genes play different roles in prey capture, and that some genes are more greatly affected by differences in predator-prey interactions than others. Moreover, differences in dietary breadth appear to have a greater influence on the differentiation of venoms than differences in the species of prey.
- Research Article
5
- 10.1186/s12864-023-09525-9
- Aug 11, 2023
- BMC Genomics
Background: The 1RS arm of wheat-rye 1BL.1RS translocations contains several subtelomeric tandem repeat families. Studying how differences in the composition of these tandem repeats affect the meiotic recombination of 1RS arms can help to enrich the genetic diversity of 1BL.1RS translocation chromosomes. Results: Five wheat-rye 1BL.1RS translocation cultivars/lines were used to build two groups of cross combinations: group 1 (20T401 × Zhou 8425B, 20T401 × Lovrin 10 and 20T401 × Chuannong 17) and group 2 (20T360-2 × Zhou 8425B, 20T360-2 × Lovrin 10 and 20T360-2 × Chuannong 17). Oligonucleotide (oligo) probes Oligo-s120.3, Oligo-TR72, and Oligo-119.2-2 produced the same signal pattern on the 1RS arms in lines 20T401 and 20T360-2, and another signal pattern in the three cultivars Zhou 8425B, Lovrin 10 and Chuannong 17. The Oligo-pSc200 signal disappeared from the 1RS arms of the line 20T401, and the signal intensity of this probe on the 1RS arms of the line 20T360-2 was weaker than that of the three cultivars. The five cultivars/lines had the same signal pattern of the probe Oligo-pSc250. The recombination rate of 1RS arms in group 1 was significantly lower than that in group 2. In the progenies from group 1, unequal meiotic recombination in the subtelomeric pSc119.2 and pSc250 tandem repeat regions, and a 1BL.1RS chromosome with an inversion of the 1RS segment between pSc200 and the nucleolar organizer region were found. Conclusions: This study provides a visual tool to detect the meiotic recombination of 1RS arms. The meiotic recombination rate of 1RS arms was affected by the variation of the pSc200 tandem repeat, indicating that a similar composition of subtelomeric tandem repeats on these arms could increase their recombination rate. These results indicate that the 1RS subtelomeric structure affects its recombination, and thus the localization of genes on 1RS by means of meiotic recombination might also be affected.