A snapshot of human leukocyte antigen (HLA) diversity using data from the Allele Frequency Net Database
- Research Article
5
- 10.1007/978-1-0716-3874-3_2
- Jan 1, 2024
- Methods in molecular biology (Clifton, N.J.)
The allele frequency net database (AFND, http://www.allelefrequencies.net) is a web-based repository that contains information on the frequencies of immune-related genes and their corresponding alleles in worldwide human populations. At present, the website contains data from 1784 population samples in more than 14 million individuals from 129 countries on the frequency of genes from different polymorphic regions, including data for the human leukocyte antigen (HLA) system. In addition, over the last four years, AFND has also incorporated raw genotype data from 85,000 individuals comprising 215 population samples from 39 countries. Moreover, more population data sets containing next-generation sequencing data spanning >3 million individuals have been added. This resource has been widely used in a variety of contexts such as histocompatibility, immunology, epidemiology, pharmacogenetics, epitope prediction algorithms for population coverage in vaccine development, and population genetics, among many others. In this chapter, we present an update of the most used searching mechanisms as described in a previous volume and some of the latest developments included in AFND.
- Research Article
683
- 10.1093/nar/gkq1128
- Nov 9, 2010
- Nucleic Acids Research
The allele frequency net database (http://www.allelefrequencies.net) is an online repository that contains information on the frequencies of immune genes and their corresponding alleles in different populations. The extensive variability observed in genes and alleles related to the immune system response and its significance in transplantation, disease association studies and diversity in populations led to the development of this electronic resource. At present, the system contains data from 1133 populations in 608 813 individuals on the frequency of genes from different polymorphic regions such as human leukocyte antigens, killer-cell immunoglobulin-like receptors, major histocompatibility complex Class I chain-related genes and a number of cytokine gene polymorphisms. The project was designed to create a central source for the storage of frequency data and provide individuals with a set of bioinformatics tools to analyze the occurrence of these variants in worldwide populations. The resource has been used in a wide variety of contexts, including clinical applications (histocompatibility, immunology, epidemiology and pharmacogenetics) and population genetics. Demographic information, frequency data and searching tools can be freely accessed through the website.
- Book Chapter
52
- 10.1007/978-1-4939-8546-3_4
- Jan 1, 2018
The allele frequency net database (AFND, http://www.allelefrequencies.net) is an online web-based repository that contains information on the frequencies of immune-related genes and their corresponding alleles in worldwide human populations. At present, the system contains data from 1505 populations in more than ten million individuals on the frequency of genes from different polymorphic regions including data for the human leukocyte antigens (HLA) system. This resource has been widely used in a variety of contexts such as histocompatibility, immunology, epidemiology, pharmacogenetics, and population genetics, among many others. In this chapter, we present some of the more commonly used searching mechanisms and some of the most recent developments included in AFND.
- Research Article
177
- 10.1086/501531
- Apr 1, 2006
- The American Journal of Human Genetics
Proportioning Whole-Genome Single-Nucleotide–Polymorphism Diversity for the Identification of Geographic Population Structure and Genetic Ancestry
- Research Article
27
- 10.1093/molehr/gaw062
- Sep 8, 2016
- Molecular Human Reproduction
Does mitochondrial DNA (mtDNA) diversity in modern human populations potentially pose a challenge, via mtDNA segregation, to mitochondrial replacement therapies? The magnitude of mtDNA diversity in modern human populations is as high as in mammalian model systems where strong mtDNA segregation is observed; consideration of haplotype pairs and/or haplotype matching can help avoid these potentially deleterious effects. In mammalian models, substantial proliferative differences are observed between different mtDNA haplotypes in cellular admixtures, with larger proliferative differences arising from more diverse haplotype pairings. If maternal mtDNA is 'carried over' in human gene therapies, these proliferative differences could lead to its amplification in the resulting offspring, potentially leading to manifestation of the disease that the therapy was designed to avoid, but existing studies have not investigated whether mtDNA diversity in modern human populations is sufficient to permit significant amplification. This theoretical study used over 7500 human mtDNA sequences from The National Center for Biotechnology Information (NCBI), a range of international and British mtDNA surveys, and 2011 census data. A stochastic simulation approach was used to model random haplotype pairings from within different regions. In total, 1000 simulated pairings were analysed using the basic local alignment search tool (BLAST) for each region. Previous data from mouse models were used to estimate proliferative differences. Even within the same haplogroup, differences of around 20-80 single-nucleotide polymorphisms (SNPs) are common between mtDNAs admixed in random pairings. These values are sufficient to lead to substantial segregation in mouse models over an organismal lifetime, even given low starting heteroplasmy, inducing increases from 5% to 35% over 1 year. Substantial population mixing in modern UK cities increases the expected genetic differences.
Hence, the likely genetic differences between humans randomly sampled from a population may well allow substantial amplification of a disease-carrying mtDNA haplotype over the timescale of a human lifetime. We report ranges and mean differences for all statistics to quantify uncertainty in our results. The mapping from mouse and other mammalian models to the human system is challenging, as timescales and mechanisms may differ. Reporting biases in NCBI mtDNA data, if present, may affect the statistics we compute. We discuss the robustness of our findings in the light of these concerns. Matching the mtDNA haplotypes of the mother and third-party donor in mitochondrial replacement therapies is supported as a means of ameliorating the potentially deleterious results of human mtDNA diversity. We present a chart of expected SNP differences between mtDNA haplogroups, allowing the selection of optimal partners for therapies. The authors report no external funding sources or conflicts of interest.
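The random-pairing step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the study compared sequences with BLAST, whereas the sketch below assumes the sequences are already aligned to equal length and simply counts mismatching positions. The toy sequences and the function names are hypothetical.

```python
import random
from statistics import mean


def snp_differences(seq_a: str, seq_b: str) -> int:
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def simulate_pairings(sequences, n_pairs=1000, seed=0):
    """Draw random pairs of distinct entries and return their difference counts."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_pairs):
        seq_a, seq_b = rng.sample(sequences, 2)  # two entries, without replacement
        diffs.append(snp_differences(seq_a, seq_b))
    return diffs


# Toy aligned sequences standing in for one region's mtDNA sample (hypothetical).
region = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC", "TCGTACGTAC"]
diffs = simulate_pairings(region, n_pairs=1000)
print("range:", min(diffs), "to", max(diffs), "mean:", round(mean(diffs), 2))
```

Reporting the range and the mean of `diffs` mirrors the abstract's statement that ranges and mean differences are given for all statistics.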
- Research Article
10
- 10.1111/tan.15043
- Apr 2, 2023
- HLA
HLA-B is among the most variable genes in the human genome. This gene encodes a key molecule for antigen presentation to CD8+ T lymphocytes and NK cell modulation. Despite the myriad of studies evaluating its coding region (with an emphasis on exons 2 and 3), few studies have evaluated introns and regulatory sequences in real population samples. Thus, HLA-B variability is probably underestimated. We applied a bioinformatics pipeline tailored for HLA genes on 5347 samples from 80 different populations, including more than 1000 admixed Brazilians, to evaluate the HLA-B variability (SNPs, indels, MNPs, alleles, and haplotypes) in exons, introns, and regulatory regions. We observed 610 variable sites throughout HLA-B; the most frequent variants are shared worldwide. However, the haplotype distribution is geographically structured. We detected 920 full-length haplotypes (exons, introns, and untranslated regions) encoding 239 different protein sequences. HLA-B gene diversity is higher in admixed populations and Europeans while lower in African ancestry individuals. Each HLA-B allele group is associated with specific promoter sequences. This HLA-B variation resource may improve HLA imputation accuracy and disease-association studies and provide evolutionary insights regarding HLA-B genetic diversity in human populations.
- Research Article
- 10.1134/s1022795419010150
- Jan 1, 2019
- Russian Journal of Genetics
The Human Leukocyte Antigen (HLA) system represents a distinctive marker for identifying population groups, since it exhibits a very high level of polymorphism with remarkable frequency variation across human populations. In this study, the gene frequencies of HLA class II DRB1, DQB1 and DQA1 alleles were studied in 100 randomly chosen individuals from the region of Vojvodina, Serbia, with the aim of establishing the genetic relationship between the Vojvodina population and selected populations of European and non-European descent. Low-resolution DNA typing for DRB1, DQA1 and DQB1 was done using a standardized PCR-SSOP. Genetic distances between Vojvodina and 28 populations were computed based on the frequency of DRB1 alleles. Phylogenetic trees were constructed using the neighbour-joining algorithm. The distribution of the observed genotypes is in Hardy–Weinberg equilibrium for all studied systems. The most frequent alleles were HLA-DQA1*01 (0.475), HLA-DQB1*05 (0.3) and HLA-DRB1*11 (0.14). Two-locus haplotype analysis identified HLA-DQB1*05-DQA1*01 (0.22), HLA-DQB1*03-DQA1*05 (0.135) and HLA-DQB1*06-DQA1*01 (0.1075) as the common haplotypes in the Vojvodina population. Compared to other populations, Vojvodina appears to be genetically related to Balkan populations, in particular those of Croatia, Slovenia, and Bosnia and Herzegovina. In conclusion, the genetic distance matrix estimated using alleles at the HLA-DRB1 locus indicates that the Vojvodina population is related to present-day nearby populations (Polish, Russians, Germans and French) but not to Albanians from Kosovo and Bulgarians.
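The allele-frequency and Hardy–Weinberg steps mentioned in this abstract can be sketched as follows. This is an illustrative calculation only, not the software used in the study: the genotype data are invented, and a plain Pearson chi-square statistic is assumed as the goodness-of-fit measure. For highly polymorphic loci with many rare alleles, expected counts become small and exact tests are usually preferred.

```python
from collections import Counter
from itertools import combinations_with_replacement


def allele_frequencies(genotypes):
    """Return {allele: frequency} from a list of (allele, allele) genotypes."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = 2 * len(genotypes)  # each individual carries two alleles
    return {allele: n / total for allele, n in counts.items()}


def hwe_chi_square(genotypes):
    """Chi-square statistic of observed genotype counts against HWE expectations."""
    n = len(genotypes)
    freqs = allele_frequencies(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    stat = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        # Homozygotes are expected at n*p^2, heterozygotes at n*2*p*q.
        expected = n * freqs[a] ** 2 if a == b else n * 2 * freqs[a] * freqs[b]
        stat += (observed.get((a, b), 0) - expected) ** 2 / expected
    return stat


# Hypothetical sample of 100 individuals typed at one locus.
sample = (
    [("DRB1*11", "DRB1*11")] * 25
    + [("DRB1*11", "DRB1*15")] * 50
    + [("DRB1*15", "DRB1*15")] * 25
)
print(allele_frequencies(sample))
print(hwe_chi_square(sample))
```

The toy sample is constructed to sit exactly at Hardy–Weinberg proportions, so the statistic is zero; real data would be compared against a chi-square distribution with the appropriate degrees of freedom.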
- Peer Review Report
- 10.7554/elife.81188.sa1
- Aug 22, 2022
Abstract
Recent studies have suggested that the human germline mutation rate and spectrum evolve rapidly. Variation in generation time has been linked to these changes, though its contribution remains unclear. We develop a framework to characterize temporal changes in polymorphisms within and between populations, while controlling for the effects of natural selection and biased gene conversion. Application to the 1000 Genomes Project dataset reveals multiple independent changes that arose after the split of continental groups, including a previously reported, transient elevation in TCC>TTC mutations in Europeans and novel signals of divergence in C>G and T>A mutation rates among population samples. We also find a significant difference between groups sampled in and outside of Africa in old T>C polymorphisms that predate the out-of-Africa migration. This surprising signal is driven by TpG>CpG mutations and stems in part from mis-polarized CpG transitions, which are more likely to undergo recurrent mutations. Finally, by relating the mutation spectrum of polymorphisms to parental age effects on de novo mutations, we show that plausible changes in the generation time cannot explain the patterns observed for different mutation types jointly. Thus, other factors – genetic modifiers or environmental exposures – must have had a non-negligible impact on the human mutation landscape.
Editor's evaluation
This important study investigates temporal variation in patterns of germline mutation during the evolution of human populations.
Using a compelling approach that controls for the effects of selection and biased gene conversion, the authors show that changes in generation time alone cannot explain the joint patterns observed for different mutation types, suggesting that other factors such as genetic modifiers or environmental exposures must have played a role as well. This work will be of broad interest to population geneticists and evolutionary biologists. https://doi.org/10.7554/eLife.81188.sa0
eLife digest
Each human has 23 pairs of chromosomes, one set inherited from each parent. But the child's chromosomes are not an exact copy of their parents' chromosomes. Spontaneous changes or mutations in the DNA during the formation of the egg or sperm cells, or early development of the embryo, can change a small fraction of the nucleotides or 'letters' that make up the DNA. These modifications are an important source of genetic diversity in human populations and contribute to the evolution of new traits. Each genetic variant in present-day human populations represents a mutation in one of their ancestors. The types and frequencies of variants vary across human populations and have changed over time, suggesting that mutation patterns have evolved in the past. But the processes driving these population-level differences remain elusive. One possible factor may be changes in the average age of reproduction or the generation time in a population. For example, older parents contribute more – and also different types of – new mutations to their children than younger parents do. Populations where it is customary to have children at older ages may therefore have a different mutation landscape. To find out if this is indeed the case, Gao et al. used computer algorithms to analyze the genomes of hundreds of people living on three continents who participated in 'the 1,000 Genomes Project'.
The analysis identified differences in mutation patterns across continental groups and estimated when these changes occurred. Further, they showed that although the age of reproduction had an impact on the mutation landscape, differences in generation time alone could not explain the observed changes in the human mutation spectrum. Factors other than generation time, such as environmental exposures, may have played a role in shifting these patterns. The study provides new insights into the changes in the mutation landscape over the course of human evolution. Mapping these patterns in humans worldwide may help scientists understand the causes underlying these changes. The techniques used by Gao et al. may also help analyze changes in mutation patterns in other organisms.
Introduction
Recent advances in high-throughput sequencing have enabled large-scale surveys of genetic variation in thousands of humans, providing a rich resource for understanding the source and mechanisms shaping the mutation landscape over time. Comparisons of polymorphism patterns across geographic population samples have uncovered numerous differences in the mutation rates and spectra (i.e., relative proportions of different types of mutations) (DeWitt et al., 2021; Goldberg and Harris, 2022; Harris, 2015; Harris and Pritchard, 2017; Hwang and Green, 2004; Mathieson and Reich, 2017; Moorjani et al., 2016a; Narasimhan et al., 2017; Speidel et al., 2019). A notable signal in humans is the enrichment of TCC>TTC variants in polymorphism data from Europeans relative to Africans and Asians (Harris and Pritchard, 2017). This signal is also observed in South Asians to a lesser degree and has been suggested to originate in ancient Neolithic farmers (Harris and Pritchard, 2017; Speidel et al., 2021).
Many other subtle but statistically significant signals have also been detected; given the recent common ancestry of human populations, this finding indicates that the mutational spectrum in humans has been evolving rapidly. Several genetic and nongenetic factors have been implicated as affecting mutation rates and acting as potential drivers of observed interpopulation differences in the mutation spectrum of polymorphisms. First, some environmental exposures can increase mutation rates, especially of particular types. As humans in different geographic locations and environments may have experienced differential exposures over the past 50,000–100,000 years since the out-of-Africa (OOA) migration, rates of specific mutation types could have diverged between populations (Harris, 2015; Mathieson and Reich, 2017). Second, genetic modifiers of mutation rates, such as variants in genes that copy or repair DNA, could segregate at different frequencies across populations. Despite the deleterious effects of alleles that modify mutation rates, in recombining species, they could be nearly neutral and maintained for a long time, leading to genome-wide differences across populations (Milligan et al., 2022; Seoighe and Scally, 2017). In addition, direct sequencing of human pedigrees has revealed the effects of the parental ages at reproduction on the relative fractions of mutation types (Goldmann et al., 2018; Jónsson et al., 2017). For example, as parents age, fathers pass on disproportionally more T>C mutations, and mothers contribute a higher fraction of C>G mutations (Jónsson et al., 2017). Thus, differences in the average reproductive ages, or equivalently 'generation times,' alone can lead to differences in mutation spectrum across populations; indeed, such differences have been invoked to explain a large fraction of observed variation in types of polymorphisms among population samples (Coll Macià et al., 2021). 
The joint distribution of mutation type and frequency of polymorphisms, however, depends not only on the mutational input, but also on other evolutionary forces such as natural selection, biased gene conversion, and demography. In particular, natural selection distorts the allele frequency distribution and fixation probability of non-neutral variants, and the average effect of natural selection can differ across mutation types (Wakeley, 2010). As an example, genic regions tend to be more GC-rich, so mutations at G:C base pairs may be subject to stronger purifying or background selection compared to mutations at A:T base pairs (Lander et al., 2001; McVicker et al., 2009). GC-biased gene conversion (gBGC) is another process that exerts differential effects across mutation types by effectively acting like positive selection favoring mutations from weak alleles (A or T) to strong alleles (C or G) and negative selection against mutations from strong to weak alleles (Duret and Galtier, 2009). The strengths of selection and gBGC depend on the effective population size and thus on the demographic history of a population. Demographic history also influences allele frequencies for a given allele age (Kimura, 1969). This poses a challenge in interpreting previous studies (Harris and Pritchard, 2017; Mathieson and Reich, 2017) aimed at learning about when changes in mutational processes may have occurred by using allele frequencies, as mutations of the same frequency can have drastically different distributions of ages in distinct populations (e.g., doubletons in Africans are substantially older than doubletons in Europeans or Asians; Mathieson and McVean, 2014). Beyond the biological processes that shape polymorphism data, the characterization of the mutational spectrum can be biased by many technical issues. 
For instance, a recent study showed that some interpopulation differences discovered in low-coverage 1000 Genomes data may be driven by cell line artifacts or errors in PCR amplification (Anderson-Trocmé et al., 2020). Further, comparisons of mutation patterns across datasets are sensitive to differences in the accessible genomic regions across studies. Because there is large variation in mutation rates and base pair composition across genomic regions, differences in the regions sequenced across studies can have a non-negligible impact on comparisons of mutation spectrum across datasets (Monroe et al., 2022; Seplyarskiy et al., 2021). In addition, the number of genomes surveyed, in combination with the specific population demographic history, influences the chance of observing repeated mutations at the same site, and thus the observed polymorphism patterns (Lek et al., 2016). Given these challenges, it remains unclear whether the numerous observed differences across human populations stem from rapid evolution of the mutation process itself, other evolutionary processes, or technical factors. Motivated by these considerations, we propose a new framework to compare the mutation spectrum over time and across human populations. First, we infer the age of each derived allele observed in a population using a newly developed approach, Relate, which reconstructs local genealogies and estimates allele ages (Speidel et al., 2019). This approach allows us to perform more reliable comparisons across populations as well as to investigate changes in mutation processes across time. Next, we minimize confounding effects of selection by removing constrained regions and known targets of selection in the genome. We also control the effects of biased gene conversion by focusing on comparison of pairs of mutations (e.g., T>C and T>G) that are subject to similar effects of gBGC. 
This pairwise comparison further mitigates the issue of interdependencies in comparing mutation fractions (i.e., an increased contribution of one mutation type necessarily lowers the contribution of other mutation types). Based on this new framework, we re-evaluate the evidence for evolution of the mutation spectrum in human populations and investigate when, how, and in which populations significant changes have occurred over the course of human evolution. Finally, by relating parental age effects on the mutation spectrum estimated in contemporary pedigrees to the observed patterns of polymorphisms of varying ages, we evaluate the role of changes in generation times in shaping the human mutation landscape.
Results
Variation in the spectrum of human polymorphisms over time
We analyzed single-nucleotide polymorphisms (SNPs) identified in high-coverage whole-genome sequencing data from the 1000 Genomes Project, including 178 individuals of West African ancestry living in Ibadan, Nigeria (YRI), 179 individuals of Northern European ancestry living in the United States (CEU), and 103 individuals of East Asian ancestry living in Beijing, China (CHB) (Byrska-Bishop et al., 2022). To focus on putatively neutral mutations, we removed exons and phylogenetically conserved regions as in previous studies (Harris and Pritchard, 2017; Moorjani et al., 2016a). To perform reliable comparisons between datasets in downstream analysis and ensure the results are not driven by local genomic differences in mutation rate, we focused on regions that were accessible in both population and pedigree datasets (hereafter referred to as 'commonly accessible regions'; 'Materials and methods'). We inferred the age of each derived variant (with the ancestral allele determined based on the six primate EPO (Enredo, Pecan, Ortheus) alignment) in YRI, CEU, and CHB using Relate, a method to reconstruct local genealogies based on phased haplotype sequences (Speidel et al., 2019).
We then divided all SNPs into 15 bins based on the ages of the derived allele inferred by Relate, accounting for uncertainty in the inferred mutation age by assuming a uniform distribution of ages between the inferred lower and upper bounds for each variant ('Materials and methods'). We classified each SNP into six disjoint classes based on the type of base pair substitution: T>A, T>C, T>G, C>A, C>G, and C>T (each including the corresponding substitution on the reverse complement strand, e.g., T>C includes both T>C and A>G substitutions). Given the well-characterized hypermutability of methylated CpG sites (Duncan and Miller, 1980; Kong et al., 2012), we further divided C>T SNPs into subtypes occurring in CpG and non-CpG contexts by considering the flanking base pair on either side of the variant. We find marked differences in the relative proportions of different mutation types (i.e., the mutation spectrum) across varying allele age bins within CEU (Figure 1) as well as in YRI and in CHB (Figure 1—figure supplement 1), as seen earlier in the low-coverage 1000G data (Speidel et al., 2019). We obtain qualitatively similar results when considering other 1000G populations of TSI, LWK, and JPT (Figure 1—figure supplement 1). This observation echoes previous findings about the evolution of the mutation spectrum comparing polymorphisms across allele frequencies (Carlson et al., 2018; Harris and Pritchard, 2017; Mathieson and Reich, 2017). As noted previously, however, differences in mutation spectrum across frequencies alone are weak evidence for the evolution of the mutation process itself because patterns of standing polymorphisms can be affected by repeat mutations and other evolutionary forces, including selection and gene conversion.
Figure 1. Changes in the mutation spectrum of polymorphisms in CEU over evolutionary time.
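The classification and binning steps described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the bin edges are invented, and the CpG rule shown (a C followed by G on the folded strand) is one standard reading of "flanking base pair on either side".

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snp(upstream, ancestral, derived, downstream):
    """Label a SNP with one of the six folded classes, splitting C>T by CpG context.

    Arguments are single bases on the forward strand, in 5'-to-3' order.
    """
    if ancestral == derived:
        raise ValueError("ancestral and derived alleles must differ")
    if ancestral in "AG":
        # Fold onto the reverse-complement strand so the ancestral base is C or T.
        upstream, downstream = COMPLEMENT[downstream], COMPLEMENT[upstream]
        ancestral, derived = COMPLEMENT[ancestral], COMPLEMENT[derived]
    label = f"{ancestral}>{derived}"
    if label == "C>T":
        label += " CpG" if downstream == "G" else " non-CpG"
    return label


def spread_over_bins(lower, upper, edges):
    """Fraction of a uniform age distribution on [lower, upper] in each age bin."""
    if upper <= lower:
        raise ValueError("upper bound must exceed lower bound")
    width = upper - lower
    fractions = []
    for bin_lo, bin_hi in zip(edges, edges[1:]):
        overlap = max(0.0, min(bin_hi, upper) - max(bin_lo, lower))
        fractions.append(overlap / width)
    return fractions


print(classify_snp("A", "A", "G", "T"))      # A>G folds to T>C
print(classify_snp("C", "G", "A", "T"))      # G>A in a CpG folds to C>T CpG
print(spread_over_bins(50, 150, [0, 100, 200]))
```

Summing the per-bin fractions over all variants of a class yields fractional counts per age bin, which is one way to arrive at the "(pseudo-)counts" named in the source data files.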
Figure 1—source data 1. Bedfile for the commonly accessible region excluding exons and phylogenetically conserved elements. https://cdn.elifesciences.org/articles/81188/elife-81188-fig1-data1-v3.zip
Figure 1—source data 2. Text files with (pseudo-)counts of different types of mutations in YRI, LWK, CEU, TSI, CHB, JPT in each time window. https://cdn.elifesciences.org/articles/81188/elife-81188-fig1-data2-v3.zip
Notably, the infinite sites model is a reasonable assumption for small sample sizes (Kimura, 1969), but recurrent mutations become highly likely in large datasets, especially at sites with higher mutation rates (Harpak et al., 2016; Lek et al., 2016). Recurrent, multi-allelic, and back mutations violate the model assumptions of Relate and are often excluded from its output. For instance, given the higher mutation rate of transitions at CpG sites, such SNPs are more likely to be subject to recurrent mutations in a large sample and thus may map to multiple branches in the tree, leading to their exclusion from Relate's output (Speidel et al., 2019). As expected from these considerations, the fraction of CpG C>T SNPs in young mutations (i.e., those estimated to have occurred in the past ~50 generations) is lower than proportions in de novo mutations (DNMs) in present-day pedigree studies (Figure 1—figure supplement 2). Differences in mutation spectrum across age bins in modern humans persist even after excluding CpG C>T mutations (Figure 1—figure supplement 3), however, indicating that other mutation types are also changing in relative frequency over time and the observed patterns are not driven solely by recurrent mutation at CpG sites. Next, we examined the effect of linked selection on different mutation types.
While we excluded direct targets of selection from analysis (i.e., exons and conserved regions), much of the genome is linked to non-neutral variants and subject, notably, to background selection (Charlesworth et al., 1993; McVicker et al., 2009; Murphy et al., 2022). A common measure of the effects of background selection is the B-statistic or B-score that estimates the reduction in nucleotide diversity levels compared to the neutral expectation (McVicker et al., 2009). To characterize the impact of linked selection, we calculated the average genome-wide B-score of each mutation type. We find nearly identical average B-scores and similar distributions for all mutation types (Figure 1—figure supplement 4). Further, comparing the mutation spectrum over time in CEU, YRI, and CHB, we obtain qualitatively similar results when restricting to regions with weak background selection (B-score > 800, where the genetic diversity is reduced by <20% compared to the neutral expectation; Figure 1—figure supplement 5). These analyses suggest that although linked selection has pervasive effects, its average impact is relatively uniform across the seven mutation types in commonly accessible regions ('Materials and methods').
Gene conversion is another evolutionary process that can have a profound impact on the mutation spectrum of polymorphisms. gBGC acts like selection for certain mutation types by causing the preferential transmission of strong (S) alleles (C or G) over weak (W) alleles (A or T) in heterozygotes (Duret and Galtier, 2009). Accordingly, we observe enrichments of W>S mutations (T>C and T>G) in common variants and of S>W mutations (C>A and C>T) in rare variants (Figure 1—figure supplement 6A). Moreover, gBGC violates model assumptions of Relate (for both neutrality and infinite-sites mutation model) and could lead to subtle biases in estimated allele ages (Speidel et al., 2019).
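The weak/strong grouping that underlies the gBGC argument can be written down directly. The sketch below is illustrative only; the function names and the toy variant list are hypothetical. It labels each substitution by whether gBGC would favor it (W>S), disfavor it (S>W), or leave it unaffected (W>W or S>S), and then computes the W>S to S>W ratio for a set of variants.

```python
STRONG = {"C", "G"}
WEAK = {"A", "T"}


def gbgc_class(ancestral, derived):
    """Return 'W>S', 'S>W', or 'unaffected' for an ancestral>derived substitution."""
    if ancestral in WEAK and derived in STRONG:
        return "W>S"  # favored by GC-biased gene conversion
    if ancestral in STRONG and derived in WEAK:
        return "S>W"  # disfavored by GC-biased gene conversion
    return "unaffected"  # W>W (e.g. T>A) or S>S (e.g. C>G)


def ws_to_sw_ratio(snps):
    """Ratio of W>S to S>W counts among (ancestral, derived) pairs."""
    labels = [gbgc_class(anc, der) for anc, der in snps]
    n_sw = labels.count("S>W")
    if n_sw == 0:
        raise ValueError("no S>W variants; ratio undefined")
    return labels.count("W>S") / n_sw


# Hypothetical variants from one allele-age or frequency bin.
variants = [("T", "C"), ("A", "G"), ("C", "T"), ("C", "G")]
print(ws_to_sw_ratio(variants))
```

Computing this ratio separately per bin and per population gives the kind of trend the text compares across populations.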
Due to the effect of gBGC, W>S mutations are expected to be enriched in older variants compared to S>W variants, and this enrichment is expected to be stronger in regions with high recombination rates (Glémin et al., 2015). Indeed, we observe such enrichment and the expected correlation with recombination rate (Figure 1—figure supplement 7A), supporting the effect of gBGC on the mutation spectrum of variants across mutation ages. Furthermore, the effect of gBGC is expected to vary across populations as its strength depends on the effective population size. Accordingly, we observe that the trends of the ratio of W>S to S>W over time differ across human populations (Figure 1—figure supplement 6B, Figure 1—figure supplement 7B). These results highlight the need to account for gBGC in order to reliably interpret the source of observed differences within and between populations (whether using allele frequency bins or allele age estimates).
Pairwise comparisons of mutation types accounting for gBGC
In light of the impact of gBGC on the mutation spectrum, we focused on comparisons of pairs of mutation types subject to similar effects of gBGC (i.e., in which both are favored, disfavored, or unaffected by gBGC). Specifically, we focused on four pairwise comparisons including (1) C>T at non-CpGs vs. C>A at non-CpGs; (2) C>T at CpGs vs. C>A at CpGs; (3) C>G vs. T>A; and (4) T>C vs. T>G. In principle, it is possible that the strength of gBGC is distinct for different types of variants involving S and W alleles (Tsai-Wu et al., 1992). However, in mice, roughly similar conversion rates are observed for C>A and C>T non-crossover gene conversion events as well as for T>C and T>G events (Li et al., 2019), lending support to using pairwise mutation ratios for controlling the effects of gBGC at least to a first approximation.
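The four pairwise comparisons listed above reduce to simple count ratios, and a difference between populations can be screened with a contingency-table statistic. The sketch below is a minimal illustration, not the authors' procedure: the count labels, the example numbers, and the choice of a Pearson chi-square on a 2 x k table (two mutation types by k populations) are assumptions made here.

```python
PAIRS = [
    ("C>T non-CpG", "C>A non-CpG"),
    ("C>T CpG", "C>A CpG"),
    ("C>G", "T>A"),
    ("T>C", "T>G"),
]


def pairwise_ratios(counts):
    """Return the four gBGC-matched ratios from a {mutation type: count} mapping."""
    return {f"{a} / {b}": counts[a] / counts[b] for a, b in PAIRS}


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts for one population and one age bin.
counts = {
    "C>T non-CpG": 900, "C>A non-CpG": 300,
    "C>T CpG": 400, "C>A CpG": 40,
    "C>G": 250, "T>A": 200,
    "T>C": 600, "T>G": 200,
}
print(pairwise_ratios(counts))

# Rows: T>C and T>G counts; columns: three hypothetical populations.
print(chi_square([[600, 590, 640], [200, 210, 180]]))
```

Because each ratio uses its own pair of mutation types, a shift in one ratio does not mechanically force a shift in the others, which is the independence property the text relies on.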
Three of the four pairwise comparisons involve mutation types with the same mutational opportunity (e.g., both T>C and T>G mutations involve changes at ancestral T bases in the genome), which further minimizes the confounding effects of regional variation on the chance of recurrent mutation or strength of background selection. Moreover, the pairwise ratios impose no co-dependency among mutation types as the four comparisons are mathematically independent of each other (although they may be biologically dependent if multiple ratios are affected simultaneously by some change in the mutational process). Investigating the mutation spectrum using these four pairwise comparisons, we observe marked differences in the ratios both over evolutionary time and across populations. Specifically, we find multiple independent signals of mutation rate evolution, reflected by both temporal variation within a population and differences between YRI, CHB, and CEU (p<0.01 by chi-square test after correcting for multiple hypothesis testing; 'Materials and methods'; Figure 2A). These differences may represent geographic or population differences as we these findings in other population samples from the same continents – LWK, TSI, and JPT – from the 1000 Genomes Project (Figure supplement 1). Figure 2 with supplements see all Download asset Open asset of pairwise mutation ratios for polymorphisms in different time pairwise mutation ratios are each of which mutation types that are for mutational opportunity and effects of GC-biased gene conversion The indicates the with the out-of-Africa (OOA) migration. 
The represent the observed polymorphism while the the assuming a distribution of polymorphism in are three ratios that show significant interpopulation with into each in lower to the from a chi-square test after a for and of levels were used in Figure in ratio in CEU at non-CpG sites, after excluding the four contexts and previously identified to be with the in Europeans by Harris and Pritchard, as well as contexts affected by of in mutational of and (Harris, 2015; Mathieson and Reich, 2017). divergence in ratio among three population ratio in YRI than CEU and CHB samples among old variants, driven by Figure data 1 Text files with (pseudo-)counts of mutations classified into types in genomic regions and within the C>G mutation in YRI, LWK, CEU, TSI, CHB, and JPT in each time window. Download We multiple to out technical artifacts or to the mutation process in to the observed interpopulation We obtain qualitatively similar results when restricting the analysis to putatively neutral regions with B-score > (Figure supplement or comparing regions with high and recombination rates (Figure supplement 3), that the of pairwise comparisons effectively controls for the effects of selection and gBGC. In to account for potential in mutation ages estimated by Relate, we variants by allele frequencies of inferred mutation ages and the signals in mutation age analysis (Figure supplement 4). We observe similar results with different for allele age, based on inferred mutation ages in YRI or CHB (Figure supplement 5). these results strong evidence that the human germline mutation spectrum has evolved over time and across populations. 
we the and population of the mutation rate changes to each of the signals we in of non-CpG ratio in Europeans The signal that we observed is the transient elevation in the ratio of mutations at non-CpG sites in CEU compared to the ancestral the in the non-CpG ratios of CHB and YRI not a similar at recent This signal the previously enrichment of C>T polymorphisms in a in as well as other contexts (Harris and Pritchard, 2017; Mathieson and Reich, 2017; Speidel et al., 2019). Investigating the temporal patterns in CEU, we find that the increase in the ratio of mutations at non-CpG sites from the time the years or and and in the recent age of (Figure 2A). Because there is large uncertainty in inferred allele ages and approach often the contribution of each variant over or more age bins ('Materials and the and of variation be the transient change in non-CpG C>T mutations likely and of higher than results However, the temporal and geographic enrichment patterns from analysis are with previous based on low-coverage or other datasets (Harris and Pritchard, 2017; Mathieson and Reich, 2017; Speidel et al., 2019). all non-CpG the interpopulation differences are in the four previously contexts and Harris and Pritchard, but are in other non-CpG contexts as well (Figure analysis that these mutational contexts are enriched in of the mutational from mutations in the of in and with exposures to light and et al., Harris, 2015; Mathieson and Reich, 2017). To test whether one of these mutational may be for the observed differences in polymorphism data, we the mutation ratio at non-CpG sites after excluding the contexts affected by or ('Materials and methods'). While we observe some reduction in the of non-CpG ratio in the interpopulation differences remain significant (Figure These results suggest the transient change in non-CpG ratio is not driven by the mutational mechanisms corresponding to either or Thus, the of this signal in Europeans remains unclear. 
of ratio among populations The interpopulation difference is in the ratio (Figure the among all three populations. In the past both YRI and CEU samples show an increase in the ratio of different while in the CHB, the ratio and then relatively for roughly (Figure 2A). the previous interpopulation differences in remain highly significant for the recent variants as well to factors the relative rates of C>G and T>A mutations at The fraction of C>G in de novo germline mutations is sensitive to parental ages, with the age at (Jónsson et al., 2017). This the that the interpopulation differences in ratio are driven by different average reproductive ages among populations (Coll Macià et al., 2021). To test this we the regional enrichment of C>G mutations – enriched – as of the genome with the C>G SNP that to of the age effect (i.e., the increase in with Jónsson et al., 2017). The ratio within the C>G enriched regions not show significant interpopulation differences (Figure reduced to the much lower SNP in these regions of all but see Figure supplement for of the C>G enriched regions, the three populations differ as much as they genome-wide (Figure indicating that the differential of C>G mutations with ages is not the of the differences
- Peer Review Report
- 10.7554/elife.81188.sa0
- Aug 22, 2022
Abstract
Recent studies have suggested that the human germline mutation rate and spectrum evolve rapidly. Variation in generation time has been linked to these changes, though its contribution remains unclear. We develop a framework to characterize temporal changes in polymorphisms within and between populations, while controlling for the effects of natural selection and biased gene conversion. Application to the 1000 Genomes Project dataset reveals multiple independent changes that arose after the split of continental groups, including a previously reported, transient elevation in TCC>TTC mutations in Europeans and novel signals of divergence in C>G and T>A mutation rates among population samples. We also find a significant difference between groups sampled in and outside of Africa in old T>C polymorphisms that predate the out-of-Africa migration. This surprising signal is driven by TpG>CpG mutations and stems in part from mis-polarized CpG transitions, which are more likely to undergo recurrent mutations. Finally, by relating the mutation spectrum of polymorphisms to parental age effects on de novo mutations, we show that plausible changes in the generation time cannot explain the patterns observed for different mutation types jointly. Thus, other factors – genetic modifiers or environmental exposures – must have had a non-negligible impact on the human mutation landscape.
Editor's evaluation
This important study investigates temporal variation in patterns of germline mutation during the evolution of human populations.
Using a compelling approach that controls for the effects of selection and biased gene conversion, the authors show that changes in generation time alone cannot explain the joint patterns observed for different mutation types, suggesting that other factors such as genetic modifiers or environmental exposures must have played a role as well. This work will be of broad interest to population geneticists and evolutionary biologists. https://doi.org/10.7554/eLife.81188.sa0
eLife digest
Each human has 23 pairs of chromosomes, one set inherited from each parent. But the child's chromosomes are not an exact copy of their parents' chromosomes. Spontaneous changes or mutations in the DNA during the formation of the egg or sperm cells, or early development of the embryo, can change a small fraction of the nucleotides or 'letters' that make up the DNA. These modifications are an important source of genetic diversity in human populations and contribute to the evolution of new traits. Each genetic variant in present-day human populations represents a mutation in one of their ancestors. The types and frequencies of variants vary across human populations and have changed over time, suggesting that mutation patterns have evolved in the past. But the processes driving these population-level differences remain elusive. One possible factor may be changes in the average age of reproduction or the generation time in a population. For example, older parents contribute more – and also different types of – new mutations to their children than younger parents do. Populations where it is customary to have children at older ages may therefore have a different mutation landscape. To find out if this is indeed the case, Gao et al. used computer algorithms to analyze the genomes of hundreds of people living on three continents who participated in the 1000 Genomes Project.
The analysis identified differences in mutation patterns across continental groups and estimated when these changes occurred. Further, they showed that although the age of reproduction had an impact on the mutation landscape, differences in generation time alone could not explain the observed changes in the human mutation spectrum. Factors other than generation time, such as environmental exposures, may have played a role in shifting these patterns. The study provides new insights into the changes in the mutation landscape over the course of human evolution. Mapping these patterns in humans worldwide may help scientists understand the causes underlying these changes. The techniques used by Gao et al. may also help analyze changes in mutation patterns in other organisms.
Introduction
Recent advances in high-throughput sequencing have enabled large-scale surveys of genetic variation in thousands of humans, providing a rich resource for understanding the source and mechanisms shaping the mutation landscape over time. Comparisons of polymorphism patterns across geographic population samples have uncovered numerous differences in the mutation rates and spectra (i.e., relative proportions of different types of mutations) (DeWitt et al., 2021; Goldberg and Harris, 2022; Harris, 2015; Harris and Pritchard, 2017; Hwang and Green, 2004; Mathieson and Reich, 2017; Moorjani et al., 2016a; Narasimhan et al., 2017; Speidel et al., 2019). A notable signal in humans is the enrichment of TCC>TTC variants in polymorphism data from Europeans relative to Africans and Asians (Harris and Pritchard, 2017). This signal is also observed in South Asians to a lesser degree and has been suggested to originate in ancient Neolithic farmers (Harris and Pritchard, 2017; Speidel et al., 2021).
Many other subtle but statistically significant signals have also been detected; given the recent common ancestry of human populations, this finding indicates that the mutational spectrum in humans has been evolving rapidly. Several genetic and nongenetic factors have been implicated as affecting mutation rates and acting as potential drivers of observed interpopulation differences in the mutation spectrum of polymorphisms. First, some environmental exposures can increase mutation rates, especially of particular types. As humans in different geographic locations and environments may have experienced differential exposures over the past 50,000–100,000 years since the out-of-Africa (OOA) migration, rates of specific mutation types could have diverged between populations (Harris, 2015; Mathieson and Reich, 2017). Second, genetic modifiers of mutation rates, such as variants in genes that copy or repair DNA, could segregate at different frequencies across populations. Despite the deleterious effects of alleles that modify mutation rates, in recombining species, they could be nearly neutral and maintained for a long time, leading to genome-wide differences across populations (Milligan et al., 2022; Seoighe and Scally, 2017). In addition, direct sequencing of human pedigrees has revealed the effects of the parental ages at reproduction on the relative fractions of mutation types (Goldmann et al., 2018; Jónsson et al., 2017). For example, as parents age, fathers pass on disproportionally more T>C mutations, and mothers contribute a higher fraction of C>G mutations (Jónsson et al., 2017). Thus, differences in the average reproductive ages, or equivalently 'generation times,' alone can lead to differences in mutation spectrum across populations; indeed, such differences have been invoked to explain a large fraction of observed variation in types of polymorphisms among population samples (Coll Macià et al., 2021). 
The joint distribution of mutation type and frequency of polymorphisms, however, depends not only on the mutational input, but also on other evolutionary forces such as natural selection, biased gene conversion, and demography. In particular, natural selection distorts the allele frequency distribution and fixation probability of non-neutral variants, and the average effect of natural selection can differ across mutation types (Wakeley, 2010). As an example, genic regions tend to be more GC-rich, so mutations at G:C base pairs may be subject to stronger purifying or background selection compared to mutations at A:T base pairs (Lander et al., 2001; McVicker et al., 2009). GC-biased gene conversion (gBGC) is another process that exerts differential effects across mutation types by effectively acting like positive selection favoring mutations from weak alleles (A or T) to strong alleles (C or G) and negative selection against mutations from strong to weak alleles (Duret and Galtier, 2009). The strengths of selection and gBGC depend on the effective population size and thus on the demographic history of a population. Demographic history also influences allele frequencies for a given allele age (Kimura, 1969). This poses a challenge in interpreting previous studies (Harris and Pritchard, 2017; Mathieson and Reich, 2017) aimed at learning about when changes in mutational processes may have occurred by using allele frequencies, as mutations of the same frequency can have drastically different distributions of ages in distinct populations (e.g., doubletons in Africans are substantially older than doubletons in Europeans or Asians; Mathieson and McVean, 2014). Beyond the biological processes that shape polymorphism data, the characterization of the mutational spectrum can be biased by many technical issues. 
For instance, a recent study showed that some interpopulation differences discovered in low-coverage 1000 Genomes data may be driven by cell line artifacts or errors in PCR amplification (Anderson-Trocmé et al., 2020). Further, comparisons of mutation patterns across datasets are sensitive to differences in the accessible genomic regions across studies. Because there is large variation in mutation rates and base pair composition across genomic regions, differences in the regions sequenced across studies can have a non-negligible impact on comparisons of mutation spectrum across datasets (Monroe et al., 2022; Seplyarskiy et al., 2021). In addition, the number of genomes surveyed, in combination with the specific population demographic history, influences the chance of observing repeated mutations at the same site, and thus the observed polymorphism patterns (Lek et al., 2016). Given these challenges, it remains unclear whether the numerous observed differences across human populations stem from rapid evolution of the mutation process itself, other evolutionary processes, or technical factors. Motivated by these considerations, we propose a new framework to compare the mutation spectrum over time and across human populations. First, we infer the age of each derived allele observed in a population using a newly developed approach, Relate, which reconstructs local genealogies and estimates allele ages (Speidel et al., 2019). This approach allows us to perform more reliable comparisons across populations as well as to investigate changes in mutation processes across time. Next, we minimize confounding effects of selection by removing constrained regions and known targets of selection in the genome. We also control the effects of biased gene conversion by focusing on comparison of pairs of mutations (e.g., T>C and T>G) that are subject to similar effects of gBGC. 
This pairwise comparison further mitigates the issue of interdependencies in comparing mutation fractions (i.e., an increased contribution of one mutation type necessarily lowers the contribution of other mutation types). Based on this new framework, we re-evaluate the evidence for evolution of the mutation spectrum in human populations and investigate when, how, and in which populations significant changes have occurred over the course of human evolution. Finally, by relating parental age effects on the mutation spectrum estimated in contemporary pedigrees to the observed patterns of polymorphisms of varying ages, we evaluate the role of changes in generation times in shaping the human mutation landscape.
Results
Variation in the spectrum of human polymorphisms over time
We analyzed single-nucleotide polymorphisms (SNPs) identified in high-coverage whole-genome sequencing data from the 1000 Genomes Project, including 178 individuals of West African ancestry living in Ibadan, Nigeria (YRI), 179 individuals of Northern European ancestry living in the United States (CEU), and 103 individuals of East Asian ancestry living in Beijing, China (CHB) (Byrska-Bishop et al., 2022). To focus on putatively neutral mutations, we removed exons and phylogenetically conserved regions as in previous studies (Harris and Pritchard, 2017; Moorjani et al., 2016a). To perform reliable comparisons between datasets in downstream analysis and ensure the results are not driven by local genomic differences in mutation rate, we focused on regions that were accessible in both population and pedigree datasets (hereafter referred to as 'commonly accessible regions'; 'Materials and methods'). We inferred the age of each derived variant (with the ancestral allele determined based on the six primate EPO (Enredo, Pecan, Ortheus) alignment) in YRI, CEU, and CHB using Relate, a method to reconstruct local genealogies based on phased haplotype sequences (Speidel et al., 2019).
We then divided all SNPs into 15 bins based on the ages of the derived allele inferred by Relate, accounting for uncertainty in the inferred mutation age by assuming a uniform distribution of ages between the inferred lower and upper bounds for each variant ('Materials and methods'). We classified each SNP into six disjoint classes based on the type of base pair substitution: T>A, T>C, T>G, C>A, C>G, and C>T (each including the corresponding substitution on the reverse complement strand, e.g., T>C includes both T>C and A>G substitutions). Given the well-characterized hypermutability of methylated CpG sites (Duncan and Miller, 1980; Kong et al., 2012), we further divided C>T SNPs into subtypes occurring in CpG and non-CpG contexts by considering the flanking base pair on either side of the variant. We find marked differences in the relative proportions of different mutation types (i.e., the mutation spectrum) across varying allele age bins within CEU (Figure 1) as well as in YRI and in CHB (Figure 1—figure supplement 1), as seen earlier in the low-coverage 1000G data (Speidel et al., 2019). We obtain qualitatively similar results when considering other 1000G populations of TSI, LWK, and JPT (Figure 1—figure supplement 1). This observation echoes previous findings about the evolution of the mutation spectrum comparing polymorphisms across allele frequencies (Carlson et al., 2018; Harris and Pritchard, 2017; Mathieson and Reich, 2017). As noted previously, however, differences in mutation spectrum across frequencies alone are weak evidence for the evolution of the mutation process itself because patterns of standing polymorphisms can be affected by repeat mutations and other evolutionary forces, including selection and gene conversion.
Figure 1 (with 8 supplements). Changes in the mutation spectrum of polymorphisms in CEU over evolutionary time.
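The classification and binning steps described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the label strings, and the bin edges in the example are assumptions (the study uses 15 bins whose boundaries are not given in this section).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def classify_snp(ancestral, derived, left_flank, right_flank):
    """Fold a SNP onto the strand where the ancestral base is C or T and
    return one of: T>A, T>C, T>G, C>A, C>G, 'C>T CpG', 'C>T non-CpG'."""
    if ancestral == derived:
        raise ValueError("ancestral and derived alleles must differ")
    if ancestral in ("A", "G"):
        # Reverse-complement the site: bases are complemented and the
        # flanks swap sides.
        ancestral, derived = COMPLEMENT[ancestral], COMPLEMENT[derived]
        left_flank, right_flank = COMPLEMENT[right_flank], COMPLEMENT[left_flank]
    label = f"{ancestral}>{derived}"
    if label == "C>T":
        # After folding, a CpG site is an ancestral C followed by G.
        return "C>T CpG" if right_flank == "G" else "C>T non-CpG"
    return label


def spread_over_bins(lower, upper, bin_edges):
    """Fraction of one variant assigned to each age bin, assuming its age is
    uniformly distributed between the inferred lower and upper bounds."""
    width = upper - lower
    fractions = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        if width == 0:
            # Degenerate interval: the whole count goes to the containing bin.
            fractions.append(1.0 if lo <= lower < hi else 0.0)
        else:
            overlap = max(0.0, min(hi, upper) - max(lo, lower))
            fractions.append(overlap / width)
    return fractions
```

Summing these fractions over all variants of a given type yields fractional (pseudo-)counts per age bin, which is why the counts need not be integers.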
Figure 1—source data 1. BED file for the commonly accessible region excluding exons and phylogenetically conserved elements. https://cdn.elifesciences.org/articles/81188/elife-81188-fig1-data1-v3.zip
Figure 1—source data 2. Text files with (pseudo-)counts of different types of mutations in YRI, LWK, CEU, TSI, CHB, JPT in each time window. https://cdn.elifesciences.org/articles/81188/elife-81188-fig1-data2-v3.zip
Notably, the infinite sites model is a reasonable assumption for small sample sizes (Kimura, 1969), but recurrent mutations become highly likely in large datasets, especially at sites with higher mutation rates (Harpak et al., 2016; Lek et al., 2016). Recurrent, multi-allelic, and back mutations violate the model assumptions of Relate and are often excluded from its output. For instance, given the higher mutation rate of transitions at CpG sites, such SNPs are more likely to be subject to recurrent mutations in a large sample and thus may map to multiple branches in the tree, leading to their exclusion from Relate's output (Speidel et al., 2019). As expected from these considerations, the fraction of CpG C>T SNPs in young mutations (i.e., those estimated to have occurred in the past ~50 generations) is lower than proportions in de novo mutations (DNMs) in present-day pedigree studies (Figure 1—figure supplement 2). Differences in mutation spectrum across age bins in modern humans persist even after excluding CpG C>T mutations (Figure 1—figure supplement 3), however, indicating that other mutation types are also changing in relative frequency over time and the observed patterns are not driven solely by recurrent mutation at CpG sites. Next, we examined the effect of linked selection on different mutation types.
While we excluded direct targets of selection from analysis (i.e., exons and conserved regions), much of the genome is linked to non-neutral variants and subject, notably, to background selection (Charlesworth et al., 1993; McVicker et al., 2009; Murphy et al., 2022). A common measure of the effects of background selection is the B-statistic or B-score that estimates the reduction in nucleotide diversity levels compared to the neutral expectation (McVicker et al., 2009). To characterize the impact of linked selection, we calculated the average genome-wide B-score of each mutation type. We find nearly identical average B-scores and similar distributions for all mutation types (Figure 1—figure supplement 4). Further, comparing the mutation spectrum over time in CEU, YRI, and CHB, we obtain qualitatively similar results when restricting to regions with weak background selection (B-score > 800, where the genetic diversity is reduced by <20% compared to the neutral expectation; Figure 1—figure supplement 5). These analyses suggest that although linked selection has pervasive effects, its average impact is relatively uniform across the seven mutation types in commonly accessible regions ('Materials and methods'). Gene conversion is another evolutionary process that can have a profound impact on the mutation spectrum of polymorphisms. gBGC acts like selection for certain mutation types by causing the preferential transmission of strong (S) alleles (C or G) over weak (W) alleles (A or T) in heterozygotes (Duret and Galtier, 2009). Accordingly, we observe enrichments of W>S mutations (T>C and T>G) in common variants and of S>W mutations (C>A and C>T) in rare variants (Figure 1—figure supplement 6A). Moreover, gBGC violates model assumptions of Relate (for both neutrality and infinite-sites mutation model) and could lead to subtle biases in estimated allele ages (Speidel et al., 2019).
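To make the weak (W) and strong (S) allele grouping concrete, the sketch below assigns each strand-folded mutation class to the direction in which gBGC acts on it and tallies a W>S to S>W ratio. The function names and the example counts are hypothetical and serve only to illustrate the definitions given in the text.

```python
STRONG = {"C", "G"}  # strong (S) alleles
WEAK = {"A", "T"}    # weak (W) alleles


def gbgc_class(mutation):
    """Classify a mutation such as 'T>C' by how gBGC acts on it:
    'W>S' (favored), 'S>W' (disfavored), or 'neutral' (unaffected)."""
    ancestral, derived = mutation.split(">")
    if ancestral in WEAK and derived in STRONG:
        return "W>S"
    if ancestral in STRONG and derived in WEAK:
        return "S>W"
    return "neutral"


def ws_to_sw_ratio(counts):
    """Ratio of W>S to S>W (pseudo-)counts; NaN if there are no S>W counts."""
    ws = sum(n for m, n in counts.items() if gbgc_class(m) == "W>S")
    sw = sum(n for m, n in counts.items() if gbgc_class(m) == "S>W")
    return ws / sw if sw > 0 else float("nan")


# Made-up counts for one frequency or age bin, for illustration only.
example_counts = {"T>C": 30, "T>G": 10, "C>A": 15, "C>T": 65, "C>G": 9, "T>A": 7}
example_ratio = ws_to_sw_ratio(example_counts)
```

Under this grouping, C>G and T>A fall in the neutral class, since they swap one strong (or weak) allele for another.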
- Peer Review Report
1
- 10.7554/elife.81188.sa2
- Jan 16, 2023
Author response: Limited role of generation time changes in driving the evolution of the mutation spectrum in humans
- Research Article
94
- 10.1098/rstb.2011.0320
- Mar 19, 2012
- Philosophical Transactions of the Royal Society B: Biological Sciences
The human leucocyte antigen (HLA) system shows extensive variation in the number and function of loci and the number of alleles present at any one locus. Allele distribution has been analysed in many populations through the course of several decades, and the implementation of molecular typing has significantly increased the level of diversity revealing that many serotypes have multiple functional variants. While the degree of diversity in many populations is equivalent and may result from functional polymorphism(s) in peptide presentation, homogeneous and heterogeneous populations present contrasting numbers of alleles and lineages at the loci with high-density expression products. In spite of these differences, the homozygosity levels are comparable in almost all of them. The balanced distribution of HLA alleles is consistent with overdominant selection. The genetic distances between outbred populations correlate with their geographical locations; the formal genetic distance measurements are larger than expected between inbred populations in the same region. The latter present many unique alleles grouped in a few lineages consistent with limited founder polymorphism in which any novel allele may have been positively selected to enlarge the communal peptide-binding repertoire of a given population. On the other hand, it has been observed that some alleles are found in multiple populations with distinctive haplotypic associations suggesting that convergent evolution events may have taken place as well. It appears that the HLA system has been under strong selection, probably owing to its fundamental role in varying immune responses. Therefore, allelic diversity in HLA should be analysed in conjunction with other genetic markers to accurately track the migrations of modern humans.
- Book Chapter
1
- 10.1016/b978-0-12-809356-6.00017-4
- Dec 6, 2019
- Clinical Molecular Medicine
Chapter 17 - The human leukocyte antigen system in human disease and transplantation medicine
- Research Article
2
- 10.1016/j.humimm.2022.08.003
- Aug 23, 2022
- Human Immunology
HLA molecular study of patients in a public kidney transplant program in Guatemala
- Book Chapter
- 10.1007/978-3-642-18991-3_40
- Jan 1, 2003
The phenotypic variability in biological populations depends on genes and environmental interactions. In the case of past human populations, many factors of variability are related to the cultural background, e.g. modifications of the gene flow in the population due to social stratification, or the non-random character of the sample resulting from the variability of burial customs. The complexity and fluctuation of the background of variability make research on affinities between past populations difficult, but not impossible. In the author's opinion, a decisive presentation of an analytical procedure solving all problems related to variability in human populations is not possible. However, some steps towards it can be made simply by paying closer attention to the preparation and interpretation stages of standard methodology.
Keywords: Heritability Estimate; Physical Anthropology; Past Population; Interobserver Error; Biological Affinity
- Research Article
5
- 10.1016/j.humimm.2019.12.005
- Jan 2, 2020
- Human Immunology
High-resolution allele frequencies for NGS based HLA-A, B, C, DQB1 and DRB1 typing of 23,595 bone marrow donors recruited for the Polish central potential unrelated bone marrow donor registry