Abstract
Balancing selection occurs when multiple alleles are maintained in a population, which can result in their preservation over long evolutionary time periods. A characteristic signature of this long-term balancing selection is an excess number of intermediate frequency polymorphisms near the balanced variant. However, the expected distribution of allele frequencies at these loci has not been extensively detailed, and therefore existing summary statistic methods do not explicitly take it into account. Using simulations, we show that new mutations that arise in close proximity to a site targeted by balancing selection accumulate at frequencies nearly identical to that of the balanced allele. To scan the genome for balancing selection, we propose a new summary statistic, β, which detects these clusters of alleles at similar frequencies. Simulation studies show that, compared with existing summary statistics, our measure has improved power to detect balancing selection and is reasonably powered in non-equilibrium demographic models and under a range of recombination and mutation rates. We compute β on 1000 Genomes Project data to identify loci potentially subjected to long-term balancing selection in humans. We report two balanced haplotypes, localized to the genes WFS1 and CADM2, that are strongly linked to association signals for complex traits. Our approach is computationally efficient and applicable to species that lack appropriate outgroup sequences, allowing for well-powered analysis of selection in the wide variety of species for which population data are rapidly being generated.
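The idea behind β, scoring how many nearby variants sit at frequencies close to that of a candidate balanced site, can be illustrated with a minimal sketch. This is not the published definition of β: the similarity d_i = 1 - |x_core - x_i|, the 500 bp window, the exponent p = 2 and the absence of any normalization are assumptions, and the function name `beta_like` and the toy data are hypothetical. Frequencies are taken as given; for species without an outgroup one could pass folded (minor allele) frequencies, which is also an assumption here.

```python
import numpy as np


def beta_like(positions, freqs, core_index, window=500, p=2.0):
    """Hypothetical beta-style score for one core SNP (not the published beta).

    Sums d_i**p over SNPs within `window` bp of the core SNP, where
    d_i = 1 - |x_core - x_i|. SNPs at frequencies close to the core SNP's
    frequency contribute close to 1; distant frequencies contribute little.
    """
    positions = np.asarray(positions)
    x = np.asarray(freqs, dtype=float)
    x_core = x[core_index]
    # Boolean mask of SNPs within the window around the core SNP.
    in_window = np.abs(positions - positions[core_index]) <= window
    in_window[core_index] = False  # do not compare the core SNP with itself
    d = 1.0 - np.abs(x_core - x[in_window])
    return float(np.sum(d ** p))


# Toy example (made-up data): the SNP at position 5000 lies outside the window.
positions = [100, 200, 300, 400, 5000]
clustered = beta_like(positions, [0.5, 0.48, 0.52, 0.1, 0.5], core_index=0)
scattered = beta_like(positions, [0.5, 0.05, 0.95, 0.1, 0.5], core_index=0)
print(clustered, scattered)  # approximately 2.2808 and 0.965
```

Scanning a region would mean evaluating the score with each SNP in turn as the core. Windows in which variants pile up near the core frequency, the pattern the simulations above describe for balanced sites, score higher than windows in which nearby frequencies are scattered.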
Highlights
Inspired by the structure of site frequency spectrum-based statistics (Tajima, 1989, Fay and Wu, 2000), we developed a new summary statistic that detects clusters of variants at highly correlated allele frequencies
We computed the power of β, Tajima’s D, HKA, and T1 to distinguish between simulation replicates with a balanced variant and those with only neutral mutations
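A power comparison of this kind can be sketched as follows, assuming power is measured at a fixed false-positive rate whose threshold is set by the neutral replicates. The 5% rate, the assumption that larger scores indicate balancing selection, the function name `power_at_fpr` and the placeholder normal draws are illustrative choices, not the procedure or results reported in the paper.

```python
import numpy as np


def power_at_fpr(neutral_scores, selected_scores, fpr=0.05):
    """Fraction of balancing-selection replicates whose score exceeds the
    (1 - fpr) quantile of the neutral replicates' scores.

    Assumes larger scores are evidence for balancing selection.
    """
    threshold = np.quantile(np.asarray(neutral_scores, dtype=float), 1.0 - fpr)
    return float(np.mean(np.asarray(selected_scores, dtype=float) > threshold))


# Placeholder scores only; real inputs would be per-replicate statistics
# computed on neutral and balanced-variant simulations.
rng = np.random.default_rng(0)
neutral = rng.normal(0.0, 1.0, size=1000)
selected = rng.normal(2.0, 1.0, size=1000)
print(power_at_fpr(neutral, selected))
```

Fixing the false-positive rate from the neutral distribution puts statistics on different scales on an equal footing, so the same function can be applied to each statistic's replicate scores in turn.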
Summary
The availability of high-quality, population-level genomic data from a wide variety of species has spurred recent efforts to detect genomic regions subjected to natural selection (Vitti et al, 2013, Xu et al, 2015, Singh et al, 2012). One type of selective pressure, balancing selection, occurs when more than one allele is maintained at a locus. This selection can arise from overdominance (in which the fitness of heterozygotes at a locus is higher than that of either homozygote) or from frequency-, temporally-, or spatially-dependent selection (Charlesworth, 2006). A classic case of overdominance occurs at the hemoglobin-β locus in populations located in malaria-endemic regions. Homozygotes for one allele have sickle-cell anemia, and homozygotes for the other allele have an increased risk of malaria. Heterozygotes are protected from malaria and have at most a mild case of sickle-cell anemia (Aidoo et al, 2002, Luzzatto, 2012).
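Why overdominance keeps both alleles can be shown with the standard one-locus, two-allele model (a textbook result, not taken from this paper). With genotype fitnesses 1 - s, 1 and 1 - t for AA, Aa and aa (s, t > 0), the frequency p of allele A converges to p* = t/(s + t) from any starting point strictly between 0 and 1. The selection coefficients below are arbitrary illustrative values, not estimates for the hemoglobin-β locus, and the function names are hypothetical.

```python
def next_freq(p, s, t):
    """One generation of deterministic selection under heterozygote advantage.

    Genotype fitnesses: AA = 1 - s, Aa = 1, aa = 1 - t (random mating).
    """
    q = 1.0 - p
    w_bar = p * p * (1 - s) + 2 * p * q + q * q * (1 - t)  # mean fitness
    return (p * p * (1 - s) + p * q) / w_bar


def iterate(p0, s, t, generations):
    """Apply next_freq repeatedly, starting from frequency p0."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s, t)
    return p


def equilibrium(s, t):
    """Stable polymorphic equilibrium frequency of A: p* = t / (s + t)."""
    return t / (s + t)


# Starting rare or common, allele A approaches the same intermediate frequency.
print(iterate(0.01, 0.2, 0.8, 2000), iterate(0.99, 0.2, 0.8, 2000), equilibrium(0.2, 0.8))
```

Setting p' = p gives p(1 - s) + q = p + q(1 - t), that is ps = qt, which yields p* = t/(s + t); neither allele can be lost because each is favored when rare.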