Abstract

The identification of disease-causing rare variants is becoming possible with the advent of next-generation sequencing (Nejentsev et al., 2009; Calvo et al., 2010; Johansen et al., 2010). Recently, Kim and Schuster (2013) explored this task with large numbers of publicly available mitochondrial genomes, in an attempt to disentangle selective from demographic effects. The recent rapid population expansion of humans is expected to give rise to an excess of rare variants, both neutral and deleterious (Keinan and Clark, 2012), and is thus expected to make the detection of rare disease-causing variants more difficult. Kim and Schuster (2013) first infer the demographic history of the population, finding support for a model of population structure and expansion, consistent with previous studies (Gutenkunst et al., 2009; Gravel et al., 2011) but with a lower growth rate. Forward simulations under this demographic model are then used to determine the maximum relative frequency that negatively selected alleles may be expected to reach in each of the considered subpopulations. It should be pointed out that their criterion was simply the largest frequency observed across 10,000 simulations, and is thus expected to be conservative. Most of the currently known and suspected disease-associated rare variants had frequencies below the threshold and accounted for a small portion of the total rare variants. Surprisingly, however, a number of disease rare variants (>5%) showed frequencies above the threshold, which suggests that other forces are acting on these SNVs. Perhaps the more interesting result is the sample size required to provide reasonable power to investigate disease-causing rare variants. Although high (2400–7400 per subpopulation), such sample sizes seem attainable within the next decade given the falling cost and rising throughput of sequencing technologies.
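The threshold criterion described above (the largest frequency reached by a negatively selected allele across many forward simulations) can be illustrated with a minimal sketch. This is not the simulator or the demographic model used by Kim and Schuster (2013); it assumes a single haploid Wright-Fisher population with constant exponential growth, and the function names and all parameter values (`n0`, `growth`, `generations`, `s`) are hypothetical choices made only for illustration.

```python
import random


def bernoulli_sum(n, p, rng):
    """Binomial(n, p) draw as a sum of Bernoulli trials (adequate for small n)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def peak_frequency(n0, growth, generations, s, rng):
    """Follow one new deleterious mutation forward in time.

    Assumed model: haploid Wright-Fisher population of initial size n0 that
    grows by a factor (1 + growth) each generation; carriers have relative
    fitness 1 - s. Returns the highest frequency the mutation ever reaches.
    """
    n, count = n0, 1
    peak = count / n
    for _ in range(generations):
        if count == 0:  # mutation lost, nothing more to track
            break
        p = count / n
        # Deterministic change in frequency due to selection.
        p_sel = p * (1.0 - s) / (1.0 - s * p)
        # Population growth, then binomial sampling (genetic drift).
        n = max(1, round(n * (1.0 + growth)))
        count = bernoulli_sum(n, p_sel, rng)
        peak = max(peak, count / n)
    return peak


def frequency_threshold(replicates, n0, growth, generations, s, seed=0):
    """Largest peak frequency seen across independent replicate simulations."""
    rng = random.Random(seed)
    return max(
        peak_frequency(n0, growth, generations, s, rng) for _ in range(replicates)
    )
```

For example, `frequency_threshold(replicates=200, n0=100, growth=0.01, generations=50, s=0.05)` returns one such bound for these toy parameters. Because the bound is a maximum over replicates, it can only stay the same or rise as more replicates are run, which is why a threshold of this kind is conservative.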
An intuitive understanding of the underlying dynamics of the evolution of rare variants can be gained by considering the coalescent trees that are likely under models of population growth. Population growth results in trees where coalescent events are rare in the recent past and tend toward a more star-like topology (Wakeley, 2009), owing to the relationship between population size and rate of coalescence. This results in long terminal branches relative to internal branches, so that singletons and other rare variants become more common and intermediate-frequency variants less common than under the equilibrium neutral model. The problem of detecting rare disease alleles is thus effectively more difficult, as the needles are now in a larger haystack. Because of long terminal branches, we would expect neutral singleton SNPs on shared haplotypes. Adding to this challenge, sequencing errors make it difficult to call rare variants, and thus the mutations of interest may be excluded from analysis depending on their frequency in the population. In Kim and Schuster (2013), data sets are split into data with and without singletons for analysis and comparison.
The picture with selection is less intuitive, but as population sizes increase, drift becomes less dominant and selection is more effective. Gazave et al. (2013) provide valuable insight into low-frequency deleterious mutations under population growth. Their findings showed that while the number of deleterious mutations per individual increased, the mean effect decreased. In their study, mutations are not independent and, more importantly, are drawn from a distribution of fitness effects (DFE). Within this context, Gazave et al. (2013) argued that selection acted more efficiently on strongly deleterious mutations, reducing their frequency, while mildly deleterious mutations were more prevalent. Even though Kim and Schuster (2013) assume non-interfering independent mutations, their simulations carried out with a higher rate of population growth showed a lower frequency of deleterious mutations compared with weaker growth models. This suggests that in expanding populations, disease risk is distributed over a larger number of weakly deleterious mutations than in equilibrium populations.
A contrasting point between these studies is the assumption of independence of the deleterious mutations in Kim and Schuster (2013) when calculating upper frequency bounds. It has been suggested that the high frequency of some rare disease mutations in human populations is due to hitchhiking with a nearby beneficial mutation. This may explain why some disease variants have frequencies above the conservative threshold. Indeed, perhaps the most important effect in growing populations is the dependence on the DFE when considering complete haplotype fitness. Weakly beneficial mutations will have a better chance of establishing in the population, as the results of Gazave et al. (2013) suggest. These more prevalent beneficial mutations would then help pull deleterious mutations to higher frequency than would otherwise be expected. Conversely, as the rate of recombination increases, Hill-Robertson effects (Hill and Robertson, 1966) will be minimized, providing a mechanism for nearly neutral mutations to escape from strongly deleterious mutations over time. It should be clear that the form of the distribution around neutrality will have an impact on the expected numbers of deleterious mutations in any individual (Figure 1). Further investigation of the effects and sensitivity of the DFE in expanding populations would be an interesting contribution to the field.
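The coalescent argument made earlier, that growth lengthens terminal branches and so inflates the share of singletons, can be checked numerically with a small sketch. It assumes a single panmictic population whose size shrinks exponentially backward in time at rate `alpha` (time measured in units of the present-day population size), uses the standard time rescaling for coalescent waiting times under exponential growth, and takes the expected singleton share to be the ratio of external to total branch length under the infinite-sites model. The function names and the values of `n`, `alpha`, and `replicates` are hypothetical.

```python
import math
import random


def branch_length_by_count(n, alpha, rng):
    """Simulate one coalescent tree for n sampled lineages.

    Returns a list where entry i is the total length of branches that
    subtend exactly i samples (entry 0 is unused). alpha = 0 gives the
    constant-size coalescent; alpha > 0 gives exponential growth.
    """
    lengths = [0.0] * n
    lineages = [1] * n  # number of sampled descendants of each lineage
    t = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        e = rng.expovariate(k * (k - 1) / 2)  # waiting time in rescaled units
        if alpha > 0:
            # Map the rescaled waiting time back to real time under growth.
            t_next = math.log(math.exp(alpha * t) + alpha * e) / alpha
        else:
            t_next = t + e
        for d in lineages:
            lengths[d] += t_next - t
        # Merge two lineages chosen uniformly at random.
        i, j = rng.sample(range(k), 2)
        merged = lineages[i] + lineages[j]
        lineages = [d for idx, d in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
        t = t_next
    return lengths


def singleton_fraction(n, alpha, replicates, seed=1):
    """Expected share of segregating sites that are singletons."""
    rng = random.Random(seed)
    external = total = 0.0
    for _ in range(replicates):
        lengths = branch_length_by_count(n, alpha, rng)
        external += lengths[1]
        total += sum(lengths)
    return external / total
```

With `alpha=0` the result should approach the constant-size expectation, 1 divided by the sum of 1/i for i from 1 to n - 1; with a large `alpha` the trees become star-like and the singleton share rises well above that value.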
Figure 1. A visual representation of the effect of increasing population size (Ne) on the fraction of deleterious, neutral, and beneficial mutations, for a given potential realization of the distribution of fitness effects (DFE). As shown, the effective fraction ...
Finding disease-causing rare variants, or even risk factors, in humans remains difficult, in part owing to recent expansion. Using larger sample sizes, while considering nonindependent mutations drawn from the DFE, is a promising way forward. Currently, the sample sizes required seem huge, but with continued advances in sequencing technology they are increasingly feasible; indeed, we can expect to have sample sizes comparable to the effective population size in the near future. However, the coalescent traditionally requires that the sample size be much smaller than the effective population size, so extensions of the coalescent, or its behavior when this assumption is violated, would need to be considered. Wakeley and Takahashi (2003) investigated the even more extreme case of sample size exceeding effective population size, with the result that rare mutations are even more prevalent and that mutation rate and effective population size can be separately estimated. Thus, continued theoretical development combined with extensive ongoing sequencing efforts may indeed help to differentiate the fraction of new and segregating deleterious mutations in human populations.
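The dependence on the DFE illustrated in Figure 1 can be made concrete with a short sketch. It rests on assumptions not taken from the source: the DFE is modeled as a normal distribution of selection coefficients, and a mutation is treated as effectively neutral when |s| < 1/Ne (the usual nearly neutral rule of thumb). The function name and the values of `mean_s` and `sd_s` are hypothetical.

```python
from statistics import NormalDist


def effective_fractions(ne, mean_s, sd_s):
    """Split an assumed normal DFE into effectively deleterious,
    effectively neutral, and effectively beneficial fractions.

    A mutation counts as effectively neutral when |s| < 1 / ne.
    """
    dfe = NormalDist(mu=mean_s, sigma=sd_s)
    cutoff = 1.0 / ne
    deleterious = dfe.cdf(-cutoff)
    beneficial = 1.0 - dfe.cdf(cutoff)
    neutral = 1.0 - deleterious - beneficial
    return deleterious, neutral, beneficial


# Compare a small and a large effective population size under the same DFE.
small = effective_fractions(ne=1_000, mean_s=-0.001, sd_s=0.002)
large = effective_fractions(ne=100_000, mean_s=-0.001, sd_s=0.002)
```

Under these assumptions the effectively neutral band narrows as Ne grows, so more mutations on both sides of neutrality become visible to selection. How large the shift is depends on the shape of the DFE near s = 0, which is the sensitivity the text identifies as worth further study.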


