Abstract

A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease. We conclude by showing that use of an intolerance ranking system can aid in interpreting personal genomes and identifying pathogenic mutations.

Highlights

  • Many approaches are available that attempt to prioritize mutations in terms of their prior probabilities of conferring risk of disease, notably including population allele frequency and measures of conservation at either the phylogenetic level [1] or in terms of amino acid characteristics [2,3,4,5,6]

  • The ESP6500 dataset is our source for aggregate single nucleotide variant (SNV) sequence data, described elsewhere [7,16]

  • The Online Mendelian Inheritance in Man (OMIM) database was used to assess the utility of the score by testing how the score relates to whether or not a gene causes Mendelian disease [8]



Introduction

Many approaches are available that attempt to prioritize mutations in terms of their prior probabilities of conferring risk of disease, notably including population allele frequency and measures of conservation at either the phylogenetic level [1] or in terms of amino acid characteristics [2,3,4,5,6]. Few analogous approaches are available for prioritizing the genes in which the variants are found, even though groups performing contemporary sequencing studies have learned that some genes are much more likely than others to show at least modest (but unconvincing) evidence of association with risk across multiple disease areas. One reason for this outcome is that some genes carry many more putatively interesting variants in the general population, which gives those genes more opportunity to show association. The intolerance score itself measures how far a gene's functional variation deviates from the amount predicted by the apparently neutral variation found in that gene.
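The idea of scoring a gene by how much its functional variation departs from what its neutral variation would predict can be illustrated with a small sketch. The text above does not specify the statistical model, so the code assumes a simple least-squares line fitted across genes and reports each gene's standardized residual; the function name `intolerance_scores`, the gene names, and all counts are invented for illustration and are not taken from the ESP data.

```python
from statistics import mean, pstdev


def intolerance_scores(genes):
    """Return a standardized residual per gene.

    genes maps a gene name to (neutral_count, functional_count).
    ASSUMPTION: a straight-line fit of functional on neutral counts
    stands in for the "expected" amount of functional variation.
    """
    names = list(genes)
    x = [genes[g][0] for g in names]  # apparently neutral variants
    y = [genes[g][1] for g in names]  # functional variants
    mx, my = mean(x), mean(y)

    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("neutral counts must vary across genes")

    # Ordinary least-squares slope and intercept.
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx

    # Residual = observed functional count minus the fitted expectation.
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sd = pstdev(residuals)
    if sd == 0:
        return {g: 0.0 for g in names}
    return {g: r / sd for g, r in zip(names, residuals)}


# Hypothetical counts: GENE_D has far fewer functional variants
# than its neutral variation would suggest.
toy = {
    "GENE_A": (10, 5),
    "GENE_B": (20, 10),
    "GENE_C": (30, 15),
    "GENE_D": (40, 8),
    "GENE_E": (50, 25),
}
scores = intolerance_scores(toy)
most_intolerant = min(scores, key=scores.get)
```

Under this sketch a negative score marks a gene with less functional variation than expected (more intolerant), and a positive score marks a gene with more than expected. Dividing by the population standard deviation of the residuals is a simplification chosen to keep the example dependency-free.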
