Abstract
The acceleration of DNA sequencing in samples from patients and population studies has resulted in extensive catalogues of human genetic variation, but the interpretation of rare genetic variants remains problematic. A notable example of this challenge is the existence of disruptive variants in dosage-sensitive disease genes, even in apparently healthy individuals. Here, by manual curation of putative loss-of-function (pLoF) variants in haploinsufficient disease genes in the Genome Aggregation Database (gnomAD)[1], we show that one explanation for this paradox involves alternative splicing of mRNA, which allows exons of a gene to be expressed at varying levels across different cell types. Currently, no existing annotation tool systematically incorporates information about exon expression into the interpretation of variants. We develop a transcript-level annotation metric known as the ‘proportion expressed across transcripts’ (pext), which quantifies isoform expression for variants. We calculate this metric using 11,706 tissue samples from the Genotype-Tissue Expression (GTEx) project[2] and show that it can differentiate between weakly and highly evolutionarily conserved exons, a proxy for functional importance. We demonstrate that expression-based annotation selectively filters 22.8% of falsely annotated pLoF variants found in haploinsufficient disease genes in gnomAD, while removing less than 4% of high-confidence pathogenic variants in the same genes. Finally, we apply our expression filter to the analysis of de novo variants in patients with autism spectrum disorder and intellectual disability or developmental disorders to show that pLoF variants in weakly expressed regions have similar effect sizes to those of synonymous variants, whereas pLoF variants in highly expressed exons are most strongly enriched among cases.
Our annotation is fast, flexible and generalizable, making it possible for any variant file to be annotated with any isoform expression dataset, and will be valuable for the genetic diagnosis of rare diseases, the analysis of rare variant burden in complex disorders, and the curation and prioritization of variants in recall-by-genotype studies.
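To make the idea of a ‘proportion expressed across transcripts’ concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that, within one tissue, the metric is the summed expression of the transcripts on which a variant carries a given annotation divided by the summed expression of all transcripts of the gene, and that a single summary value is the mean across tissues. The function names, transcript identifiers and expression values are hypothetical.

```python
import math
from typing import Dict, Iterable, Set


def pext_in_tissue(transcript_tpm: Dict[str, float],
                   annotated_transcripts: Set[str]) -> float:
    """Fraction of a gene's total expression in one tissue that comes from
    transcripts on which the variant has the annotation of interest.

    Assumption: expression is given per transcript (e.g. in TPM).
    Returns NaN when the gene is not expressed in the tissue.
    """
    total = sum(transcript_tpm.values())
    if total == 0:
        return math.nan
    on_annotated = sum(
        tpm for tx, tpm in transcript_tpm.items() if tx in annotated_transcripts
    )
    return on_annotated / total


def mean_pext(per_tissue_tpm: Iterable[Dict[str, float]],
              annotated_transcripts: Set[str]) -> float:
    """Average the per-tissue values, skipping tissues without expression."""
    values = [pext_in_tissue(t, annotated_transcripts) for t in per_tissue_tpm]
    values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values) if values else math.nan


# Hypothetical gene with two isoforms; the variant is pLoF only on tx_B.
heart = {"tx_A": 30.0, "tx_B": 10.0}   # tx_B carries 25% of expression
brain = {"tx_A": 0.0, "tx_B": 20.0}    # tx_B carries all expression
silent = {"tx_A": 0.0, "tx_B": 0.0}    # gene not expressed, ignored in mean

heart_value = pext_in_tissue(heart, {"tx_B"})
summary_value = mean_pext([heart, brain, silent], {"tx_B"})
```

Under these assumptions, a variant that falls only on a weakly expressed isoform receives a low value in that tissue even if the gene as a whole is highly expressed, which is the property the abstract relies on.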
Highlights
Variants that occur in an exon differentially included in two isoforms of CACNA1C with diverse patterns of tissue expression result in distinct types of Timothy syndrome[5]
Mendelian disease variants have been found on tissue-specific isoforms[9,10] and isoform expression levels in TTN have been used to show that putative loss-of-function (pLoF) variants found in healthy controls occur in exons that are absent from dominantly expressed isoforms, whereas those in patients with dilated cardiomyopathy occur on constitutive exons[11], emphasizing the utility of exon expression information for variant interpretation
In the Genome Aggregation Database (gnomAD), we identify 401 high-quality pLoF variants that pass both sequencing and annotation quality filters in 61 haploinsufficient disease genes in which heterozygous pLoF variants are established to cause severe developmental delay phenotypes with high penetrance (Methods)
Summary
We observe significantly lower expression for unconserved regions, and near-constitutive expression in highly conserved regions (Fig. 3a, Supplementary Fig. 5a). This difference remains statistically significant after correcting for exon length (logistic regression P < 1.0 × 10^−100), which can influence both phyloCSF scores and isoform quantifications, indicating that transcript expression-aware annotation marks functionally relevant exonic regions. The pext value is higher for pLoF variants that the loss-of-function transcript effect estimator (LOFTEE) package[1] annotates as high confidence with no additional flags than for those flagged as being found on unlikely open reading frames or in weakly conserved regions (Fig. 3b, Supplementary Fig. 5b).
[Figure legend: LOFTEE flag (phyloCSF, unlikely ORF); variant class (pLoF, synonymous); pext expression bin (high, medium, low)]
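The high, medium and low expression bins can be illustrated with a short sketch. The cut-offs used here (below 0.1 for low, above 0.9 for high) are assumptions chosen for illustration and are not stated in the text above; the function name and example values are likewise hypothetical.

```python
def pext_bin(value: float, low: float = 0.1, high: float = 0.9) -> str:
    """Assign a pext value in [0, 1] to an expression bin.

    Assumption: `low` and `high` are illustrative thresholds, not values
    taken from the text; values exactly on a threshold fall in "medium".
    """
    if not 0.0 <= value <= 1.0:
        raise ValueError("pext is a proportion and must lie in [0, 1]")
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "medium"


# Hypothetical variants with precomputed pext values.
variants = {"var_1": 0.03, "var_2": 0.55, "var_3": 0.98}
bins = {name: pext_bin(v) for name, v in variants.items()}

# Keep only variants outside the weakly expressed bin.
retained = [name for name, b in bins.items() if b != "low"]
```

Keeping the thresholds as parameters makes it straightforward to test how sensitive a downstream filter is to the choice of cut-off.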