Abstract
Trio exome sequencing has been successful in identifying genes with de novo mutations (DNMs) causing epileptic encephalopathy (EE) and other neurodevelopmental disorders. Here, we evaluate how well a case-control collapsing analysis recovers genes causing dominant forms of EE originally implicated by DNM analysis. We performed a genome-wide search for an enrichment of "qualifying variants" in protein-coding genes in 488 unrelated cases compared to 12,151 unrelated controls. Qualifying variants were defined as extremely rare variants predicted to functionally impact the protein, a filter intended to enrich for likely pathogenic variants. Despite the modest sample size, three known EE genes (KCNT1, SCN2A, and STXBP1) achieved genome-wide significance (p < 2.68×10⁻⁶). In addition, six of the 10 most significantly associated genes are known EE genes, and the majority of the known EE genes originally implicated in trio sequencing (17 out of 25) are nominally significant (p < 0.05), a proportion significantly higher than expected (Fisher's exact p = 2.33×10⁻¹⁷). Our results indicate that a case-control collapsing analysis can identify several of the EE genes originally implicated in trio sequencing studies, and clearly show that additional genes would be implicated with larger sample sizes. The case-control analysis not only makes discovery easier and more economical in early onset disorders, particularly when large cohorts are available, but also supports the use of this approach to identify genes in diseases that present later in life, when parents are not readily available.
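The per-gene test described above can be illustrated with a minimal sketch. It assumes that each individual is collapsed to a single carrier/non-carrier indicator per gene and that enrichment in cases is assessed with a one-sided Fisher's exact test; the function names and the carrier counts in the example are hypothetical and are not taken from the study, which may have used different test settings.

```python
from math import comb


def upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favorable / comb(population, draws)


def count_carriers(qualifying_calls, gene):
    """Collapse variants to carriers: a sample counts once per gene,
    however many qualifying variants it carries in that gene."""
    return len({sample for sample, g in qualifying_calls if g == gene})


def collapsing_pvalue(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for carrier enrichment in cases."""
    total_samples = n_cases + n_controls
    total_carriers = case_carriers + control_carriers
    # Under the null, the number of carriers among cases is hypergeometric.
    return upper_tail(case_carriers, total_samples, total_carriers, n_cases)


# Hypothetical gene: 6 carriers among 488 cases, 3 among 12,151 controls.
p_gene = collapsing_pvalue(6, 488, 3, 12151)
GENOME_WIDE_ALPHA = 2.68e-6  # significance threshold quoted in the abstract
print(f"p = {p_gene:.3g}, genome-wide significant: {p_gene < GENOME_WIDE_ALPHA}")
```

Counting carriers rather than variants keeps the test at the level of individuals, which is what allows many distinct, individually rare variants in one gene to contribute to a single signal.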
Highlights
One of the most important recent developments in human genomics is the use of a trio sequencing paradigm to implicate new disease genes in sporadic disease by evaluating patterns of de novo mutations (DNMs).
In parallel to these developments, collapsing analyses, which typically compare the burden of rare, presumably deleterious variants gene by gene in cases versus controls, have proven increasingly successful in implicating disease genes, for example in amyotrophic lateral sclerosis[11, 12], idiopathic pulmonary fibrosis[13, 14], and monogenic disorders[15]. It has not yet been assessed whether the collapsing framework can identify the genes implicated by analysis of trio sequencing data. We addressed this question with a genome-wide, gene-based collapsing analysis of whole exome sequencing (WES) data from 488 epileptic encephalopathy (EE) patients, including those previously analyzed using the trio-based DNM analysis framework, and a large cohort of unrelated control individuals. The aim was to assess how well case-control analysis identifies the EE genes implicated by DNM analysis.
We used a hypergeometric test to assess whether the 25 known dominant EE genes tend to have lower p-values in our case-control gene-based collapsing analysis than the rest of the genome.
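As a rough illustration of the hypergeometric test mentioned above, the sketch below asks how likely it is that at least 17 of 25 known genes would be nominally significant if the known genes were a random draw from all tested genes. The total number of genes tested and the number of nominally significant genes are not given in this section, so the values used here are hypothetical placeholders, and the function name is an assumption.

```python
from math import comb


def enrichment_pvalue(known_hits, n_known, n_nominal, n_genes):
    """P(X >= known_hits) when n_known genes are drawn at random from
    n_genes, of which n_nominal are nominally significant."""
    top = min(n_known, n_nominal)
    favorable = sum(
        comb(n_nominal, i) * comb(n_genes - n_nominal, n_known - i)
        for i in range(known_hits, top + 1)
    )
    return favorable / comb(n_genes, n_known)


N_GENES_TESTED = 18_000  # hypothetical: total genes in the collapsing analysis
N_NOMINAL = 900          # hypothetical: genes with p < 0.05

# 17 of the 25 known dominant EE genes are nominally significant (from the text).
p_enrich = enrichment_pvalue(17, 25, N_NOMINAL, N_GENES_TESTED)
print(f"enrichment p = {p_enrich:.3g}")
```

Because the placeholder totals are invented, the printed value is not expected to reproduce the p-value reported in the abstract; the sketch only shows the shape of the calculation.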
Summary
One of the most important recent developments in human genomics is the use of a trio sequencing paradigm to implicate new disease genes in sporadic disease by evaluating patterns of de novo mutations (DNMs). A precise estimate of mutation rate is not available for small insertions/deletions (indels)[1], limiting the ability to assess the significance of genes harboring de novo indels. In parallel to these developments, collapsing analyses, which typically compare the burden of rare, presumably deleterious variants gene by gene in cases versus controls, have proven increasingly successful in implicating disease genes, for example in amyotrophic lateral sclerosis[11, 12], idiopathic pulmonary fibrosis[13, 14], and monogenic disorders[15]. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.