Abstract

Recent studies of de novo mutations (DNMs) have demonstrated that multiple early-onset diseases share risk genes. We may therefore leverage information from one trait to improve the statistical power to identify genes for another trait. However, few methods can jointly analyze DNMs from multiple traits. In this study, we develop a framework called M-DATA (Multi-trait framework for De novo mutation Association Test with Annotations) that increases the statistical power of association analysis by integrating data from multiple correlated traits and their functional annotations. Using the number of DNMs from multiple diseases, we develop a method based on an Expectation-Maximization algorithm that both infers the degree of association between two diseases and estimates the gene association probability for each disease. We apply our method to a case study that jointly analyzes data from congenital heart disease (CHD) and autism. Joint analysis identified 23 genes for CHD, including 12 novel genes, which is substantially more than single-trait analysis and leads to novel insights into CHD etiology.
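To make the idea of a joint Expectation-Maximization analysis concrete, the sketch below fits a deliberately simplified two-trait mixture model. It is not the M-DATA implementation. It assumes that each gene is in one of four latent states (risk gene for neither trait, trait 1 only, trait 2 only, or both), that DNM counts are Poisson with mean `2 * N * mu` under the null, and that a risk gene's mean is multiplied by a fixed, known relative risk. M-DATA instead models effect sizes through functional annotations, which is omitted here. The function name `em_two_trait` and all parameter names are hypothetical.

```python
import math


def poisson_logpmf(y, lam):
    """Log-probability of observing y events under Poisson(lam), lam > 0."""
    return y * math.log(lam) - lam - math.lgamma(y + 1)


def em_two_trait(counts, mut_rates, n_trios, gammas, n_iter=200):
    """EM for the mixing proportions of a four-state, two-trait model.

    counts    : list of (y1, y2) DNM counts per gene for traits 1 and 2
    mut_rates : list of per-gene mutation rates (assumed known)
    n_trios   : (N1, N2) number of trios per trait
    gammas    : (g1, g2) assumed fixed relative risks for risk genes
    Returns (pi, p_trait1, p_trait2): state proportions and per-gene
    posterior probabilities of being a risk gene for each trait.
    """
    states = [(0, 0), (1, 0), (0, 1), (1, 1)]
    pi = [0.25] * 4  # uniform starting proportions

    # Log-likelihood of each gene under each latent state (fixed across iterations).
    loglik = []
    for (y1, y2), mu in zip(counts, mut_rates):
        row = []
        for s1, s2 in states:
            lam1 = 2 * n_trios[0] * mu * (gammas[0] if s1 else 1.0)
            lam2 = 2 * n_trios[1] * mu * (gammas[1] if s2 else 1.0)
            row.append(poisson_logpmf(y1, lam1) + poisson_logpmf(y2, lam2))
        loglik.append(row)

    post = []
    for _ in range(n_iter):
        # E-step: posterior state probabilities per gene (shifted by the max for stability).
        post = []
        for row in loglik:
            m = max(row)
            w = [p * math.exp(l - m) for p, l in zip(pi, row)]
            total = sum(w)
            post.append([x / total for x in w])
        # M-step: new proportions are the average posterior memberships.
        pi = [sum(r[k] for r in post) / len(post) for k in range(4)]

    # Marginal posterior of association with each trait.
    p_trait1 = [r[1] + r[3] for r in post]
    p_trait2 = [r[2] + r[3] for r in post]
    return pi, p_trait1, p_trait2
```

In this toy model, the estimated proportion of the shared state (`pi[3]`) relative to the single-trait states plays the role of the between-disease association: when it is large, a DNM observed in one trait raises the posterior probability that the same gene is a risk gene for the other trait.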

Highlights

  • The development of sequencing technologies such as whole-exome sequencing (WES) has led to the identification of the genetic causes of many diseases over the past decades

  • With the development of next-generation sequencing technology, germline mutations such as de novo mutations (DNMs) with deleterious effects can be identified to help discover the genetic causes of early-onset diseases such as congenital heart disease (CHD)

  • Statistical power is still limited by the small sample size of DNM studies, owing to the high cost of recruiting and sequencing samples and to the rarity of DNMs


Summary

Introduction

The development of sequencing technologies such as whole-exome sequencing (WES) has led to the identification of the genetic causes of many diseases over the past decades. Homsy et al. identified an excess of protein-damaging DNMs in 1,213 exome-sequenced CHD parent-offspring trios, especially in genes highly expressed in the developing heart and brain [4]. Jin et al. found that DNMs accounted for 8% of CHD cases and identified a striking overlap between genes with damaging DNMs in probands with CHD and in probands with autism [5]. These studies showed that DNM analyses can play an important role in exploring the genetic etiology of CHD. However, the statistical power for identifying risk genes is still hampered by the limited sample size of DNM studies, owing to the relatively high cost of recruiting and sequencing samples and to the rarity of DNMs.
