Abstract

Bridging heterogeneous mutation data fills the gap between different data categories and advances the discovery of disease-related genes. Genome-wide association studies (GWAS) identify significant mutation associations that link genotype to phenotype. However, because GWAS differ in sample size and quality, not all truly important variants pass multiple-testing correction. Meanwhile, mutation events widely reported in the literature reveal functional biological processes, including mutation types such as gain of function and loss of function. To bring together these heterogeneous mutation data, we propose the 'Gene-Disease Association prediction by Mutation Data Bridging (GDAMDB)' pipeline, built on a statistical generative model. The model learns the distribution parameters of mutation associations and mutation types, and recovers false-negative GWAS mutations: those that fail the significance test but have supporting evidence of a functional biological process in the literature. We applied GDAMDB to Alzheimer's disease (AD) and predicted 79 AD-associated genes. Of these, 12 come from the original GWAS, 60 are supported as AD-related by other GWAS or literature reports, and the rest are newly predicted. Our model enhances GWAS-based discovery of gene associations by effectively incorporating text-mining results. These results indicate that bridging heterogeneous mutation data contributes to the discovery of novel disease-related genes.
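The multiple-testing point above can be made concrete with a small sketch. It applies a Bonferroni correction assuming roughly one million independent tests, the convention behind the common genome-wide significance threshold of 5 × 10^-8. The variant names and p-values are hypothetical and are not taken from the GWAS used in this work.

```python
ALPHA = 0.05          # family-wise error rate
N_TESTS = 1_000_000   # assumed number of independent tests (common GWAS convention)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def passes(p_value: float, threshold: float) -> bool:
    """True if the p-value clears the corrected threshold."""
    return p_value < threshold


threshold = bonferroni_threshold(ALPHA, N_TESTS)  # 0.05 / 1e6, i.e. 5e-8

# Hypothetical p-values: variant_b is nominally strong (p = 1e-6) but still
# misses the corrected threshold, making it a candidate false negative.
for name, p in [("variant_a", 1e-9), ("variant_b", 1e-6), ("variant_c", 0.2)]:
    print(name, p, "pass" if passes(p, threshold) else "fail")
```

A variant like variant_b is the kind of GWAS result that GDAMDB aims to reconsider when the literature reports a functional mutation type for the mapped gene.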

Highlights

  • To bring together the heterogeneous mutation data, we propose a pipeline, “Gene-Disease Association prediction by Mutation Data Bridging” (GDAMDB)

  • Bridging heterogeneous mutation data fills the gap between different data categories and advances the discovery of disease-related genes

  • Of the predicted genes, 12 come from the original genome-wide association study (GWAS), and 57 are supported as Alzheimer’s disease (AD)-related by other GWAS or literature reports


Summary

Methods

We introduce the modules and models of the GDAMDB pipeline for gene-disease association prediction. For a disease $d$ ($d = 1, \dots, D$) and a gene $g$ ($g = 1, \dots, G$), $f_{dg} \in \{0,1\}^3$ encodes the mutation type of gene $g$ associated with disease $d$ as captured from the literature (LOF, GOF, or NA), while $p_{dg} \in (0,1)$ is the GWAS p-value of the mutation association mapped to gene $g$ for disease $d$.
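The notation above can be illustrated with a minimal sketch of how a p-value $p_{dg}$ and a literature mutation-type vector $f_{dg}$ might be combined. This summary does not give GDAMDB's actual distributions, so the code swaps in a standard beta-uniform mixture for p-values (null p-values uniform on (0,1), associated ones drawn from Beta(a, 1) with a < 1), where a reported LOF or GOF type raises the prior probability of association. The slot order (LOF, GOF, NA), the parameters A, PI_WITH_TYPE and PI_WITHOUT_TYPE, the cutoff, and the helper names are all hypothetical, not the authors' model or estimates.

```python
from dataclasses import dataclass

# Hypothetical parameters; GDAMDB learns its own distribution parameters.
A = 0.3                 # Beta(A, 1) shape for p-values of true associations (A < 1)
PI_WITH_TYPE = 0.5      # prior P(associated) if literature reports LOF or GOF
PI_WITHOUT_TYPE = 0.05  # prior P(associated) otherwise


@dataclass
class GeneDiseasePair:
    d: int                   # disease index, 1..D
    g: int                   # gene index, 1..G
    f: tuple[int, int, int]  # f_dg in {0,1}^3, assumed slot order (LOF, GOF, NA)
    p: float                 # p_dg in (0,1), GWAS p-value of the mapped association


def prior(pair: GeneDiseasePair) -> float:
    """Prior probability of association, raised by literature evidence."""
    lof, gof, _na = pair.f
    return PI_WITH_TYPE if (lof or gof) else PI_WITHOUT_TYPE


def posterior_associated(pair: GeneDiseasePair, a: float = A) -> float:
    """Posterior P(associated | p_dg, f_dg) under a beta-uniform mixture."""
    pi = prior(pair)
    alt = a * pair.p ** (a - 1.0)  # Beta(a, 1) density at p
    null = 1.0                     # Uniform(0, 1) density at p
    return pi * alt / (pi * alt + (1.0 - pi) * null)


def recovered_false_negative(pair: GeneDiseasePair,
                             threshold: float = 5e-8,
                             cutoff: float = 0.5) -> bool:
    """Hypothetical rule: not GWAS-significant, yet likely associated."""
    return pair.p >= threshold and posterior_associated(pair) > cutoff


# Same GWAS p-value, with and without a literature-reported mutation type.
with_type = GeneDiseasePair(d=1, g=1, f=(1, 0, 0), p=0.01)
without_type = GeneDiseasePair(d=1, g=2, f=(0, 0, 1), p=0.01)
for pair in (with_type, without_type):
    print(pair.g, round(posterior_associated(pair), 3), recovered_false_negative(pair))
```

With identical p-values, only the pair backed by a reported LOF type crosses the cutoff, which mirrors the idea of rescuing sub-threshold GWAS signals with text-mined evidence; the real model estimates these quantities rather than fixing them by hand.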
