Abstract

Alzheimer's disease (AD) is the most common progressive neurodegenerative disorder in the elderly and eventually leads to dementia in the absence of effective prevention and treatment. As with other complex diseases, the mechanisms underlying the onset and progression of AD remain poorly understood. In this study, we aim to directly analyze the relationship between DNA variants and phenotypes using whole genome sequencing data. First, to strengthen the biological interpretability of the study, we annotate deleterious variants and map them to the nearest protein-coding genes. Then, to eliminate redundant features and reduce the burden on downstream analysis, we apply a multi-objective evaluation strategy based on entropy theory to rank all candidate genes. Finally, we use XGBoost as a multi-class classifier on unbalanced data comprising 46 AD samples, 483 mild cognitive impairment (MCI) samples and 279 cognitively normal (CN) samples. Experimental results on real whole genome sequencing data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) show that our method not only achieves satisfactory classification performance but also identifies a significant correlation between AD and RIN3, a known AD susceptibility gene. In addition, pathway enrichment analysis of the top 20 feature genes found three pathways to be significantly related to the development of AD. These experimental results demonstrate the efficacy and practical significance of the proposed method.
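
As a rough illustration of what entropy-based gene ranking can look like, the sketch below scores each gene by information gain, that is, the reduction in label entropy after splitting samples by variant-carrier status. This is a single-criterion stand-in and not the multi-objective strategy used in the study, whose details are not given here; the gene names, the toy data and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(feature, labels):
    """Reduction in label entropy after splitting samples by a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_genes(gene_matrix, labels):
    """Return (gene, score) pairs sorted by descending information gain."""
    scores = {gene: information_gain(col, labels) for gene, col in gene_matrix.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical toy data: 1 = sample carries a deleterious variant in the gene.
labels = ["AD", "AD", "MCI", "MCI", "CN", "CN"]
gene_matrix = {
    "GENE_A": [1, 1, 0, 0, 0, 0],  # carrier status separates AD from the rest
    "GENE_B": [1, 0, 1, 0, 1, 0],  # carrier status is independent of the label
}
ranking = rank_genes(gene_matrix, labels)
```

In this toy case `GENE_A` ranks above `GENE_B`, because knowing its carrier status removes part of the uncertainty about the diagnosis while `GENE_B` removes none.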

Highlights

  • Alzheimer’s disease (AD) is the most common progressive neurodegenerative disorder in the elderly and eventually leads to dementia in the absence of effective prevention and treatment

  • The merits of our work are reflected in three points. First, to improve the interpretability of the study and reduce false positives, the VEP [29] mutation annotation software is used to screen risk single nucleotide polymorphisms (SNPs) from the whole genome. Second, our method predicts outcomes directly from genome variant information alone, which avoids a weakness of imaging genetics methods, namely that the variants they identify are not strongly associated with individual phenotypes. Third, by using XGBoost, our method can accurately recognize the mild cognitive impairment (MCI) samples. Our method further validates that Rab Interactor 3 (RIN3) has the strongest correlation with AD, and pathway enrichment analysis of the top 20 genes shows that three pathways (Pyruvate metabolism; Glycine, serine and threonine metabolism; and ABC transporters) are significant (p-value < 0.05)

  • To demonstrate the advantages of XGBoost on unbalanced data, we compare the results of XGBoost with those of logistic regression (LR)
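
To make the class imbalance concrete, the following sketch uses the sample counts reported in the abstract to compute inverse-frequency class weights and to contrast plain accuracy with macro-averaged recall for a trivial classifier that always predicts MCI. It is only an illustration under stated assumptions: the source does not say that class weighting or macro recall was used in the study, and all function names are hypothetical.

```python
from collections import Counter

# Class counts reported in the abstract.
CLASS_COUNTS = {"AD": 46, "MCI": 483, "CN": 279}

def balanced_class_weights(counts):
    """Inverse-frequency weights: n_samples / (n_classes * class_count)."""
    total = sum(counts.values())
    return {label: total / (len(counts) * n) for label, n in counts.items()}

def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / support[c] for c in support) / len(support)

# Expand the counts into a label vector and build a majority-class baseline.
y_true = [label for label, n in CLASS_COUNTS.items() for _ in range(n)]
y_majority = ["MCI"] * len(y_true)  # trivial baseline: always predict MCI

weights = balanced_class_weights(CLASS_COUNTS)
baseline_accuracy = accuracy(y_true, y_majority)
baseline_macro_recall = macro_recall(y_true, y_majority)
```

The baseline reaches an accuracy of 483/808 while its macro-averaged recall is only 1/3, which is why accuracy alone can flatter a classifier on data like these. Per-sample weights derived from `weights` could, for example, be passed through the `sample_weight` argument that both XGBoost's and scikit-learn's `fit` methods accept.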

Summary

Introduction

Alzheimer’s disease (AD) is the most common progressive neurodegenerative disorder in the elderly and eventually leads to dementia in the absence of effective prevention and treatment. Different studies may reach different, or even completely conflicting, conclusions about the involvement of genetic variants in the onset and progression of complex diseases, which results in a lack of reproducibility. Another challenge is that the susceptibility variants uncovered so far account for only a limited proportion of the heritability of each complex disease [14,15,16,17,18,19]. Although the association of the APOE ε4 allele has been consistently reproduced in many studies [21], only a small proportion of AD patients carry the allele [22], so it explains only a small share of the heritability. This implies that other genetic variants with marginal effects contribute to the risk of AD. Considering the epistatic interactions between genetic variants, rather than relying on univariate analysis, may increase the proportion of AD heritability that can be explained and help identify unknown risk variants.

