Deep learning–based genome‐wide association analysis in Alzheimer’s disease

Taeho Jo, Kwangsik Nho, Andrew J Saykin

doi:10.1002/alz.056510

Abstract

Background

Genome‐wide association study (GWAS) designs are widely used to identify genetic loci associated with Alzheimer’s disease (AD) by performing a statistical test for each single‐nucleotide polymorphism (SNP). Testing millions of SNPs individually creates a substantial multiple‐testing burden. Deep learning has demonstrated a remarkable ability to identify non‐linear patterns in large data sets, but its application to AD genetics has been limited. Here we report preliminary results of a deep learning framework developed to identify AD‐associated genetic variation on a genome‐wide scale.

Method

We used genome‐wide genotyping data (12,448,786 SNPs following imputation) from 916 participants in the Alzheimer’s Disease Neuroimaging Initiative (458 cognitively normal controls and 458 AD patients). A convolutional neural network (CNN) consisting of convolutional, pooling, and fully connected softmax layers was used in a two‐stage approach. The data were divided into training, testing, and validation sets (60:20:20 ratio). Area under the curve (AUC) was used to assess model performance.

Result

The first stage of the deep learning approach identified 2,335 candidate genetic regions (93,400 SNPs) as associated with AD. The second stage investigated the association of the identified SNPs with AD by calculating a p‐value for each SNP based on AD influence z‐scores derived from the deep learning model. This approach identified genetic loci in the APOE region as most highly associated with AD (p‐value < 5 × 10⁻⁸). Case/control classification using the identified SNPs yielded mean AUCs of 0.74, 0.79, 0.82, and 0.90 for p‐value thresholds of 1 × 10⁻⁵ (114 SNPs), 1 × 10⁻⁴ (243 SNPs), 1 × 10⁻³ (724 SNPs), and 1 × 10⁻² (2,846 SNPs), respectively (Fig. 1); the corresponding mean accuracies were 0.66, 0.69, 0.73, and 0.81.

Conclusion

Preliminary results indicate that a deep learning approach can be used to identify AD‐associated genetic loci and to reduce the computational complexity of detecting nonlinear interactions between SNPs, which may yield enhanced prediction accuracy for AD risk using genetic information. Future refinements of the deep learning framework are planned, including methods to reduce computational time and to integrate additional omics layers and clinical data, as well as validation with independent replication samples.
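The second‐stage step of turning per‐SNP z‐scores into p‐values and filtering at a threshold can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the AD influence z‐scores are mapped to p‐values, so the sketch assumes a two‐sided test against a standard normal null. The function names, SNP identifiers, and z‐score values are all hypothetical.

```python
import math
from typing import Dict, List


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z-score, assuming a standard normal null."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the normal CDF Phi.
    return math.erfc(abs(z) / math.sqrt(2.0))


def select_snps(z_scores: Dict[str, float], threshold: float) -> List[str]:
    """Return SNP IDs with p-value below the threshold, most significant first."""
    p_values = {snp: two_sided_p_value(z) for snp, z in z_scores.items()}
    hits = [snp for snp, p in p_values.items() if p < threshold]
    return sorted(hits, key=lambda snp: p_values[snp])


# Hypothetical z-scores for illustration only; these are not study values.
example_z = {"snp_a": 6.1, "snp_b": -4.5, "snp_c": 2.0, "snp_d": 0.3}

GENOME_WIDE = 5e-8  # conventional genome-wide significance level
genome_wide_hits = select_snps(example_z, GENOME_WIDE)
suggestive_hits = select_snps(example_z, 1e-5)
```

Relaxing the threshold from 5 × 10⁻⁸ toward 1 × 10⁻² admits more SNPs into the selected set, which mirrors the trade‐off reported above between the number of SNPs retained and classification performance.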
