Integration of demographics, genetics, imaging and metabolomics data to identify Alzheimer’s disease patients

Vincent Damotte, Vincent Chouraki, Céline Bellenguez, Aline Meirhaeghe, Philippe Amouyel, Guillemette Marot

doi:10.1002/alz.042659

Abstract

Background

The availability of omics data offers new opportunities to improve the diagnosis of Alzheimer’s disease (AD). Most existing efforts have focused on one data type at a time (genetics, metabolomics, imaging, etc.) rather than considering them simultaneously. We integrated clinical, genetic, metabolomic and imaging data from individuals in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset to build models that accurately classify AD and cognitively normal (CN) individuals.

Method

Samples were divided into a training dataset (65 AD and 82 CN samples) and a validation dataset (33 AD and 41 CN samples). Three modalities were considered in this analysis: metabolomics, imaging, and a third modality containing known demographic and genetic AD risk factors. After quality control, 6 known risk factors, 119 metabolomic variables and 283 imaging variables were included. Sparse Partial Least Squares‐Discriminant Analysis (sPLS‐DA) was used to classify individuals into AD and CN categories based on a small number of variables. This method was first used to build one model for each of the three modalities separately. We then built additional models in which the three modalities were treated either as (1) one super modality (i.e. the three modalities combined into a single one), or (2) independent modalities. Each model that included the genetics modality was built in three versions, which differed in the genetic variable(s) used: Apolipoprotein E4 (APOE4) status only; Desikan’s polygenic hazard score (PHS) only; or both APOE4 genotype and a PHS computed without APOE4. The classification performance of each model was assessed on the validation dataset using the area under the curve (AUC), sensitivity, specificity, and negative and positive predictive values (NPV and PPV).

Results

Seven of the eleven models had an AUC above 0.8. The best model was built using the super modality with the PHS as the genetic variable, and had an AUC of 0.89 [0.82‐0.96]. Its sensitivity, specificity, NPV and PPV were 79%, 100%, 85% and 100%, respectively.

Conclusion

We built diagnostic models to classify AD and CN individuals. The best model performed well, with an AUC of 0.89, but needs to be further validated in an independent cohort.
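The four threshold-based metrics reported for the best model all derive from a 2x2 confusion matrix on the validation dataset. The sketch below shows that arithmetic in Python. The abstract does not report the confusion matrix itself, so the counts used here (26 true positives, 7 false negatives, 41 true negatives, 0 false positives) are an assumption: they are one set of counts that matches the stated validation sizes (33 AD, 41 CN) and reproduces the reported percentages after rounding. The function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, PPV and NPV from confusion-matrix counts.

    tp: AD samples classified as AD      fn: AD samples classified as CN
    tn: CN samples classified as CN      fp: CN samples classified as AD
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of AD samples correctly detected
        "specificity": tn / (tn + fp),  # share of CN samples correctly detected
        "ppv": tp / (tp + fp),          # share of AD predictions that are correct
        "npv": tn / (tn + fn),          # share of CN predictions that are correct
    }


# ASSUMED counts, not taken from the abstract: chosen to match the
# validation dataset sizes (33 AD, 41 CN) and the reported percentages.
metrics = classification_metrics(tp=26, fn=7, tn=41, fp=0)

for name, value in metrics.items():
    print(f"{name}: {value:.0%}")
# sensitivity: 79%
# specificity: 100%
# ppv: 100%
# npv: 85%
```

Under these assumed counts, a specificity and PPV of 100% would mean that no CN individual in the validation dataset was classified as AD, while the 79% sensitivity would correspond to 7 of the 33 AD individuals being missed.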
