Abstract

Alzheimer's disease (AD) is a neurodegenerative disease characterized by dementia and an eventual loss of cognitive abilities. Two histopathological features are associated with AD: neurofibrillary tangles and amyloid-beta plaques. Both contribute to neuronal dysfunction, neuronal death, and AD pathogenesis. Current AD diagnosis still relies on symptom-based clinical interviews, which can be time-consuming, costly, and inaccurate. Alternatives such as brain imaging are expensive and require an extensive laboratory setup for accurate results. Thus, molecular-level quantitative approaches are needed. Advances in omics datasets and machine learning have opened new avenues for AD diagnosis. This paper applies dimensionality reduction (principal component analysis and t-distributed stochastic neighbor embedding) and feature selection (the Kolmogorov-Smirnov test with Benjamini-Hochberg correction) to isolate significant features associated with AD. We then developed logistic regression, random forest, and deep neural network (DNN) classifiers to predict AD diagnosis. Eight unique genes, including TGM2, NKIRAS1, SYK, GABARAPL2, ABCC12, NDEL1, and TEP1, were identified as significant AD biomarkers, consistent with previous studies reporting their roles in AD prognosis. After extensive hyperparameter tuning, the DNN achieved the best predictive performance of the three models, with a 5-fold cross-validation accuracy of 0.823 and an AUC-ROC of 0.940 on the preprocessed dataset. The code is publicly available at https://www.kaggle.com/neobrando/ml-dnn.
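The feature-selection step named in the abstract can be illustrated with a per-gene two-sample Kolmogorov-Smirnov test (AD vs. control) whose p-values are adjusted with the Benjamini-Hochberg step-up procedure. This is a minimal sketch under stated assumptions: the function names `benjamini_hochberg` and `ks_bh_select`, the FDR level `alpha = 0.05`, the 0/1 label coding (1 = AD), and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np
from scipy.stats import ks_2samp


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha (BH step-up)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # BH threshold for the k-th smallest p-value is (k / m) * alpha
    thresholds = alpha * np.arange(1, m + 1) / m
    below = ranked <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        # Reject every hypothesis up to the largest rank k that passes its threshold
        k_max = np.max(np.nonzero(below)[0])
        reject[order[: k_max + 1]] = True
    return reject


def ks_bh_select(X, y, alpha=0.05):
    """Hypothetical helper: indices of features whose AD (y == 1) and
    control (y == 0) distributions differ under a KS test with BH correction."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pvals = np.array(
        [ks_2samp(X[y == 1, j], X[y == 0, j]).pvalue for j in range(X.shape[1])]
    )
    return np.nonzero(benjamini_hochberg(pvals, alpha))[0], pvals


# Synthetic demo (NOT the paper's data): 200 samples, 50 "genes",
# and only the first 3 genes are shifted in the AD group.
rng = np.random.default_rng(0)
n, d = 200, 50
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d))
X[y == 1, :3] += 1.5
selected, pvals = ks_bh_select(X, y)
print("selected feature indices:", selected)
```

Controlling the false discovery rate rather than applying a per-test cutoff matters here because thousands of genes are tested at once, so uncorrected p-values would admit many spurious "biomarkers".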

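The evaluation protocol reported in the abstract (5-fold cross-validation scored by accuracy and AUC-ROC, comparing logistic regression, a random forest, and a neural network) can be sketched with scikit-learn. This is an illustrative sketch, not the authors' pipeline: the data come from `make_classification`, all hyperparameters are assumptions, and scikit-learn's `MLPClassifier` is swapped in as a small feed-forward stand-in for the paper's DNN. The authors' own code is at the Kaggle link above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a preprocessed expression matrix (NOT the paper's data).
X, y = make_classification(n_samples=300, n_features=40, n_informative=8, random_state=0)

models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Stand-in for the DNN; the layer sizes are assumptions, not the paper's architecture.
    "mlp": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
    ),
}

# Stratified folds keep the AD/control ratio similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=["accuracy", "roc_auc"])
    results[name] = (scores["test_accuracy"].mean(), scores["test_roc_auc"].mean())
    print(f"{name}: accuracy={results[name][0]:.3f}, AUC-ROC={results[name][1]:.3f}")
```

Wrapping the scaler in a pipeline refits it on each training fold only. The same reasoning applies to the KS/Benjamini-Hochberg selection: running it on the full dataset before cross-validation would leak test-fold information and inflate the reported scores.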