Early Diagnosis of Alzheimer's Disease Using Hybrid Word Embedding and Linguistic Characteristics

Yangyang Li

doi:10.1145/3446132.3446197

Abstract

Early detection of Alzheimer's Disease (AD) greatly benefits patients: it can help lessen symptoms and alleviate the financial burden of health care. As one of the leading signs of AD, changes in language capability can potentially be used for early diagnosis. In this paper, I develop an automatic and accurate diagnostic model that uses the linguistic characteristics of the subjects together with a hybrid word embedding. I extract linguistic features such as pauses, unintelligible words, and repetitions from interview transcripts. I then create a text embedding by combining word vectors from Doc2vec and ELMo. By tuning the hyperparameters of the machine learning pipeline (e.g., the model regularization parameter, the learning rate and vector size of Doc2vec, and the vector size of ELMo), I achieve 91% classification accuracy and an Area Under the Curve (AUC) of 97% for distinguishing early AD from healthy subjects. Compared with a method that uses only word counts, this improves the absolute detection accuracy by 10% and the absolute AUC by 9%. I also study the stability of the model by repeating the experiment and find that it remains stable even though the training data is split randomly. The model could be used as a large-scale screening method for AD, as well as a complement to doctors’ detection of AD.
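
As a rough illustration of the approach summarized above, the following minimal sketch counts linguistic markers in a transcript and appends them to two precomputed embedding vectors. It is not the implementation used in this work. The marker conventions (`(.)` for a pause, `xxx` for an unintelligible word, `[/]` for a repetition) and the function names `linguistic_features` and `hybrid_vector` are assumptions made for the example, and the Doc2vec and ELMo vectors are taken as given inputs rather than computed here.

```python
import re
from typing import List

# Assumed transcript annotation markers (hypothetical for this sketch):
#   "(.)" or "(..)" marks a pause, "xxx" an unintelligible word,
#   "[/]" a repetition of the preceding word or phrase.
PAUSE = re.compile(r"\(\.+\)")
UNINTELLIGIBLE = re.compile(r"\bxxx\b")
REPETITION = re.compile(r"\[/\]")


def linguistic_features(transcript: str) -> List[float]:
    """Count pauses, unintelligible words, and repetitions in one transcript."""
    return [
        float(len(PAUSE.findall(transcript))),
        float(len(UNINTELLIGIBLE.findall(transcript))),
        float(len(REPETITION.findall(transcript))),
    ]


def hybrid_vector(
    doc2vec_vec: List[float], elmo_vec: List[float], transcript: str
) -> List[float]:
    """Concatenate two precomputed embeddings with the linguistic counts.

    The embeddings are assumed to come from separately trained Doc2vec
    and ELMo models; this function only joins the pieces.
    """
    return list(doc2vec_vec) + list(elmo_vec) + linguistic_features(transcript)


if __name__ == "__main__":
    sample = "the boy (.) is xxx taking the [/] the cookie (..) xxx"
    print(linguistic_features(sample))
    print(hybrid_vector([0.1, 0.2], [0.3], sample))
```

The resulting vector could then be passed to any standard classifier. Simple concatenation is only one way to combine the two embeddings with the linguistic counts, and it is chosen here for brevity.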

Full Text