Multilingual automation of transcript preprocessing in Alzheimer's disease detection.

Sylvie Ratté, Frédéric Abiven

doi:10.1002/trc2.12147

Abstract

Introduction: Analyzing linguistic functions can improve early detection of Alzheimer's disease (AD). To date, no studies have focused on creating a universal pipeline for clinical transcript preprocessing.

Methods: This article presents a simple and efficient method for processing linguistic and phonetic data by sequencing the subproblems of cleaning, normalization, and measure extraction. Because some of these tasks are language- and context-dependent, they were designed to be easily configurable, which makes the pipeline easier to scale to new corpora.

Results: Results show improved performance over previous studies in this time-consuming preprocessing task. Moreover, some discursive markers extracted from transcripts showed a significant correlation (>0.5) with cognitive impairment severity.

Discussion: This article contributes to the literature on AD by presenting an efficient pipeline that speeds up the transcript preprocessing task. We further invite other researchers to contribute to this work to help improve the quality of this pipeline (https://github.com/LiNCS-lab/usAge).
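
The staged design described in the abstract (cleaning, then normalization, then measure extraction, each driven by per-language settings) can be illustrated with a minimal Python sketch. This is not the usAge implementation: the names `PipelineConfig`, `clean`, `normalize`, `extract_measures`, and `run_pipeline`, the bracketed-annotation convention, and the filler-word lists are all hypothetical and chosen only to show how a configurable sequence of steps might look.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PipelineConfig:
    """Hypothetical language- and corpus-specific settings (not from usAge)."""
    # Filler words counted as a simple discourse marker (assumed list).
    filler_words: List[str] = field(default_factory=lambda: ["uh", "um"])
    # Assumed convention: transcriber annotations appear in square brackets.
    annotation_pattern: str = r"\[[^\]]*\]"


def clean(text: str, cfg: PipelineConfig) -> str:
    """Cleaning step: remove transcriber annotations such as [laughs]."""
    return re.sub(cfg.annotation_pattern, " ", text)


def normalize(text: str, cfg: PipelineConfig) -> str:
    """Normalization step: lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def extract_measures(text: str, cfg: PipelineConfig) -> Dict[str, float]:
    """Measure extraction step: token count and filler-word rate."""
    tokens = re.findall(r"\w+", text)
    fillers = sum(1 for t in tokens if t in cfg.filler_words)
    rate = fillers / len(tokens) if tokens else 0.0
    return {"n_tokens": len(tokens), "n_fillers": fillers, "filler_rate": rate}


def run_pipeline(raw: str, cfg: PipelineConfig) -> Dict[str, float]:
    """Run the three steps in sequence on one raw transcript."""
    return extract_measures(normalize(clean(raw, cfg), cfg), cfg)


if __name__ == "__main__":
    english = PipelineConfig()
    print(run_pipeline("Uh the boy [laughs] is um taking the cookie", english))

    # Adapting to a new corpus only requires a different configuration.
    french = PipelineConfig(filler_words=["euh", "ben"])
    print(run_pipeline("Euh le garçon [rire] prend euh le biscuit", french))
```

In this sketch, moving to another language changes only the configuration object, which is one plausible reading of the "easily configurable" design goal; the actual pipeline may organize its steps and settings differently.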

Full Text