Abstract

Background
Accurate prediction of dementia subtypes is important in geriatric decision‐making and in designing clinical trials. We used machine learning and a novel error control algorithm to analyze longitudinal data collected by the National Alzheimer’s Coordinating Center (NACC) from June 2005 to November 2021. Our goals were to (1) predict conversion to several dementia subtypes in patients who were dementia‐free at baseline; (2) identify important clinical metrics and risk factors that contribute to the prediction; and (3) control the prediction error rate on specific dementia subtypes to help design clinical trials.

Method
We studied the 13,243 NACC participants whose clinical etiologic diagnosis was normal at their first visit, and fit different machine learning models to predict the diagnosis at their last visit. Among the 13,243 participants, 10,679 remained dementia‐free, 1,793 had Alzheimer’s disease (AD), 153 had Lewy body disease (LBD), and 290 had vascular dementia (VD) as the primary diagnosis at their last exam. After removing predictors with over 10% missing values and imputing missing values in the remaining predictors, our models used 249 predictors, which included clinical, family history, and behavioral features. Cases were randomly split into training (90%) and testing (10%) subsets for 500 rounds. Multinomial regression, penalized multinomial regression, and random forest models were analyzed. We further adapted the penalized multinomial regression model to control the prediction error rate on a given dementia subtype.

Result
The multinomial regression, penalized multinomial regression, and random forest models yielded overall prediction error rates of 10.73%, 9.26%, and 9.38%, respectively. Significant predictors included the length of time from the initial visit to the most recent visit, the cognitive status of the patient at the visit, and metrics of memory deficits.
Error control results (Figure 1) showed that relaxing the strict control on some dementia subtypes improves prediction performance on the other dementia subtypes.

Conclusion
Machine learning methods can help predict progression to dementia using variables commonly available in memory clinics, even in the absence of imaging and fluid biomarkers. Machine learning is also helpful in identifying the most important clinical features and key risk factors.
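
The evaluation protocol described in the Method section (repeated random 90%/10% splits, with error rates averaged over rounds) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `repeated_split_error` and `majority_baseline` are hypothetical, the majority-class rule is only a stand-in for the regression and random forest models, and the toy labels are synthetic rather than NACC data. Reporting per-class error alongside the overall error is an added assumption, included because the abstract discusses error control on individual subtypes.

```python
import random
from collections import Counter


def repeated_split_error(labels, fit, n_rounds=500, test_frac=0.10, seed=0):
    """Average overall and per-class test error over repeated random splits.

    `fit` takes a list of training indices and returns a function that
    maps a case index to a predicted label (hypothetical interface).
    """
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    n_test = max(1, round(test_frac * len(idx)))
    overall = []
    wrong, seen = Counter(), Counter()
    for _ in range(n_rounds):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        predict = fit(train)
        errors = 0
        for i in test:
            seen[labels[i]] += 1
            if predict(i) != labels[i]:
                wrong[labels[i]] += 1
                errors += 1
        overall.append(errors / n_test)
    per_class = {c: wrong[c] / seen[c] for c in seen}
    return sum(overall) / n_rounds, per_class


def majority_baseline(labels):
    """Stand-in model: always predict the most common training label."""
    def fit(train_idx):
        majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
        return lambda i: majority
    return fit


# Synthetic, imbalanced toy labels (not the study data).
labels = ["normal"] * 80 + ["AD"] * 15 + ["LBD"] * 2 + ["VD"] * 3
overall, per_class = repeated_split_error(labels, majority_baseline(labels))
print(f"overall error: {overall:.3f}")
for cls, err in sorted(per_class.items()):
    print(f"{cls}: {err:.3f}")
```

In this toy setting the stand-in model never misclassifies the majority class and always misclassifies the minority classes, which shows how an overall error rate can mask subtype-specific errors when classes are imbalanced.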