Abstract

Objective: Performance estimates for machine learning (ML) classifiers depend on the sample size and class imbalance of the training data, yet performance is often reported for balanced data. We explore how varying sample size and dementia conversion base rate affects the performance of a classifier that predicts future dementia.

Method: Longitudinal data from the National Alzheimer's Coordinating Center (NACC) Uniform Data Set (UDS) were used. All participants had mild cognitive impairment (MCI) at baseline. A random forest classifier (RFC) was trained to predict dementia at 1, 2, and 3 years. Predictors included baseline neuropsychological test scores, demographics, and health history. Cases were sampled at multiple sample sizes (N = 125, 250, 500, 1000, and 2000) and base rates (0.1, 0.2, 0.3, 0.4, and 0.5). Performance was evaluated using the Matthews Correlation Coefficient (MCC).

Results: For balanced data (N = 1000), the classifier predicts conversion to dementia at 3 years with an MCC of 0.54 (sensitivity = 0.79; specificity = 0.75). As expected, mean performance estimates decline as the conversion rate decreases. Likewise, the variability of the estimates increases as sample size decreases. At a conversion rate of 30%, consistent with many memory clinics, classifier performance declines only moderately (MCC = 0.44). At conversion rates of 10% and 20%, performance approaches chance. Performance trends are illustrated in Figure 1.

Conclusions: Such classifiers may have clinical utility in memory clinics with higher conversion rates. The expected tradeoffs are observed: smaller samples increase error variance, and higher base rates of positive cases improve overall performance. The results provide potential guidelines for sample size and recruitment targets in RFC designs.
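
The design crosses five sample sizes with five base rates. A minimal sketch of drawing one such subsample follows, assuming a pool of labeled MCI participants (1 = converted to dementia, 0 = did not) and sampling each class without replacement. The function name `subsample` and its seeding scheme are illustrative assumptions, not the authors' procedure.

```python
import random
from typing import List, Sequence


def subsample(labels: Sequence[int], n: int, base_rate: float, seed: int = 0) -> List[int]:
    """Return indices of n cases, round(n * base_rate) of which are converters (label 1).

    Hypothetical helper: the exact sampling procedure used in the study is not specified.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(n * base_rate)  # Python rounds halves to even, e.g. round(12.5) == 12
    n_neg = n - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("pool too small for the requested sample size and base rate")
    # Draw each class without replacement, then shuffle so classes are interleaved.
    idx = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(idx)
    return idx
```

Sampling the two classes separately fixes the base rate exactly in every draw, so the spread of results across repeated draws at one grid cell reflects sample size rather than fluctuating prevalence.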

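MCC can be computed from the four confusion-matrix cells. The sketch below, assuming sensitivity and specificity stay fixed, also converts them into expected cell proportions at a given base rate. The helper names are hypothetical, and returning 0 when a marginal total is zero is a common convention rather than something stated in the abstract.

```python
import math


def mcc(tp: float, fp: float, tn: float, fn: float) -> float:
    """Matthews correlation coefficient from confusion-matrix counts or proportions."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0 when any row or column total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def mcc_from_rates(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Expected MCC for a classifier with fixed sensitivity/specificity at a given prevalence."""
    p = base_rate
    tp = sensitivity * p
    fn = (1 - sensitivity) * p
    tn = specificity * (1 - p)
    fp = (1 - specificity) * (1 - p)
    return mcc(tp, fp, tn, fn)
```

With the balanced-data values (sensitivity = 0.79, specificity = 0.75), `mcc_from_rates(0.79, 0.75, 0.5)` gives about 0.54, matching the reported MCC. At a base rate of 0.3 the same error rates give about 0.50, so MCC shifts with prevalence even when the classifier's sensitivity and specificity do not.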