Abstract

Classifying individuals as patients or healthy volunteers offers a clinically useful alternative to traditional group statistics because it yields individualized predictions. Classifiers for neurodegenerative disease can be trained on structural MRI morphometry, but they require large multi-scanner datasets, which introduce confounding batch effects. We test ComBat, a common harmonization model, in an example application that classifies subjects with Parkinson’s disease versus healthy volunteers, and we identify common pitfalls, including data leakage. We used a multi-dataset cohort of 372 subjects (216 with Parkinson’s disease, 156 healthy volunteers) from 11 identified scanners. We extracted both FreeSurfer and determinant-of-Jacobian morphometry to compare single-scanner and multi-scanner classification pipelines. We confirm the presence of batch effects by training single-scanner classifiers, which achieved widely divergent AUCs on scanner-specific datasets (mean: 0.651 ± 0.144). Multi-scanner classifiers that considered neurobiological batch effects between sites could easily achieve a test AUC of 0.902, whereas pipelines that prevented data leakage achieved a test AUC of only 0.550. We conclude that batch effects remain a major issue for classification problems, such that even impressive single-scanner classifiers are unlikely to generalize to multiple scanners, and that correcting for batch effects in a classification problem must avoid circularity and the reporting of overly optimistic results.
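The data-leakage pitfall named above can be illustrated with a minimal sketch. It is not ComBat and not the pipeline used in this study: it substitutes a simple per-scanner mean and standard-deviation correction, applied to simulated values, and the function names `fit_scanner_params` and `apply_scanner_params` are hypothetical. The point it shows is only the ordering: harmonization parameters are estimated from the training split and then applied unchanged to the test split.

```python
import random
from statistics import mean, pstdev


def fit_scanner_params(values, scanners):
    """Estimate a (mean, std) pair per scanner from the data passed in.

    To avoid leakage, call this on the TRAINING split only.
    Simplified stand-in for a harmonization model; unlike ComBat it does
    not model covariates or use empirical Bayes shrinkage.
    """
    params = {}
    for scanner in sorted(set(scanners)):
        vals = [v for v, s in zip(values, scanners) if s == scanner]
        spread = pstdev(vals)
        params[scanner] = (mean(vals), spread if spread > 0 else 1.0)
    return params


def apply_scanner_params(values, scanners, params):
    """Apply previously fitted parameters; raises KeyError for unseen scanners."""
    return [(v - params[s][0]) / params[s][1] for v, s in zip(values, scanners)]


# Simulated single feature from two scanners with an additive offset (assumed values).
random.seed(0)
scanner_offset = {"A": 0.0, "B": 5.0}
scanners = ["A" if i < 100 else "B" for i in range(200)]
values = [random.gauss(scanner_offset[s], 1.0) for s in scanners]

# Hold out every fourth subject as the test split.
test_idx = [i for i in range(200) if i % 4 == 0]
train_idx = [i for i in range(200) if i % 4 != 0]
train_values = [values[i] for i in train_idx]
train_scanners = [scanners[i] for i in train_idx]
test_values = [values[i] for i in test_idx]
test_scanners = [scanners[i] for i in test_idx]

# Leakage-free order: fit on train, then apply the same parameters to test.
train_params = fit_scanner_params(train_values, train_scanners)
train_harmonized = apply_scanner_params(train_values, train_scanners, train_params)
test_harmonized = apply_scanner_params(test_values, test_scanners, train_params)
```

A leaky variant would call `fit_scanner_params` on all 200 values before splitting, so that test subjects influence the correction applied to them. Because this sketch ignores diagnosis and other covariates, it would also remove genuine group differences whenever diagnosis is unevenly distributed across scanners.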
