Abstract

There is growing interest in using machine learning (ML) models to automatically diagnose psychiatric conditions; however, generalising the predictions of ML models to completely independent data can lead to a sharp decrease in performance. Patients with different psychiatric diagnoses have traditionally been studied independently, yet there is growing recognition of neuroimaging signatures shared across these diagnoses and with rare genetic copy number variants (CNVs). In this work, we assess the potential of multi-task learning (MTL) to improve accuracy by characterising multiple related conditions with a single model. Such a model can make use of information shared across diagnostic categories and is exposed to a larger and more diverse dataset. As a proof of concept, we first established the efficacy of MTL in a setting where information is clearly shared across tasks: predicting the same target (age or sex) at different data-collection sites in a large fMRI dataset compiled from multiple studies. MTL generally led to substantial gains relative to independent prediction at each site. In scaling experiments on the UK Biobank, we observed that performance was highly dependent on sample size: for large samples (N>6000), sex prediction was better using MTL across three sites (N=K per site) than prediction at a single site (N=3K), but for small samples (N<500), MTL was actually detrimental for age prediction. We then used established machine learning methods to benchmark the diagnostic accuracy for each of 7 CNVs (N=19-103) and 4 psychiatric conditions (N=44-472) independently, replicating the accuracies previously reported in the literature for psychiatric conditions. We observed that MTL hurt performance when applied across the full set of diagnoses, and complementary analyses failed to identify pairs of conditions that would benefit from MTL.
Taken together, our results show that developing a successful multi-task diagnostic model of psychiatric conditions with resting-state fMRI would likely require datasets with thousands of patients across different diagnoses.