Abstract

Background: The treatment response of patients with schizophrenia is heterogeneous, and markers of clinical response are missing. Studies using machine learning approaches have provided encouraging results regarding prediction of outcomes, but replicability has been challenging. In the present study, we present a novel methodological framework for applying machine learning to clinical data. Herein, algorithm selection and other methodological choices were based on model performance on a simulated dataset, to minimize bias and avoid overfitting. We subsequently applied the best-performing machine learning algorithm to a rich, multimodal neuropsychiatric dataset. We aimed to 1) distinguish patients from controls, 2) predict short- and long-term clinical response in a sample of initially antipsychotic-naïve first-episode schizophrenia patients, and 3) validate our methodological framework.

Methods: We included data from 138 antipsychotic-naïve, first-episode schizophrenia patients, who had undergone assessments of psychopathology, cognition, electrophysiology, and structural magnetic resonance imaging (MRI). Perinatal data and long-term outcome measures were obtained from Danish registers. Baseline diagnostic classification algorithms also included data from 151 matched healthy controls. Short-term treatment response was defined as change in psychopathology after the initial antipsychotic treatment period. Long-term treatment response (4–16 years) was based on data from Danish registers. The simulated dataset was generated to resemble the real data with respect to dimensionality, multimodality, and pattern of missing data. Noise levels were tunable to enable approximation to the signal-to-noise ratio in the real data. Robustness of the results was ensured by running two parallel, fundamentally different machine learning pipelines, a ‘single algorithm approach’ and an ‘ensemble approach’. Both pipelines included nested cross-validation, missing data imputation, and late integration.

Results: Patients were distinguished from controls with a statistically significant balanced accuracy of 64.2% (95% CI = [51.7, 76.7]) for the single algorithm approach and 63.1% (95% CI = [50.4, 75.8]) for the ensemble approach. Post hoc analyses showed that the classification was driven primarily by the cognitive data. Neither approach predicted short- or long-term clinical response. To validate our methodological framework based on simulated data, we selected the best, a medium, and the most poorly performing algorithm on the simulated data and applied them to the real data. We found that the ranking of the algorithms was preserved in the real data.

Discussion: Our rigorous modelling framework incorporating simulated data and parallel pipelines discriminated patients from controls, but our extensive, multimodal neuropsychiatric data from antipsychotic-naïve schizophrenia patients were not predictive of the clinical outcome. Nevertheless, our novel approach holds promise as an important step to obtain reliable, unbiased results with modest sample sizes when independent replication samples are not available.
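
As an illustration of the pipeline components named in the Methods (simulated data with tunable noise and missing values, imputation, nested cross-validation, balanced accuracy), the following is a minimal sketch and not the authors' code. Every function name, the toy data generator, and the k-nearest-neighbour classifier are assumptions chosen for brevity; late integration and the ensemble approach are not shown, and a real analysis would more likely rely on an established library such as scikit-learn.

```python
import random
from statistics import mean


def simulate(n=120, n_features=6, noise=1.0, missing_rate=0.1, seed=0):
    """Toy two-class data (hypothetical generator): the class label shifts the
    first two features by 1.0; `noise` is the standard deviation of all features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = []
        for j in range(n_features):
            value = (label if j < 2 else 0.0) + rng.gauss(0.0, noise)
            row.append(None if rng.random() < missing_rate else value)
        X.append(row)
        y.append(label)
    return X, y


def fit_imputer(X):
    """Column means, computed on training rows only to avoid leakage."""
    return [mean([v for v in col if v is not None] or [0.0]) for col in zip(*X)]


def impute(X, means):
    return [[m if v is None else v for v, m in zip(row, means)] for row in X]


def knn_predict(X_train, y_train, X_test, k):
    """Majority vote among the k nearest training rows (squared Euclidean)."""
    preds = []
    for x in X_test:
        nearest = sorted(
            zip(X_train, y_train),
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
        )[:k]
        votes = sum(label for _, label in nearest)
        preds.append(1 if 2 * votes > len(nearest) else 0)
    return preds


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


def folds(n, n_splits):
    """Interleaved index folds: fold i holds positions i, i + n_splits, ..."""
    return [list(range(i, n, n_splits)) for i in range(n_splits)]


def evaluate(X, y, train_idx, test_idx, k):
    """Fit imputer and classifier on train_idx, score on test_idx."""
    X_tr, y_tr = [X[i] for i in train_idx], [y[i] for i in train_idx]
    X_te, y_te = [X[i] for i in test_idx], [y[i] for i in test_idx]
    means = fit_imputer(X_tr)
    preds = knn_predict(impute(X_tr, means), y_tr, impute(X_te, means), k)
    return balanced_accuracy(y_te, preds)


def select_k(X, y, train_idx, ks, n_inner):
    """Inner loop: pick k by cross-validation within the outer training set."""
    def inner_score(k):
        scores = []
        for part in folds(len(train_idx), n_inner):
            val_idx = [train_idx[p] for p in part]
            held = set(val_idx)
            fit_idx = [i for i in train_idx if i not in held]
            scores.append(evaluate(X, y, fit_idx, val_idx, k))
        return mean(scores)
    return max(ks, key=inner_score)


def nested_cv(X, y, ks=(1, 5, 15), n_outer=5, n_inner=3):
    """Outer loop: estimate performance on folds never used for tuning."""
    scores = []
    for test_idx in folds(len(y), n_outer):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        best_k = select_k(X, y, train_idx, ks, n_inner)
        scores.append(evaluate(X, y, train_idx, test_idx, best_k))
    return mean(scores)


if __name__ == "__main__":
    for noise in (0.2, 1.0, 5.0):
        X, y = simulate(noise=noise, seed=1)
        print(f"noise={noise}: balanced accuracy = {nested_cv(X, y):.3f}")
```

Raising `noise` in this toy generator lowers the signal-to-noise ratio, which loosely mirrors how a simulated dataset can be tuned toward the difficulty of real data before any algorithm is chosen. The hyperparameter `k` is selected only inside the inner loop, so the outer-fold scores are not biased by tuning.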

