T59. ACOUSTIC SPEECH MARKERS FOR SCHIZOPHRENIA

Janna De Boer,Alban Voppel,Iris Sommer,Frank Wijnen

doi:10.1093/schbul/sbaa029.619

Abstract

BackgroundClinicians routinely use impressions of speech as an element of mental status examination, including ‘pressured’ speech in mania and ‘monotone’ or ‘soft’ speech in depression or psychosis. In psychosis in particular, descriptions of speech are used to monitor (negative) symptom severity. Recent advances in computational linguistics have paved the way towards automated speech analyses as a biomarker for psychosis. In the present study, we assessed the diagnostic value of acoustic speech features in schizophrenia. We hypothesized that a classifier would be highly accurate (~ 80%) in classifying patients and healthy controls.MethodsNatural speech samples were obtained from 86 patients with schizophrenia and 77 age and gender matched healthy controls through a semi-structured interview, using a set of neutral open-ended questions. Symptom severity was rated by consensus rating of two trained researchers, blinded to phonetic analysis, with the Positive And Negative Syndrome Scale (PANSS). Acoustic features were extracted with OpenSMILE, employing the Geneva Acoustic Minimalistic Parameter Set (GeMAPS), which comprises standardized analyses of pitch (F0), formants (F1, F2 and F3, i.e. acoustic resonance frequencies that indicate the position and movement of the articulatory muscles during speech production), speech quality, length of voiced and unvoiced regions. Speech features were fed into a linear kernel support vector machine (SVM) with leave-one-out cross-validation to assess their value for psychosis diagnosis.ResultsDemographic analyses revealed no differences between patients with schizophrenia and healthy controls in age or parental education. An automated machine-learning speech classifier reached an accuracy of 82.8% in classifying patients with schizophrenia and controls on speech features alone. Important features in the model were variation in loudness, spectral slope (i.e. the gradual decay in energy in high frequency speech sounds) and the amount of voiced regions (i.e. segments of the interview where the participant was speaking). PANSS positive, negative and general scores were significantly correlated with pitch, formant frequencies and length of voiced and unvoiced regions.DiscussionThis study demonstrates that an algorithm using quantified features of speech can objectively differentiate patients with schizophrenia from controls with high accuracy. Further validation in an independent sample is required. Employing standardized parameter sets ensures easy replication and comparison of analyses and can be used for cross linguistic studies. Although at an early stage, the field of clinical computational linguistics introduces a powerful tool for diagnosis and prognosis of psychosis and neuropsychiatric disorders in general. We consider this new diagnostic tool to be of high potential given its ease of acquirement, low costs and patient burden. For example, this tool could easily be implemented as a smartphone app to be used in treatment settings.

Full Text