Abstract
Background
The diagnosis of schizophrenia is currently based on anamnesis and psychiatric examination only. Language biomarkers may be useful to provide a quantitative and reproducible risk estimate for this spectrum of disorders. While people with schizophrenia spectrum disorders may show one or more language abnormalities, such as incoherence, affective flattening, failure of reference, and changes in sentence length and complexity, the clinical picture can vary widely between individuals, and language abnormalities will reflect this heterogeneity.
Computational linguistics can be used to quantify these features of language. Because the symptoms present in subjects with schizophrenia spectrum disorders are heterogeneous, we expect some subjects to show semantic incoherence, while others may have more affective symptoms such as monotonous speech. Here, we combine phonological, semantic and syntactic features of semi-spontaneous language with machine learning algorithms for classification in order to develop a biomarker sensitive to the broad spectrum of schizophrenia.
Methods
Semi-spontaneous natural language samples were collected from 50 subjects with schizophrenia spectrum disorders and 50 controls matched for age, gender and parental education, using recorded neutral-topic, open-ended interviews. The audio samples were speaker coded; audio belonging to the subject was extracted and transcribed. Phonological features were extracted using OpenSMILE; semantic features were calculated with a word2vec model using a moving-window coherence approach; and syntactic features were calculated using the T-scan tool. Feature reduction was applied to each of the domains.
To distinguish groups, machine learning classifiers were trained on each of these domains using leave-one-out cross-validation, and their results were combined through a voting mechanism.
Results
The machine learning classifiers obtained 75–78% accuracy for the semantic, syntactic and phonological domains individually. The most distinguishing features in each domain were reduced timbre and intonation in the phonological domain, increased variance of coherence in the semantic domain, and decreased complexity of speech in the syntactic domain. The combined approach, using a voting algorithm across the domains, achieved an accuracy of 83% and a precision of 89%. No significant differences in age, gender or parental education were found between healthy controls and subjects with schizophrenia spectrum disorders.
Discussion
In this study we demonstrated that computational features derived from different linguistic domains capture aspects of the symptomatic language of subjects with schizophrenia spectrum disorders. Combining these features improved classification for this heterogeneous disorder: the language parameters distinguished schizophrenia patients from healthy controls with high accuracy and precision. These values are higher than those obtained with imaging or blood analyses, and language is easier and cheaper to obtain than measures derived from those methods. Validation in an independent sample is required, and further distinguishing features should be extracted for each domain. Our positive results in using language abnormalities to automatically detect schizophrenia show that computational linguistics is a promising method in the search for reliable markers in psychiatry.
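The classification scheme described in the Methods (one classifier per linguistic domain, evaluated with leave-one-out cross-validation, then combined by a vote) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the classifier or the exact voting rule, so a nearest-centroid classifier and a simple majority vote stand in for them, and the feature values are synthetic toy numbers, not study data.

```python
from collections import Counter
from statistics import mean


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x.

    Stand-in classifier (an assumption): the abstract does not say
    which classifiers were used.
    """
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [row for row, y in zip(train_X, train_y) if y == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_predictions(X, y):
    """Leave-one-out: predict each subject from all other subjects."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(nearest_centroid_predict(train_X, train_y, X[i]))
    return preds


def majority_vote(prediction_lists):
    """Combine per-domain predictions subject by subject (simple majority)."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*prediction_lists)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """True positives divided by all predicted positives."""
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(t == positive for t in predicted_pos) / len(predicted_pos)


# Synthetic toy data: 1 = patient, 0 = control. One made-up feature per
# domain, chosen so that each domain misclassifies a different subject.
labels = [1, 1, 1, 0, 0, 0]
domains = {
    "phonological": [[0.9], [0.8], [0.2], [0.1], [0.2], [0.3]],
    "semantic":     [[0.7], [0.9], [0.8], [0.2], [0.8], [0.1]],
    "syntactic":    [[0.3], [0.8], [0.9], [0.2], [0.1], [0.3]],
}

per_domain_preds = {name: loo_predictions(X, labels)
                    for name, X in domains.items()}
voted = majority_vote(list(per_domain_preds.values()))

for name, preds in per_domain_preds.items():
    print(name, "accuracy:", round(accuracy(labels, preds), 2))
print("voted accuracy:", accuracy(labels, voted),
      "precision:", precision(labels, voted))
```

In this contrived example each domain gets a different subject wrong, so the vote recovers the correct label in every case. That is the intuition behind combining domains for a heterogeneous disorder, but the toy figures say nothing about the accuracies reported above.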