Impairments in speech production are a core symptom of non-affective psychosis (NAP). While traditional clinical ratings of patients' speech involve a subjective human factor, modern methods of natural language processing (NLP) promise an automatic and objective way of analyzing patients' speech. This study aimed to validate NLP methods for analyzing speech production in NAP patients. Speech samples from patients with a diagnosis of schizophrenia or schizoaffective disorder were obtained at two measurement points (T1 and T2), 6 months apart. Of the N = 71 patients with speech samples at T1, speech samples were also available for N = 54 at T2. Global and local models of semantic coherence as well as different word embeddings (word2vec vs. GloVe) were applied to the transcribed speech samples. The models were tested and compared regarding their correlations with clinical ratings and external criteria from cross-sectional and longitudinal measurements. Results showed no differences between global and local coherence models, and word2vec models yielded more significant correlations with clinically relevant outcome variables than GloVe models. Exploratory analysis of the longitudinal data did not yield significant correlations with coherence scores. These results indicate that NLP methods need to be critically validated in further studies and carefully selected before clinical application.