Abnormal emotion processing is a core feature of schizophrenia spectrum disorders (SSDs) that encompasses multiple operations. While deficits in some areas have been well characterized, we understand less about abnormalities in emotion processing that occurs through language, which is highly relevant for social life. Here, we introduce a novel deep learning method to rapidly estimate emotion processing from spoken language, testing this approach in male-identified patients with SSDs (n = 37) and healthy controls (n = 51). Using free responses to evocative stimuli, we derived a measure of response appropriateness, or “emotional alignment” (EA). We examined psychometric characteristics of EA and its sensitivity to a single-dose challenge of oxytocin, a neuropeptide shown to enhance the salience of socioemotional information in SSDs. Patients showed impaired EA relative to controls, and impairment correlated with poorer social cognitive skill and more severe motivation and pleasure deficits. Adding EA to a logistic regression model with language-based measures of formal thought disorder (FTD) improved classification of patients versus controls. Lastly, oxytocin administration improved EA but not FTD among patients. While additional validation work is needed, these initial results suggest that an automated assay using spoken language may be a promising approach to assess emotion processing in SSDs.