Background: Computational approaches to measuring naturalistic behavior in clinical settings could provide an objective backstop for mental health assessment and disease monitoring. When performed with traditional methods, both are costly and unreliable.

Objective: This pilot study aimed to determine which parts of the mental status exam could be reliably predicted from facial and vocal features. These features were extracted from a recorded interview using a combination of computer-assisted methods. The goal was to assess whether our approach is feasible for quantifying behavior in a longitudinal study of patients receiving psychiatric treatment.

Methods: A total of 18 patients with diagnoses of schizophrenia, bipolar disorder, and related conditions were recruited from an inpatient psychiatric unit. They participated in a total of 24 semi-structured interviews lasting 5-15 minutes, modeled after clinical rounds. During each encounter, synchronized audio and video data were acquired from both patient and doctor. Recording used cardioid headset microphones and 1080p webcams focused on the face and upper torso. Standardized psychiatric symptom scales were obtained after each recorded interview. Behavioral features were computed automatically using in-house and publicly available software; these included facial action units (AUs), gaze, and speech characteristics (eg, prosody, pitch, tone, texture). To predict clinical scales, we trained a linear-kernel support vector regressor (SVR). Features came from both the entire session (ie, global mean) and each experimental epoch (eg, means during time spent alone and during each individual question). This yielded 15 predictors for each clinical scale item and scale total. We used leave-one-out validation on the training data, maximizing the Pearson correlation coefficient, to determine the C parameter for the SVR models. For testing, we used leave-one-subject-out cross-validation (ie, leaving 17 participants for training/validation in each fold).
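The feature summaries and the nested validation scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual pipeline.

The first function summarizes one frame-level signal (for example, an AU intensity track) over the whole session and within each epoch. It assumes timestamps and epoch intervals in seconds, and uses the standard deviation as a stand-in for "variability". The second part follows the Methods with scikit-learn: a linear-kernel SVR, an inner leave-one-out loop that picks C by Pearson r, and an outer leave-one-subject-out loop. Several pieces are hypothetical: the C grid, the z-scoring step, and the function names `epoch_features`, `select_c`, and `nested_loso`.

```python
import math
from statistics import mean, pstdev

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical candidate values for the SVR C parameter (not reported in the abstract).
C_GRID = [0.01, 0.1, 1.0, 10.0]


def epoch_features(times, values, epochs):
    """Summarize one frame-level signal globally and within each epoch.

    times:  frame timestamps in seconds
    values: signal value per frame, same length as times
    epochs: hypothetical mapping of epoch label -> (start, end) seconds, end exclusive
    Returns mean and population SD (assumed proxy for "variability") per context.
    """
    feats = {"global_mean": mean(values), "global_sd": pstdev(values)}
    for label, (start, end) in epochs.items():
        segment = [v for t, v in zip(times, values) if start <= t < end]
        if segment:
            feats[f"{label}_mean"] = mean(segment)
            feats[f"{label}_sd"] = pstdev(segment)
        else:
            # No frames fell inside this epoch (e.g., the question was skipped).
            feats[f"{label}_mean"] = math.nan
            feats[f"{label}_sd"] = math.nan
    return feats


def make_model(c):
    # Assumption: features are z-scored before the linear-kernel SVR.
    return make_pipeline(StandardScaler(), SVR(kernel="linear", C=c))


def pearson(a, b):
    """Pearson r between two vectors; 0.0 if either is constant (r undefined)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])


def select_c(X, y, grid=C_GRID):
    """Inner loop: leave-one-out over training interviews; keep the C whose
    held-out predictions have the highest Pearson r with the observed scores."""
    best_c, best_r = grid[0], -np.inf
    for c in grid:
        preds = np.empty(len(y))
        for train, val in LeaveOneOut().split(X):
            preds[val] = make_model(c).fit(X[train], y[train]).predict(X[val])
        r = pearson(preds, y)
        if r > best_r:
            best_c, best_r = c, r
    return best_c


def nested_loso(X, y, subjects):
    """Outer loop: leave-one-subject-out testing. Returns one out-of-fold
    prediction per interview; C is re-selected inside every outer fold."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    preds = np.empty(len(y))
    for train, test in LeaveOneGroupOut().split(X, y, groups=subjects):
        c = select_c(X[train], y[train])
        preds[test] = make_model(c).fit(X[train], y[train]).predict(X[test])
    return preds
```

Here each row of `X` would hold one interview's features for a single predictor (the global summary or one epoch), and `y` holds one clinical scale item. `subjects` gives each interview's patient ID, so a repeat participant's interviews always share a test fold. The inner loop uses plain leave-one-out, as the abstract states. Grouping the inner folds by subject as well would prevent a repeat participant's other interview from informing the choice of C.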
Results: Our approach captured and quantified signal relevant to clearly visible psychopathology. Parameters such as brow furrowing (AU4, R=0.744) and eye widening (AU5, R=−0.601) were correlated with depression measures on the Brief Psychiatric Rating Scale (BPRS). In many cases, these effects were specific to the question or experimental epoch. For instance, unusual thought content was most evident in two behaviors that occurred while participants were alone in the room: an increased frequency of brow flashes (AU2, R=0.752) and greater smile variability (R=0.656). Individuals with higher ratings of delusions also showed more brow flashes in response to a question about their self-confidence (R=0.739). Many relationships showed a "dose effect," with midrange scores corresponding to moderate psychopathology.

Conclusions: Our experiments show that automatically detected facial action units and speech properties can be used to predict and quantify a number of psychiatric symptoms. These symptoms span multiple domains of psychopathology, including both mood and psychosis. We demonstrate the importance of analyzing behaviors in the appropriate context, ie, while participants are alone or prompted with a specific question. Doing so is necessary to optimally extract clinically relevant information from objective indices of behavior. Thus, quantitative assessment of behavior in naturalistic settings is both feasible and informative as an adjunct to traditional methods of mental status assessment. [iproc 2016;2(1):e44]