Abstract
Diagnosing mental disorders is complex because of the many genetic, environmental and psychological contributors and the individual risk factors involved. Language markers for mental disorders can support the diagnosis of a person. Research on language markers and their associated mental disorders has so far relied mainly on the Linguistic Inquiry and Word Count (LIWC) program. To build on this research, we applied a range of Natural Language Processing (NLP) techniques, using LIWC, spaCy, fastText and RobBERT, to analyse Dutch psychiatric interview transcriptions with both rule-based and vector-based approaches. Our primary objective was to predict whether a patient had been diagnosed with a mental disorder and, if so, which type of mental disorder. Our second goal was to find out which words are language markers for which mental disorder. LIWC combined with the random forest classification algorithm performed best at predicting whether a person had a mental disorder (accuracy: 0.952; Cohen’s kappa: 0.889). The combination of spaCy and random forest performed best at predicting which particular mental disorder a patient had been diagnosed with (accuracy: 0.429; Cohen’s kappa: 0.304).
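The abstract reports accuracy together with Cohen’s kappa, which corrects the observed agreement between predicted and true labels for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below computes both metrics from scratch; the function names and the example labels ("disorder", "healthy") are hypothetical and do not come from the study's data. It is not necessarily how the authors computed their scores (scikit-learn's `cohen_kappa_score`, for instance, computes the same unweighted quantity).

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohens_kappa(y_true, y_pred):
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e).

    p_o is the observed agreement (accuracy); p_e is the chance agreement
    derived from how often each label occurs in y_true and in y_pred.
    """
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        # Both label lists use a single identical label: kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical example: four patients, one misclassified.
y_true = ["disorder", "disorder", "healthy", "healthy"]
y_pred = ["disorder", "healthy", "healthy", "healthy"]
print(accuracy(y_true, y_pred), cohens_kappa(y_true, y_pred))  # 0.75 0.5
```

Here 75% of the predictions are correct, but half of that agreement would be expected by chance from the label frequencies alone, so kappa drops to 0.5. This is why kappa is a useful companion to accuracy when classes are unevenly distributed.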
Highlights
Mental disorders make up a major portion of the global burden of disease [1], and in 2017, 10.7% of the global population reported having or having had a mental disorder [2]
Earlier research found that people with attention deficit hyperactivity disorder (ADHD) use more third-person plural (3pl) pronouns and fewer words of relativity [13], and produce more sentences but fewer clauses per sentence [17], than people without ADHD
We investigated the following four basic approaches in Natural Language Processing (NLP) for identification of language markers: lexical processing from a lexical semantics perspective, dependency parsing from a compositional semantics viewpoint, shallow neural networks in a stochastic paradigm and deep neural networks employing a transformer-based architecture
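Unlike lexical counting, the vector-based approaches represent words as numeric vectors and compare texts geometrically. As a simplified illustration only, the sketch below averages word vectors into a document vector and compares documents with cosine similarity. The 3-dimensional vectors, the Dutch example words and the helper names are all hypothetical; fastText and RobBERT learn much larger representations from data, and RobBERT in particular produces context-dependent vectors rather than a fixed lookup table.

```python
import math

# Hypothetical toy word vectors (not from any trained model).
TOY_VECTORS = {
    "moe": [0.9, 0.1, 0.0],          # "tired"
    "verdrietig": [0.8, 0.2, 0.1],   # "sad"
    "blij": [-0.7, 0.6, 0.1],        # "happy"
    "vrolijk": [-0.6, 0.7, 0.2],     # "cheerful"
}


def document_vector(tokens, vectors, dim=3):
    """Average the vectors of tokens found in `vectors`; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


doc_a = document_vector("ik ben moe en verdrietig".split(), TOY_VECTORS)
doc_b = document_vector("ik voel me verdrietig".split(), TOY_VECTORS)
doc_c = document_vector("ik ben blij en vrolijk".split(), TOY_VECTORS)
print(cosine(doc_a, doc_b) > cosine(doc_a, doc_c))  # True
```

The two "sad/tired" sentences end up close together even though they share only one content word, which is the kind of similarity a pure word-count approach cannot capture.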
Summary
Mental disorders make up a major portion of the global burden of disease [1], and in 2017, 10.7% of the global population reported having or having had a mental disorder [2]. The Linguistic Inquiry and Word Count (LIWC) toolkit has been the main tool for identifying language markers [6]. This Natural Language Processing (NLP) toolkit uses a dictionary to count how many words in a text belong to certain categories [7]. LIWC does not use subsymbolic (i.e., probabilistic and vector-based) NLP techniques such as word vector representations within neural networks.
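The dictionary-based counting that LIWC performs can be sketched in a few lines. This is a minimal sketch under stated assumptions: the two categories and their Dutch entries are hypothetical and not taken from the real LIWC dictionary, and the helper names are illustrative. Entries ending in "*" act as stems matching any word that starts with them, while other entries must match exactly; each category score is the percentage of all words in the text that fall into that category.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary in the spirit of LIWC (illustrative entries only).
CATEGORIES = {
    "pronoun_3pl": ["zij", "hun", "hen"],
    "negative_emotion": ["verdriet*", "bang", "boos"],
}


def matches(word, entry):
    """A trailing '*' matches any word with that prefix; otherwise require equality."""
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text, categories):
    """Return, per category, the percentage of words in `text` that match it."""
    words = re.findall(r"\w+", text.lower())
    if not words:
        return {cat: 0.0 for cat in categories}
    counts = Counter()
    for word in words:
        for cat, entries in categories.items():
            if any(matches(word, e) for e in entries):
                counts[cat] += 1
    return {cat: 100.0 * counts[cat] / len(words) for cat in categories}


print(category_percentages("Zij zijn bang en hun verdriet is groot", CATEGORIES))
# {'pronoun_3pl': 25.0, 'negative_emotion': 25.0}
```

Of the eight words, "zij" and "hun" count as third-person plural pronouns, while "zijn" does not, because "zij" has no wildcard. "bang" and "verdriet" (via the stem "verdriet*") count as negative emotion words. Scores like these can then serve as features for a classifier such as a random forest.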