Abstract

Diagnosing mental disorders is complex because of their genetic, environmental and psychological contributors and the individual risk factors involved. Language markers for mental disorders can support diagnosis. Research on language markers and their associated mental disorders has so far relied mainly on the Linguistic Inquiry and Word Count (LIWC) program. To improve on this research, we employed a range of Natural Language Processing (NLP) techniques using LIWC, spaCy, fastText and RobBERT to analyse Dutch psychiatric interview transcriptions with both rule-based and vector-based approaches. Our primary objective was to predict whether a patient had been diagnosed with a mental disorder and, if so, which specific type. Our second goal was to identify which words are language markers for which mental disorder. LIWC in combination with the random forest classification algorithm performed best in predicting whether a person had a mental disorder or not (accuracy: 0.952; Cohen’s kappa: 0.889). SpaCy in combination with random forest predicted best which particular mental disorder a patient had been diagnosed with (accuracy: 0.429; Cohen’s kappa: 0.304).
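The abstract reports Cohen’s kappa alongside accuracy. As a reminder of what that statistic measures, the following is a minimal sketch of the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the label frequencies. The function name and the toy labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equally long label sequences (illustrative helper)."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("label sequences must be non-empty and of equal length")

    # Observed agreement: the fraction of items with matching labels (accuracy).
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n

    # Chance agreement: per label, the product of its two marginal frequencies.
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    labels = set(true_freq) | set(pred_freq)
    p_e = sum(true_freq[c] * pred_freq[c] for c in labels) / (n * n)

    if p_e == 1.0:
        # Both sequences contain one identical label only; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (hypothetical): 1 = diagnosed, 0 = not diagnosed.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 1, 0, 1]
toy_kappa = cohens_kappa(toy_true, toy_pred)
```

Because kappa subtracts the agreement expected by chance, it is usually lower than accuracy when one class dominates, which is one reason to report both figures.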

Highlights

  • Mental disorders make up a major portion of the global burden of disease [1], and in 2017, 10.7% of the global population reported having or having had a mental disorder [2]

  • We found that people with attention deficit hyperactivity disorder (ADHD) use more third-person plural (3pl) pronouns, fewer words of relativity [13] and more sentences, but fewer clauses per sentence [17] than normal

  • We investigated the following four basic approaches in Natural Language Processing (NLP) for identification of language markers: lexical processing from a lexical semantics perspective, dependency parsing from a compositional semantics viewpoint, shallow neural networks in a stochastic paradigm and deep neural networks employing a transformer-based architecture

Summary

Introduction

Mental disorders make up a major portion of the global burden of disease [1], and in 2017, 10.7% of the global population reported having or having had a mental disorder [2]. The Linguistic Inquiry and Word Count (LIWC) toolkit has been the main focus for identifying language markers [6]. This toolkit of Natural Language Processing (NLP) techniques uses a dictionary to count how many words in a text belong to certain categories [7]. LIWC does not use subsymbolic (i.e., probabilistic and vector-based) NLP techniques such as word vector representations within neural networks.
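The dictionary-based counting described above can be sketched as follows. This is a minimal illustration of the general idea only: the category names, the toy English word lists and the function name are hypothetical, and the actual LIWC dictionaries are far larger and are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical toy dictionary: category -> words belonging to that category.
TOY_DICTIONARY = {
    "pronoun_3pl": {"they", "them", "their"},
    "relativity": {"before", "near", "during", "up"},
}


def count_categories(text, dictionary=TOY_DICTIONARY):
    """Count how many tokens of `text` fall into each dictionary category."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter({category: 0 for category in dictionary})
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    return dict(counts), len(tokens)


example_counts, example_total = count_categories(
    "They went near their house before them"
)
```

The per-category counts, optionally divided by the total number of tokens, can then serve as features for a classifier such as a random forest.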
