Mathematical modeling of language is central to many research areas and technological applications in Artificial Intelligence. Over the last decade, research on text representation has focused on dense vectors, popularly known as word embeddings. In this paper, we propose a cognitive-emotional scoring and representation framework for text based on word embeddings. The framework aims to mathematically model the emotional content of words in short free-form text messages written by adults in follow-up for any mental health condition at the outpatient facilities of the Psychiatry Department of Hospital Fundación Jiménez Díaz in Madrid, Spain. Our contribution is a geometrical-topological framework for Sentiment Analysis that comprises a hybrid method, which combines a cognitively-based lexicon with word embeddings to generate graded sentiment scores for words, and a new topological method for clustering dense vector representations in high-dimensional spaces, where points are very sparsely distributed. The framework is useful for detecting word association topics, emotional scoring patterns, and the geometrical behavior of embedded vectors, which may help in understanding language use in this kind of text. The proposed scoring system and representation framework might be helpful in studying relations between language and behavior, and their use might have predictive potential for suicide prevention.