MSAPersonality: a modern standard Arabic dataset for personality recognition

Khaoula Chraibi,Younes Dhassi,Ilham Chaker,Azeddine Zahi

doi:10.11591/ijece.v14i4.pp4498-4507

Abstract

Automatic personality recognition is a task that attempts to automatically infer personality traits from a variety of data sources, including Text. Our words, whether spoken or written, reveal a lot about who we are. As people speak different languages, each with its own set of characteristics and level of complexity, identifying their personalities automatically might be language-dependent. This task requires an annotated text corpus with personality traits. However, the lack of corpora for languages other than English makes the task extremely challenging. We concentrated our efforts in this paper on the Arabic language in particular because it is understudied and lacks a corpus, despite being one of the most widely spoken languages in the world. Our primary goal was constructing our “MSAPersonality” dataset, which consists of 267 texts in modern standard Arabic that have been annotated with the Big Five personality traits. To evaluate the dataset and its potential for classification and regression, we used text preprocessing techniques, feature extraction, and machine learning algorithms. We obtained promising experimental results. Therefore, further research into predicting personality from Arabic text can be conducted.

Full Text