A study of deep learning methods for same-genre and cross-genre author profiling

Muhammad Adnan Ashraf,Feiping Nie,Rao Muhammad Adeel Nawab,Fernando Perez,David Pinto,Vivek Singh

doi:10.3233/jifs-179896

Abstract

The aim of the author profiling task is to automatically predict various traits of an author (e.g. age, gender, etc.) from written text. The problem of author profiling has been mainly treated as a supervised text classification task. Initially, traditional machine learning algorithms were used by the researchers to address the problem of author profiling. However, in recent years, deep learning has emerged as a state-of-the-art method for a range of classification problems related to image, audio, video, and text. No previous study has carried out a detailed comparison of deep learning methods to identify which method(s) are most suitable for same-genre and cross-genre author profiling. To fulfill this gap, the main aim of this study is to carry out an in-depth and detailed comparison of state-of-the-art deep learning methods, i.e. CNN, Bi-LSTM, GRU, and CRNN along with proposed ensemble methods, on four PAN Author Profiling corpora. PAN 2015 corpus, PAN 2017 corpus and PAN 2018 Author Profiling corpus were used for same-genre author profiling whereas PAN 2016 Author Profiling corpus was used for cross-genre author profiling. Our extensive experimentation showed that for same-genre author profiling, our proposed ensemble methods produced best results for gender identification task whereas CNN model performed best for age identification task. For cross-genre author profiling, the GRU model outperformed all other approaches for both age and gender.

Full Text