Abstract

Cross-genre author profiling aims to build generalized models for predicting profile traits of authors that can be helpful across different text genres for computer forensics, marketing, and other applications. The cross-genre author profiling task becomes challenging when dealing with low-resourced languages due to the lack of availability of standard corpora and methods. The task becomes even more challenging when the data is code-switched, which is informal and unstructured. In previous studies, the problem of cross-genre author profiling has been mainly explored for mono-lingual texts in highly resourced languages (English, Spanish, etc.). However, it has not been thoroughly explored for the code-switched text which is widely used for communication over social media. To fulfill this gap, we propose a transfer learning-based solution for the cross-genre author profiling task on code-switched (English–RomanUrdu) text using three widely known genres, Facebook comments/posts, Tweets, and SMS messages. In this article, firstly, we experimented with the traditional machine learning, deep learning and pre-trained transfer learning models (MBERT, XLMRoBERTa, ULMFiT, and XLNET) for the same-genre and cross-genre gender identification task. We then propose a novel Trans-Switch approach that focuses on the code-switching nature of the text and trains on specialized language models. In addition, we developed three RomanUrdu to English translated corpora to study the impact of translation on author profiling tasks. The results show that the proposed Trans-Switch model outperforms the baseline deep learning and pre-trained transfer learning models for cross-genre author profiling task on code-switched text. Further, the experimentation also shows that the translation of RomanUrdu text does not improve results.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.