Abstract

Modern community Question Answering (cQA) platforms encourage their members to publish any sort of question, which later on can get numerous answers from other community peers. However, in this dynamic, there is an intrinsic delay from the moment questions are posted until the arrival of acceptable and/or diverse responses. Therefore, cQA platforms have the pressing need for promoting unresolved questions to potential answerers, while also reducing gender disparity across their topics, for example. Needless to say, demographic analysis occupies a crucial role in successfully responding to these challenges.Nonetheless, there are only a handful of studies dissecting automatic gender recognition across cQA fellows. As far as we know, this work is the first effort to tease out the contribution to this task of the different kinds of textual inputs contained in their profiles (i.e., question titles and bodies, answers and self-descriptions). With this goal, we compare three different types of machine learning approaches under several combinations of these four input signals: traditional neural networks (e.g., RCNN and CNN), fine-tuned pre-trained transformers (e.g., BERT and RoBERTa) and statistical methods enriched with hand-crafted linguistic features (e.g., Bayes and MaxEnt).In a nutshell, our results show that pre-trained transformers are superior when dealing with full questions, conventional neural networks when mixing diverse text signals, and statistical methods when the dataset encompasses mostly noisy user-generated content, namely answers. In addition, our in-depth analysis reveals that dependency parsing is instrumental in designing hand-crafted features capable of modelling topic information, and that both genders are conspicuously represented by some specific topic distributions.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.