Abstract

Author profiling is the computational task of inferring an author's demographics (e.g., gender and age) from text samples written by them. As in other text classification tasks, optimal results are usually obtained by using training data taken from the same text genre as the target application, in so‐called in‐domain settings. When training data in the required text genre is unavailable, a possible alternative is to perform cross‐domain author profiling, that is, building a model from a source domain (e.g., Facebook posts) and then using it to classify text in a different target domain (e.g., e‐mails). Methods of this kind may, however, suffer from cross‐domain vocabulary discrepancies and other difficulties. As a means to ameliorate these, the present work discusses a particular strategy for cross‐domain author profiling in which multiple source domains are combined in a stack ensemble architecture of pre‐trained language models. Results from this approach are shown to compare favourably against standard single‐source cross‐domain author profiling, and are found to reduce the overall accuracy loss relative to optimal in‐domain gender and age classification.
