Gender classification of microblog text based on authorial style

Shubhadeep Mukherjee, Pradip Kumar Bala

doi:10.1007/s10257-016-0312-0

Abstract

Gender profiling of unstructured text data has applications in marketing, advertising, legal investigation, and recommender systems. Automatically detecting gender in microblogs such as Twitter is a difficult task: it requires a system that can use knowledge to interpret the linguistic styles used by each gender. In this paper, we try to provide this knowledge by considering different sets of features that are relatively independent of the text's content, such as function words and part-of-speech n-grams. We test a range of feature sets using two classifiers, namely Naive Bayes and maximum entropy. Our results show that the gender detection task benefits from the inclusion of features that capture the authorial style of the microblog authors. We achieve an accuracy of approximately 71%, which outperforms the classification accuracy of commercially available gender detection software such as Gender Genie and Gender Guesser.
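To make the idea of style features concrete, the following is a minimal sketch of function-word counts and part-of-speech n-grams fed to a multinomial Naive Bayes classifier. It is not the authors' implementation: the function-word list, the training texts, and the labels `group_1` and `group_2` are invented placeholders, the part-of-speech tags are assumed to come from some external tagger, and the maximum entropy classifier is not shown.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; the abstract does not say which list the paper uses).
FUNCTION_WORDS = ["the", "a", "of", "and", "i", "my", "you", "to", "in", "so"]


def function_word_features(text):
    """Return one count per function word, in FUNCTION_WORDS order."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return [counts[w] for w in FUNCTION_WORDS]


def pos_ngrams(tags, n=2):
    """Build n-grams from a tag sequence produced by an external tagger."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.log_likelihood = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            # Per-feature totals for this class, smoothed by one.
            totals = [sum(col) + 1 for col in zip(*rows)]
            denom = sum(totals)
            self.log_likelihood[c] = [math.log(t / denom) for t in totals]
        return self

    def predict(self, x):
        def score(c):
            weights = self.log_likelihood[c]
            return self.log_prior[c] + sum(v * w for v, w in zip(x, weights))
        return max(self.classes, key=score)


# Toy training data with placeholder labels (invented, not from the paper).
train = [
    ("i love my new phone so much", "group_1"),
    ("so happy to see you and my friends", "group_1"),
    ("the score of the game in the end", "group_2"),
    ("a review of the budget in a nutshell", "group_2"),
]
X = [function_word_features(text) for text, _ in train]
y = [label for _, label in train]
model = MultinomialNB().fit(X, y)
prediction = model.predict(function_word_features("the end of the match"))
```

Because the features count only function words, the classifier sees how something is written rather than what it is about, which is the sense in which such features are relatively independent of the text. In practice, part-of-speech n-gram counts could be appended to the same feature vector.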

Journal: Information Systems and e-Business Management
Publication Date: Mar 2, 2016
Citations: 28

Similar Papers

Sarcasm detection in microblogs using Naïve Bayes and fuzzy clustering
Shubhadeep Mukherjee ... Pradip Kumar Bala
04 Nov 2016
Technology in Society | VOL. 48

Outlier detection using flexible categorization and interrogative agendas
Marcel Boersma ... Nachoem Wijnberg
19 Feb 2024
Decision Support Systems | VOL. 180

Learning to Recommend Related Entities With Serendipity for Web Search Users
Jizhou Huang ... Haifeng Wang
23 Apr 2018
ACM Transactions on Asian and Low-Resource Language Information Processing | VOL. 17

An algebraic approach to feature interactions
R.R Karinthi ... D Nau
01 Apr 1992
IEEE Transactions on Pattern Analysis and Machine Intelligence | VOL. 14
