Abstract

Gender profiling of unstructured text data has several applications in areas such as marketing, advertising, legal investigation, and recommender systems. The automatic detection of gender in microblogs, like twitter, is a difficult task. It requires a system that can use knowledge to interpret the linguistic styles being used by the genders. In this paper, we try to provide this knowledge for such a system by considering different sets of features, which are relatively independent of the text, such as function words and part of speech n-grams. We test a range of different feature sets using two different classifiers; namely Naive Bayes and maximum entropy algorithms. Our results show that the gender detection task benefits from the inclusion of features that capture the authorial style of the microblog authors. We achieve an accuracy of approximately 71 %, which outperforms the classification accuracy of commercially available gender detection software like Gender Genie and Gender Guesser.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.