Abstract
Statistical analysis shows that short text messages posted by users of different genders differ in the words and semantics they use. In this paper, we first build a gender-specific thesaurus and then construct two new features from it. A new classification model is built by combining traditional statistical features with the improved text implicitness feature. Experimental evaluation on the Sina Weibo dataset demonstrated the effectiveness of the gender-specific thesaurus-based features, and the improved text implicitness feature raised the accuracy of gender classification to 84.7%.
Highlights
With the popularization and rapid development of the Internet, social networks are favored and sought after by many Internet users due to their unique virtuality, diversity, innovation, freedom and alienation
Traditional features: In addition to the features based on the construction of the gender-specific thesaurus and the improved semantic and text implicitness features proposed in this paper, we need to incorporate some traditional statistical features to construct the feature vectors.
Feature validity verification: Compared with traditional statistics-based gender identification methods, this paper introduces an improved text implicitness feature and two features based on the construction of a gender-specific thesaurus.
Summary
With the popularization and rapid development of the Internet, social networks are favored and sought after by many Internet users due to their unique virtuality, diversity, innovation, freedom and alienation. In research on gender classification for Chinese text, Liu and Niu (2016) proposed a gender identification method based on feature extraction from emotional words and emotion-related language style. Because Chinese dictionary resources for this task are still lacking, this article focuses on constructing a gender-specific thesaurus and on classifying users with machine learning, using the built dictionary and related features extracted from Weibo.
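To make the idea of combining thesaurus-based features with traditional statistical features concrete, here is a minimal sketch in Python. It is not the authors' implementation. The word lists, the function names, the choice of statistics, and the whitespace tokenization are all assumptions made for illustration; the summary does not define the paper's two thesaurus-based features or its implicitness feature, and real Weibo text would need a Chinese word segmenter.

```python
# Hypothetical toy word lists. The paper's thesaurus is built from Weibo
# data and is not reproduced here.
FEMALE_WORDS = {"lovely", "shopping", "cute"}
MALE_WORDS = {"football", "game", "car"}


def thesaurus_features(tokens):
    """Share of tokens found in each gender-specific word list (assumed feature)."""
    total = len(tokens) or 1  # avoid division by zero on empty input
    female_hits = sum(1 for t in tokens if t in FEMALE_WORDS)
    male_hits = sum(1 for t in tokens if t in MALE_WORDS)
    return [female_hits / total, male_hits / total]


def statistical_features(text, tokens):
    """Simple surface statistics, chosen for illustration only."""
    n_tokens = len(tokens)
    avg_len = sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0
    return [float(n_tokens), avg_len, float(text.count("!"))]


def build_feature_vector(text):
    """Concatenate thesaurus-based and statistical features into one vector."""
    # Whitespace tokenization is a simplification for this English toy example.
    tokens = text.lower().replace("!", " ").split()
    return thesaurus_features(tokens) + statistical_features(text, tokens)


vector = build_feature_vector("Cute shoes! Shopping again")
print(vector)
```

A vector built this way could be passed to any standard classifier. The summary does not say which learning algorithm the paper uses, so none is assumed here.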