Author profiling is a challenging task that consists of identifying a person's relevant attributes from the content he or she generates. In this article, we validate a classification method based on reconstructive classification to identify two demographic attributes, age and gender, of social network users from the text content they publish. We consider both balanced and unbalanced data; in the unbalanced case, one gender or age group has a larger presence than the others. The proposed method uses the reconstructive property of singular value decomposition (SVD), and its suitability for sparse data, to find a matrix of the main latent components that represents the information of a specific gender or age class. We then use such a matrix to project and reconstruct new users' information in order to identify their demographic variables. We test our method on a set of datasets in several languages collected from Twitter and Pinterest users, and we use different evaluation metrics to compare its performance with that of several popular classifiers based on traditional machine learning and on deep learning, as well as with relevant works in the literature. The results show that the proposed classifier generally performs well in identifying the age and gender of social network users.
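To make the reconstructive idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each user is a dense feature vector (e.g., term weights), that each class gets its own rank-k subspace spanned by the top-k left singular vectors of that class's training matrix, and that a new user is assigned to the class whose subspace reconstructs the vector with the smallest Euclidean error. The abstract only states that new users' information is projected and reconstructed, so this minimum-error decision rule is an assumption. The helper names (`top_left_singular_vectors`, `fit`, `reconstruction_error`, `predict`), the rank `k`, and the use of plain subspace iteration in place of a library SVD routine are all hypothetical choices made to keep the example dependency-free.

```python
import math
import random


def _dot(u, v):
    """Inner product of two equal-length dense vectors."""
    return sum(a * b for a, b in zip(u, v))


def _orthonormalize(vectors, tol=1e-10):
    """Modified Gram-Schmidt; drops vectors that are numerically dependent."""
    basis = []
    for v in vectors:
        w = list(v)
        norm0 = math.sqrt(_dot(w, w))
        for q in basis:
            c = _dot(q, w)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol * max(norm0, 1.0):
            basis.append([wi / norm for wi in w])
    return basis


def top_left_singular_vectors(columns, k, iters=100, seed=0):
    """Approximate the top-k left singular vectors of A (whose columns are
    the given training vectors) by subspace iteration on A A^T.
    Stands in for a library truncated SVD (assumption)."""
    d = len(columns[0])
    rng = random.Random(seed)
    basis = _orthonormalize(
        [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)])
    for _ in range(iters):
        product = []
        for q in basis:
            coeffs = [_dot(a, q) for a in columns]  # A^T q
            product.append([sum(c * a[i] for c, a in zip(coeffs, columns))
                            for i in range(d)])     # A (A^T q)
        basis = _orthonormalize(product)
    return basis


def fit(class_to_vectors, k):
    """Hypothetical training step: one rank-k basis per class label."""
    return {label: top_left_singular_vectors(vecs, k)
            for label, vecs in class_to_vectors.items()}


def reconstruction_error(basis, x):
    """Project x onto span(basis), reconstruct it, return ||x - U U^T x||."""
    recon = [0.0] * len(x)
    for q in basis:
        c = _dot(q, x)
        for i in range(len(x)):
            recon[i] += c * q[i]
    return math.sqrt(sum((xi - ri) ** 2 for xi, ri in zip(x, recon)))


def predict(model, x):
    """Assumed decision rule: the class with the smallest reconstruction error."""
    return min(model, key=lambda label: reconstruction_error(model[label], x))
```

For example, with made-up four-dimensional vectors where class "A" lives in the first two coordinates and class "B" in the last two, `fit(..., k=2)` learns one plane per class, and a new vector such as `[0.9, 0.7, 0.05, 0.0]` is reconstructed far better by the "A" basis. Dense Python lists are used only for readability; for sparse term-document matrices, a sparse truncated SVD routine such as `scipy.sparse.linalg.svds` would be the more natural choice.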