Social networks are a rich source of information that reflects people’s real lives in the digital space. This makes it possible to infer various aspects of a user’s socioeconomic behavior, even if the user does not state them explicitly. In this study, we consider the Russian online social network VK.com, which is analogous to the global Facebook platform, together with a supplementary source of financial information provided by a bank. Combining online social media data with debit card transactions, we train machine learning models to infer the socioeconomic status (SES) of a user, as well as six purchasing patterns that characterize particular types of customer transactional activity. Namely, we detect whether a user is a driver, parent, gamer, or traveler, and whether he/she prefers to make purchases at night or in the morning. SES is defined as average monthly expenses and treated as a real-valued variable. The following features are extracted as predictors: demographic information from a user’s page, user participation in communities, topics of those communities, text embeddings of user posts, topological characteristics, and graph embeddings of nodes in the friendship graph. The results show the superiority of graph embeddings in both classification and regression tasks (median absolute percentage error MedAPE = 29.7 for SES). Moreover, for drivers (Macro-$$F_1=0.688$$) and parents (Macro-$$F_1=0.679$$), higher scores are reached by concatenating different features. In addition, we investigate feature importance values and find that the topics of a user’s communities and the structure of the user’s network influence the model more strongly than other features. The study demonstrates the power of online social media data for inferring users’ socioeconomic attributes.
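For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how MedAPE and Macro-F1 are commonly computed. It assumes the standard textbook definitions, which the abstract does not spell out; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
from statistics import median


def medape(y_true, y_pred):
    """Median absolute percentage error, in percent (assumes no zero targets)."""
    errors = [100.0 * abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)]
    return median(errors)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); treated as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: monthly expenses (regression) and a binary pattern label.
ses_error = medape([100, 200, 400], [110, 150, 400])
pattern_f1 = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Because the median is used rather than the mean, a few users with very large relative errors do not dominate the SES score, and because Macro-F1 averages over classes without weighting, a rare class such as "driver" counts as much as the majority class.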