Abstract

In a digital ecosystem where large amounts of data about user actions are generated every day, important concerns have emerged about the collection, management, and analysis of these data and, accordingly, about user privacy. In recent years, users have become accustomed to organizing in and relying on digital communities to support and achieve their goals. In this context, the present study aims to identify the main privacy concerns in user communities on social media and how these concerns affect users' online behavior. To better understand online communities in social networks, privacy concerns, and their connection to user behavior, we developed an original methodology that combines elements of machine learning as a technical contribution. First, ForceAtlas2, a complex network visualization algorithm, was applied through the open-source software Gephi to visually identify the nodes that form the main communities in the sample of user-generated content (UGC) collected from Twitter. Then, sentiment analysis was applied with TextBlob, a tool that works with machine learning, on which experiments were run with a support vector classifier (SVC), multinomial naïve Bayes (MNB), logistic regression (LR), and a random forest classifier (RFC) under the theoretical frameworks of computer-aided text analysis (CATA) and natural language processing (NLP). As a result, a total of 11 user communities were identified: the positive communities of protection software, cybersecurity, and eCommerce; the negative communities of privacy settings, personal information, and social engineering; and the neutral communities of privacy concerns, hacking, false information, impersonation, and cookies data. The paper concludes with a discussion of the results and their relation to user behavior in digital environments, and it outlines valuable and practical insights into techniques and challenges related to users' personal data.
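To make the positive, negative, and neutral grouping of communities concrete, the following is a minimal sketch of how a per-community sentiment label could be derived from post-level polarity scores. It is not the study's pipeline: the toy lexicon, the 0.1 threshold, the function names, and the sample posts are all hypothetical and chosen for illustration only. The polarity range of -1 to 1 mirrors the range reported by TextBlob's default analyzer, but the scoring here is a simplified stand-in that uses only the Python standard library.

```python
from statistics import mean

# Hypothetical toy lexicon (illustration only, not TextBlob's lexicon).
TOY_LEXICON = {
    "secure": 0.6, "protect": 0.5, "safe": 0.5, "trust": 0.4,
    "leak": -0.6, "scam": -0.8, "stolen": -0.7, "worried": -0.5,
}


def polarity(text):
    """Return the mean lexicon score of matched words, or 0.0 if none match."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    scores = [TOY_LEXICON[w] for w in words if w in TOY_LEXICON]
    return mean(scores) if scores else 0.0


def label(score, threshold=0.1):
    """Map a polarity score to a sentiment label (threshold is an assumption)."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


def label_communities(posts_by_community, threshold=0.1):
    """Label each community by the mean polarity of its posts."""
    labels = {}
    for name, posts in posts_by_community.items():
        scores = [polarity(p) for p in posts]
        labels[name] = label(mean(scores) if scores else 0.0, threshold)
    return labels


# Hypothetical posts; the community names are taken from the abstract.
sample = {
    "protection software": ["This tool keeps my account secure and safe."],
    "privacy settings": ["Worried my data was stolen after a leak."],
    "cookies data": ["The site uses cookies."],
}

if __name__ == "__main__":
    for community, sentiment in label_communities(sample).items():
        print(community, "->", sentiment)
```

In a setup closer to the one described, the toy `polarity` function would be replaced by the sentiment tool's own score, and the communities would come from the network clustering step rather than from a hand-written dictionary.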