Social network data, such as data from Twitter/X, are a form of Big Social Data. Big social data describe people's social behaviors and interactions and have high business value for organizational decision-making. However, because social network users are often anonymous, their credibility is uncertain. Credibility reflects the accuracy and value of big social data. Despite extensive research on the credibility of big social data, most methods have paid insufficient attention to important dimensions of credibility assessment, including topic-based user expertise, the selection of social network features, and the labeling of users. Furthermore, these methods cannot cope with the temporal dimension, high volume, and velocity of big social data. To address these issues, this paper presents a novel model for assessing the credibility of Twitter/X users by integrating Twitter/X with Google Scholar. The model automatically assigns users' credibility labels using Google Scholar, and machine learning feature selection methods select the topic-specific features that affect the credibility of Twitter/X users. This study uses Google Scholar and the BERTopic algorithm for effective topic modeling on Twitter/X. The model also addresses the management of irrelevant data, the dynamic nature of user credibility, and the organization of activities according to the Big Data lifecycle. Finally, the model predicts the credibility of Twitter/X users using the Linear Regression, Support Vector Regression, K-Nearest Neighbor, Random Forest, and Classification and Regression Trees algorithms, and the results show that, with Classification and Regression Trees, it outperforms similar models. In addition, because it integrates heterogeneous resources and feature selection methods, the model is generalizable to all organizational purposes.
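The final step of the abstract (feature selection followed by a comparison of five regressors) can be illustrated with a minimal scikit-learn sketch. This is not the authors' implementation. It assumes synthetic data from `make_regression` in place of topic-specific Twitter/X user features, a continuous target in place of the Google Scholar-derived credibility label, univariate `SelectKBest(f_regression)` as a stand-in feature selection method, and mean absolute error under 5-fold cross-validation as the comparison metric. The helper name `compare_regressors` is hypothetical. scikit-learn's `DecisionTreeRegressor` uses an optimized version of CART.

```python
# Minimal sketch, NOT the paper's pipeline. Assumptions: synthetic features stand in
# for topic-specific Twitter/X user features, the continuous target stands in for a
# Google Scholar-derived credibility score, and SelectKBest is a stand-in selector.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor  # optimized CART in scikit-learn


def compare_regressors(n_features_to_keep=5, random_state=0):
    """Hypothetical helper: return the cross-validated MAE of each regressor."""
    # Synthetic stand-in data: 20 candidate features, 5 of them informative.
    X, y = make_regression(
        n_samples=300, n_features=20, n_informative=5,
        noise=10.0, random_state=random_state,
    )
    models = {
        "LR": LinearRegression(),
        "SVR": SVR(),
        "KNN": KNeighborsRegressor(n_neighbors=5),
        "RF": RandomForestRegressor(n_estimators=100, random_state=random_state),
        "CART": DecisionTreeRegressor(random_state=random_state),
    }
    results = {}
    for name, model in models.items():
        # Scaling and feature selection sit inside the pipeline so that they are
        # refit on each training fold, which avoids leaking test-fold information.
        pipe = make_pipeline(
            StandardScaler(),
            SelectKBest(f_regression, k=n_features_to_keep),
            model,
        )
        scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_absolute_error")
        results[name] = -scores.mean()  # convert negated MAE back to a positive MAE
    return results


if __name__ == "__main__":
    # Print models from lowest to highest mean absolute error.
    for name, mae in sorted(compare_regressors().items(), key=lambda kv: kv[1]):
        print(f"{name}: MAE = {mae:.2f}")
```

Which model ranks first on this synthetic data says nothing about the paper's reported results. The sketch only shows the shape of a feature-selection-plus-model-comparison loop.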