Identifying widely disseminated papers (WDPs) on social media can help to understand how scientific papers spread from academia to social media and can assist in the formulation of public and science policy. This study applies machine learning methods to explore whether WDPs can be identified and to investigate the influence mechanisms of literature-related and social media-related features. A pre-task was first conducted to investigate whether the visibility of scientific papers on social media can be predicted, and the role of various features was analyzed. We then defined two predictive tasks for identifying WDPs before and after they become visible on social media. The performance of eight state-of-the-art algorithms was compared in three experiments on a dataset from the field of oncology, and the contribution of literature-related and social media-related features to each task was explained using SHapley Additive exPlanations (SHAP) values. The results show that XGBoost outperforms the other algorithms, reaching an F1 score of 0.988 and an AUC of 0.998 in the trend prediction task. Nearly all of the literature-related features strongly affect the identification of long-term disseminated papers, while most social media-related features play more significant roles in identifying broadly mentioned papers. Moreover, journal features contribute more to identifying papers that are visible on social media, while paper features, especially research topics, have a greater influence on identifying WDPs. The number and proportion of academic-related Twitter users have a strong impact on the scale and duration of papers' dissemination. The number and duration of first-generation tweets play critical roles in identifying broadly mentioned and long-term disseminated papers, respectively.
This study provides insights into the factors that influence the dissemination of papers from the scientific community to and across social media, and helps to explain how knowledge propagation differs between academia and the public.