Abstract

Background: In this age of social media, any news, good or bad, has the potential to spread in unpredictable ways. Changes in public sentiment can either drive or limit investment in publicly funded activities, such as scientific research. As a result, understanding how reported cases of scientific misconduct shape public sentiment is becoming increasingly essential for researchers and institutions, as well as for policy makers and funders. In this study, we set out to assess and define the patterns according to which public sentiment changes in response to reported cases of scientific misconduct, focusing on the public response to a major case of scientific misconduct that occurred in Japan in 2014: the stimulus-triggered acquisition of pluripotency (STAP) cell case.

Objectives: The aims of this study were to determine (1) the patterns according to which public sentiment changes in response to scientific misconduct; (2) whether such measures vary significantly, coincident with major timeline events; and (3) whether the changes observed mirror the response patterns reported in the literature for other classes of events, such as entertainment news and disaster reports.

Methods: The recent STAP cell scandal was used as a test case. Changes in the volume and polarity of discussion were assessed using a sample of case-related Twitter data published between January 28, 2014 and March 15, 2015. RapidMiner was used for text processing, and the popular bag-of-words algorithm SentiWordNet was used within RapidMiner to calculate a sentiment score for each sampled tweet. Relative volume and sentiment were then assessed overall, month-to-month, and with respect to individual entities.

Results: Despite the ostensibly negative subject, average sentiment over the observed period tended to be neutral (−0.04); however, a notable downward trend (y = −0.01x + 0.09; R² = .45) was observed month-to-month. Notably polarized tweets accounted for less than one-third of the sampled discussion: 17.49% (1656/9467) negative and 12.59% (1192/9467) positive. Significant polarization was found in only 4 of the 15 months covered, with significant variation month-to-month (P<.001). Significant increases in polarization tended to coincide with increased discussion volume surrounding major events (P<.001).

Conclusions: These results suggest that public opinion toward scientific research may be subject to the same sensationalist dynamics that drive public opinion on other, consumer-oriented topics. The patterns in public response observed here for the STAP cell case were consistent with those reported in the literature for other classes of newsworthy events on Twitter. Discussion became strongly polarized only during times of increased public attention, and such increases tended to be driven primarily by negative reporting and reactionary commentary.
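The Methods above describe scoring each tweet with SentiWordNet inside RapidMiner. As a rough Python approximation of the same bag-of-words idea, the sketch below averages the positive-minus-negative scores of each content word's first SentiWordNet synset using NLTK; the tokenizer, the POS mapping, and the first-synset heuristic are assumptions for illustration, not the paper's exact RapidMiner configuration.

```python
import nltk
from nltk.corpus import sentiwordnet as swn, wordnet as wn
from nltk.tokenize import word_tokenize

for pkg in ("punkt", "wordnet", "sentiwordnet", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

def penn_to_wn(tag):
    # Map Penn Treebank POS tags to WordNet POS constants.
    if tag.startswith("J"):
        return wn.ADJ
    if tag.startswith("N"):
        return wn.NOUN
    if tag.startswith("R"):
        return wn.ADV
    if tag.startswith("V"):
        return wn.VERB
    return None

def tweet_sentiment(text):
    # Bag-of-words scoring: for each content word, take the first
    # SentiWordNet synset and accumulate (positive - negative); the
    # tweet's score is the average over scored words, in [-1, 1].
    score, scored = 0.0, 0
    for word, tag in nltk.pos_tag(word_tokenize(text.lower())):
        wn_tag = penn_to_wn(tag)
        if wn_tag is None:
            continue
        synsets = list(swn.senti_synsets(word, wn_tag))
        if not synsets:
            continue
        score += synsets[0].pos_score() - synsets[0].neg_score()
        scored += 1
    return score / scored if scored else 0.0

print(tweet_sentiment("The STAP cell retraction is a sad day for science"))
```

Averaging per-word polarities in this way naturally pulls scores toward zero, which is consistent with the near-neutral mean sentiment (−0.04) reported above.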

Highlights

  • Social media are increasingly being used to make inferences about the real world, with applications to politics (O'Connor et al. 2010), health (Dredze 2012), and marketing (Gopinath, Thomas, and Krishnamurthi 2014).

  • Can the demographics of a set of Twitter users be inferred from network information alone? Across six demographic variables, we find an average held-out correlation of .77 between a website's web traffic demographics and those predicted by a regression model based on the site's Twitter followers.

  • Overall, the correlation is strong: .77 on average, ranging from .55 for the 35-44 age bracket to .89 for Male and African American. All of these correlation coefficients are significant under a two-tailed t-test (p < 0.01) with a Bonferroni adjustment for the 19 comparisons. These results indicate that the neighbor vector provides a reliable signal of the demographics of a group of Twitter users (see the sketch after this list).
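The evaluation summarized in the highlights pairs held-out regression predictions with a correlation test. The sketch below shows one plausible way to reproduce that pipeline with scikit-learn and SciPy; the synthetic data, the Ridge regressor, and all variable names are assumptions for illustration, since the highlights do not specify the exact model or features beyond the follower-based neighbor vector.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

# Hypothetical data: each row is a website; each column counts how many
# of the site's Twitter followers also follow one of k popular "neighbor"
# accounts (the neighbor vector). The target y stands in for one
# Quantcast demographic, e.g., the fraction of visitors who are male.
rng = np.random.default_rng(0)
n_sites, k_neighbors = 1000, 500
X = rng.poisson(2.0, size=(n_sites, k_neighbors)).astype(float)
true_w = rng.normal(size=k_neighbors) / k_neighbors
y = X @ true_w + rng.normal(scale=0.1, size=n_sites)

# Held-out predictions via cross-validation, then the Pearson correlation
# between predicted and true demographic values (pearsonr's p-value comes
# from a two-tailed t-test, matching the highlights).
y_hat = cross_val_predict(Ridge(alpha=1.0), X, y, cv=5)
r, p = stats.pearsonr(y, y_hat)

# Bonferroni adjustment for the 19 demographic comparisons reported.
p_adjusted = min(p * 19, 1.0)
print(f"held-out r = {r:.2f}, Bonferroni-adjusted p = {p_adjusted:.3g}")
```

Computing the correlation on held-out predictions, rather than on training fits, is what makes the reported .77 a claim about generalization rather than memorization.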



Introduction

Social media are increasingly being used to make inferences about the real world, with applications to politics (O'Connor et al. 2010), health (Dredze 2012), and marketing (Gopinath, Thomas, and Krishnamurthi 2014). A common approach to demographic inference is supervised classification: from a training set of annotated users, a model is fit to predict user attributes from the content of their writings (Argamon et al. 2005; Schler et al. 2006; Rao et al. 2010; Pennacchiotti and Popescu 2011; Burger et al. 2011; Rao et al. 2011; Al Zamal, Liu, and Ruths 2012); a generic sketch of this paradigm appears below. This approach has a number of limitations: collecting human annotations is costly and error-prone, and many demographic variables of interest cannot be labeled reliably by inspecting a user's profile. In the remainder of the paper, we first review related work; describe the data collected from Twitter and QuantCast and the feature representation used for the task; present regression and classification results; and conclude with directions for future work.
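The supervised, content-based paradigm summarized above is easy to state concretely. The sketch below is a generic illustration, not any one of the cited systems: it fits a linear classifier on bag-of-words features over a toy, invented training set, with the caveat that real systems train on thousands of annotated users and differ in their features and models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy, invented training data: user posts annotated with a binary
# demographic label, standing in for a human-annotated training set.
texts = [
    "off to the gym then brunch with the girls",
    "patch notes dropped, queueing ranked all night",
    "loved the new season, live-tweeting the finale tonight",
    "rebuilt the carburetor this weekend, runs great now",
]
labels = ["F", "M", "F", "M"]

# Content-based pipeline: bag-of-words (TF-IDF) features feeding a
# linear classifier, the common approach described above.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(texts, labels)
print(model.predict(["spent all day tuning the engine"]))
```

The limitations noted above apply directly here: the quality of the annotations in `labels` bounds what the model can learn, and attributes such as income cannot be read off a profile to produce such labels in the first place.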

Related Work
Quantcast
Regression
Classification
Conclusion
