Abstract

In recent years, many studies have used social media data to make estimates of electoral outcomes and public opinion. This paper reports the findings from a meta-analysis examining the predictive power of social media data by focusing on various sources of data and different methods of prediction; i.e., (1) sentiment analysis, and (2) analysis of structural features. Our results, based on the data from 74 published studies, show significant variance in the accuracy of predictions, which were on average behind the established benchmarks in traditional survey research. In terms of the approaches used, the study shows that machine learning-based estimates are generally superior to those derived from pre-existing lexica, and that a combination of structural features and sentiment analyses provides the most accurate predictions. Furthermore, our study shows some differences in the predictive power of social media data across different levels of political democracy and different electoral systems. We also note that since the accuracy of election and public opinion forecasts varies depending on which statistical estimates are used, the scientific community should aim to adopt a more standardized approach to analyzing and reporting social media data-derived predictions in the future.

Highlights

  • Scholars have suggested that the social sciences are experiencing a momentous shift from data scarcity to data abundance [1,2], noting that full access to data is likely to be reserved for the most powerful corporate, governmental, and academic institutions [3].

  • Since each study may test more than one prediction, we ended up with 460 estimates in total: 232 reporting mean absolute error (MAE) or other convertible forms (RMSE, absolute error, etc.), 205 reporting R-squared or coefficients, and 23 reporting race-based accuracy.

  • Although this study is one of the first systematic reviews of social media-based predictions, it is important to note some of its limitations. We have included both electoral outcomes and traditional polls as the benchmarks for comparison. This follows the assumption and general wisdom from many of the studies we examined, wherein traditional polls were expected to be more sensitive to public opinion than social media trends, which are often noisy and easy to manipulate.
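The highlights above name MAE and RMSE as two of the error measures reported across studies. As a minimal sketch of why the choice of measure matters, the snippet below computes both for the same forecast. The vote shares are hypothetical illustration values, not data from the reviewed studies, and the function names are our own.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and actual vote shares."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted, actual):
    """Root mean squared error; penalizes large misses more than MAE does."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical vote shares (percent) for three parties: assumed values only.
predicted = [40.0, 35.0, 25.0]
actual = [42.0, 30.0, 28.0]

print(f"MAE:  {mae(predicted, actual):.2f} percentage points")
print(f"RMSE: {rmse(predicted, actual):.2f} percentage points")
```

Because RMSE is never smaller than MAE for the same errors, two studies with identical forecasts can appear to differ in accuracy simply because they report different measures, which is one reason a standardized reporting approach is useful.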

Summary

Introduction

Scholars have suggested that the social sciences are experiencing a momentous shift from data scarcity to data abundance [1,2], noting that full access to data is likely to be reserved for the most powerful corporate, governmental, and academic institutions [3]. Many recent studies have used social media data as a “social sensor” to predict different economic, social, and political phenomena, ranging from election results to the box office success of Hollywood movies, and some have achieved considerable success in this endeavor. Still, most of these studies have been primarily data-driven and largely atheoretical. Ceron et al. noted that the predictive capacity of social media analysis does not necessarily rely on user representativeness, suggesting that “if we assume that the politically active internet users act like opinion-makers who are able to influence (or to ‘anticipate’) the preference of a wider audience: it would be found that the preferences expressed through social media today would affect (predict) the opinion of the entire population tomorrow” [16]. Discounting social media data as invalid because it cannot represent a population misses important dynamics that might make the data useful for opinion mining.

