Since the popularization of social media (SM) platforms, researchers have been trying to use their data to predict electoral results. Previous surveys point out that the most common approach is based on volume and sentiment analysis of posts on Twitter. However, these surveys almost unanimously report that the results are no better than chance. In this context, this study aims to investigate the feasibility of predicting electoral results based solely on Twitter data, identify the main issues, and draw guidelines for alternative future directions. To this end, we reviewed the evolution of election polling and prediction, including the “polling crises” of 1936 and 1948, and their similarities to current approaches. We also drew on the official documentation of SM platforms and on our experience collecting and analyzing large-scale data from many SM platforms. Finally, we analyzed nine reviews on predicting elections with SM data from 2013 to 2021. We observed that, contrary to initial expectations, most current research with Twitter has been unable to solve many of the challenges encountered since the earliest studies, and that it also shares many characteristics of the unsuccessful straw polls performed before 1936. We illustrate this by highlighting the impracticability of polling over Twitter due to several biases and technical barriers, the need for external data, the high dependency on arbitrary decisions made by researchers, and the constant change in the platforms' landscape, which may invalidate specific models. Lastly, we indicate some possible future directions: a focus on creating repeatable processes; the use of SM data as part of statistical models instead of as a form of polling; diversifying the input data sources, including multiple SM platforms and non-SM data such as polls and economic indicators; using machine learning for regression of the vote share rather than for sentiment analysis; and dealing with the uncertainty of highly divergent polling results.