Abstract

The debates between Bayesian, frequentist, and other methodologies of statistics have tended to focus on conceptual justifications, sociological arguments, or mathematical proofs of their long run properties. Both Bayesian statistics and frequentist (“classical”) statistics have strong cases on these grounds. In this article, we instead approach the debates in the “Statistics Wars” from a largely unexplored angle: simulations of different methodologies’ performance in the short to medium run. We used Big Data methods to conduct a large number of simulations using a straightforward decision problem based on tossing a coin with unknown bias and then placing bets. In this simulation, we programmed four players, inspired by Bayesian statistics, frequentist statistics, Jon Williamson’s version of Objective Bayesianism, and a player who simply extrapolates from observed frequencies to general frequencies. The last player served as a benchmark: any worthwhile statistical methodology should at least match the performance of simplistic induction. We focused on the performance of these methodologies in guiding the players towards good decisions. Unlike an earlier simulation study of this type, we found no systematic difference in performance between the Bayesian and frequentist players, provided the Bayesian used a flat prior and the frequentist used a low confidence level. The Williamsonian player was also able to perform well given a low confidence level. However, the frequentist and Williamsonian players performed poorly with high confidence levels, while the Bayesian was surprisingly harmed by biased priors. Our study indicates that all three methodologies should be taken seriously by philosophers and practitioners of statistics.
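The decision problem described above can be illustrated with a minimal sketch. This is a hypothetical reconstruction, not the paper's actual models: the betting rule, payoffs, and parameters below are illustrative simplifications. A coin of unknown bias is tossed repeatedly; before each toss, a Bayesian player bets according to the posterior mean under a flat Beta(1,1) prior, a frequentist player bets only when a normal-approximation confidence interval excludes 0.5 (abstaining otherwise), and a benchmark player simply extrapolates the observed frequency.

```python
import math
import random

def confidence_interval(heads, n, z):
    """Normal-approximation interval for P(heads); maximally wide before any data."""
    if n == 0:
        return 0.0, 1.0
    p = heads / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def run(n_tosses=500, true_bias=0.8, z=1.0, seed=42):
    """Toy coin-betting game (illustrative rules, not the paper's exact players).

    Each round, a player wins 1 for betting on the side that comes up and
    loses 1 otherwise; the frequentist may abstain and score 0 for the round.
    """
    rng = random.Random(seed)
    heads = seen = 0
    scores = {"bayes": 0, "freq": 0, "bench": 0}
    for _ in range(n_tosses):
        # Each player estimates P(heads) from the record so far.
        bayes_p = (heads + 1) / (seen + 2)          # posterior mean, flat Beta(1,1) prior
        bench_p = heads / seen if seen else 0.5     # straight extrapolation of frequency
        lo, hi = confidence_interval(heads, seen, z)

        toss_is_heads = rng.random() < true_bias
        payoff = 1 if toss_is_heads else -1         # payoff of a bet on heads

        scores["bayes"] += payoff if bayes_p > 0.5 else -payoff
        scores["bench"] += payoff if bench_p > 0.5 else -payoff
        if lo > 0.5:
            scores["freq"] += payoff                # interval lies above 0.5: bet heads
        elif hi < 0.5:
            scores["freq"] -= payoff                # interval lies below 0.5: bet tails
        # otherwise the frequentist abstains while the interval straddles 0.5

        heads += toss_is_heads
        seen += 1
    return scores
```

In this toy setting, raising `z` widens the interval, so the frequentist abstains for longer before committing — a crude analogue of the finding that high confidence levels hurt the frequentist player's betting performance.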

Highlights

  • If there is any suspicion that philosophy of science is an ivory tower subject, it should be extinguished by what Deborah Mayo has called the “Statistics Wars” between classical statisticians, Bayesians, and a prismatic assortment of variations of these views (Ioannidis, 2005; Howson and Urbach, 2006; Wasserstein and Lazar, 2016; Mayo, 2018; van Dongen et al., 2019; Sprenger and Hartmann, 2019; Romero and Sprenger, 2020)

  • We found that Bayesianism, frequentism, and Williamsonianism can all perform well with suitable player settings

  • We programmed four different “players” using belief models and choice models. These players were inspired by the three positions in the Statistics Wars that we identified above, plus a benchmark player


Introduction

If there is any suspicion that philosophy of science is an ivory tower subject, it should be extinguished by what Deborah Mayo has called the “Statistics Wars” between classical statisticians, Bayesians, and a prismatic assortment of variations of these views (Ioannidis, 2005; Howson and Urbach, 2006; Wasserstein and Lazar, 2016; Mayo, 2018; van Dongen et al., 2019; Sprenger and Hartmann, 2019; Romero and Sprenger, 2020). Even apparently recondite questions about concepts like evidence, probability, and rational belief are connected with questions of statistical practice. These questions have been given particular salience by the “replication crisis”, in which the rates of replication in published statistical research across a range of scientific fields are apparently below what would be expected from random variation alone (Gelman, 2015; Open Science Collaboration, 2015; Smaldino and McElreath, 2016; Fidler and Wilcox, 2018; Trafimow, 2018). Far from being dry debates, the Statistics Wars are frequently characterised by the sort of aggressive rhetoric, bombastic manifestos, and political maneuvering that their name would suggest. This war-like atmosphere is understandable, because the Wars affect statistical practice, and thereby the health, wealth, and happiness of nations.

