Abstract
Statistical inference has been essential to science since the twentieth century (Salsburg, 2001). Since its introduction into science, null hypothesis significance testing (NHST), in which the P-value serves as the index of “statistical significance,” has been the most widely used statistical method in psychology (Sterling et al., 1995; Cumming et al., 2007), as well as in other fields (Wasserstein and Lazar, 2016). However, surveys have consistently shown that researchers in psychology may not be able to interpret P-values and related statistical procedures correctly (Oakes, 1986; Haller and Krauss, 2002; Hoekstra et al., 2014; Badenes-Ribera et al., 2016). Even worse, these misinterpretations of the P-value may lead to its abuse, for example, P-hacking (Simmons et al., 2011; John et al., 2012). To counter these misinterpretations and abuses of P-values, researchers have proposed many solutions, for example, complementing NHST with estimation-based statistics (Wilkinson and the Task Force on Statistical Inference, 1999; Cumming, 2014), lowering the threshold for “significance” (Benjamin et al., 2017), banning NHST and related procedures altogether (Trafimow and Marks, 2015), and using Bayes factors (Wagenmakers et al., 2011, 2017). Of all these solutions, estimation-based statistics has been adopted by several mainstream psychological journals. One reason is that the confidence intervals (CIs) of estimation-based statistics help improve statistical inference, though they do not guarantee it (Coulson et al., 2010). However, the first step toward change is to know to what extent people in the field misinterpret these statistical indices and how these misinterpretations lead to abuse of these statistical procedures in research. Here we introduce raw data, available to anyone interested in examining how students and researchers misinterpret P-values and CIs, as well as how NHST and CIs influence the interpretation of research results.
Part of the results was reported in our previous Chinese-language paper (Hu et al., 2016).