The Poverty of Moral Foundation Messaging

Abstract

Prominent scholars have argued that reframing political positions and issues in terms of moral foundations that appeal to conservatives or liberals can attract more individual-level support for those positions, even when such support would be unexpected. Such a change of attitude would be promising for those looking to narrow ideological divides. As two independent research teams, we set out to explore the promising evidence along these lines and to identify further nuances in the arguments. We tested moral reframing in the context of five issue areas: the environment, tax policy, immigration, English as an official language, and universal healthcare. We consistently found null results, not just for overall samples but for sub-groups that have been hypothesized as the most likely to be affected by moral reframing. Bayes factors indicate that the observed data are 100 times more likely to occur under the null hypothesis (of no effect) than under the moral foundations hypothesis. We explored the reasons for these null results by examining possibilities such as misunderstanding of the treatment, the negative or positive valence of the treatment, and small sample sizes. We found no plausible explanation for the absence of a treatment effect. Moral reframing techniques may be less helpful to persuasion than previous research suggests.

Similar Papers
  • Front Matter
  • Cited by 10
  • 10.1016/j.pmrj.2013.05.003
Interpreting “Null” Results
  • Jun 1, 2013
  • PM&R
  • Kristin Sainani

  • Research Article
  • Cited by 4
  • 10.1016/j.jsxm.2022.02.007
Lack of Evidence for a Relationship Between Salivary CRP and Women's Sexual Desire: An Investigation Across Clinical and Healthy Samples
  • Mar 13, 2022
  • The Journal of Sexual Medicine
  • Kirstin Clephane + 8 more

  • Research Article
  • Cited by 3
  • 10.1016/j.bjae.2019.03.006
Hypothesis tests
  • May 14, 2019
  • BJA Education
  • J Walker

  • Research Article
  • 10.1002/tea.20448
Comment: What constitutes evidence in science education research?
  • Nov 3, 2011
  • Journal of Research in Science Teaching
  • Wolff‐Michael Roth

In the wake of an increasing political commitment to evidence-based decision making and evidence-based educational reform that emerged with the No Child Left Behind effort, the question of what counts as evidence has become increasingly important in the field of science education. In current public discussions, academics, politicians, and other stakeholders tend to privilege experimental studies and studies using statistics and large sample sizes. However, some science education studies use a lot of statistics and large sample sizes (e.g., Bodzin, 2011) and yet, as I suggest in this text, are flawed and do not provide (sound) evidence in favor of some treatment or claim. Leaving aside the assertion and consensus of researchers across the quantitative/qualitative spectrum (e.g., the collection of chapters in Ercikan & Roth, 2009), we must ask whether all studies that appear to provide “quantitative” support for a particular effect do in fact provide substantial or strong evidence. As an anonymous reviewer of this contribution has pointed out, the question in its title really has two dimensions: (a) What constitutes valid evidence and (b) what are the limits of the claims that can be constructed when the evidentiary chain from premises to results is perfectly constructed. Both are important in constructing explanations for phenomena of interest to scientists generally and to science educators in particular. I begin by discussing the two issues in the context of the logic of scientific inquiry and statistical inference and then exemplify the issues as these play out in one recent article published in the pages of this journal (Bodzin, 2011). To further concretize my discussion, I also sketch two re-analyses concerned with the weight of the evidence provided by (a) 10 studies of paranormal psychological phenomena (psi) and (b) 855 studies in experimental psychology. 
First, in the logic of science, all explanatory schemas—including those of a historical, historical-developmental, or interpretive nature—can be expressed in the following way (e.g., Stegmüller, 1974). Some observed event E (i.e., the evidence) is related to statements about antecedent conditions and general laws or law-like regularities; together these constitute the premises of the argument made in the research article. The conditions for an explanation to be valid include: (a) the argument that leads from a hypothesized regularity or law to the observation has to be correct; (b) there has to be at least one general law or law-like regularity; (c) the hypothesized law/regularity has to include empirical content; and (d) the statements that constitute the law have to be true (based on basic logic, no valid inferences can be made otherwise). In the logic of experimental research, explanations may be of two kinds: (a) given the same set of antecedent conditions, a first hypothesized law would lead to observed event E1 whereas a second hypothesized law leads to event E2; or (b) given the same law or law-like regularity, a first set of antecedent conditions would lead to observed event E1 whereas a second set of antecedent conditions would lead to E2. Frequency-based statistics are used to establish the probability p(E|H0) that an event E is observed given the null hypothesis.1 This probability gives only indirect evidence, as the researcher has to choose a certain level at which H0 is rejected.
In the social sciences, this probability tends to be α = 0.05.2 The point of scientific research generally is to eliminate alternative hypotheses or theories so that the remaining one(s) constitutes the best available at the time.3 A researcher's claim that some observed event E (i.e., the data collected, which constitute the evidence) is due to specified antecedents and laws/law-like regularities (a) is strong when there are no other explanations but (b) is less strong, weak, or invalid when there are other explanations. Several recurring problems weaken such claims:
  • A researcher has conducted many studies or tests within the same report. If s/he based the claims deriving from each study/test on the accepted rule of using a type I error rate of α = 0.05 (also called a false positive, or rejecting a true null hypothesis H0), then the accumulation of tests leads to a higher probability of having made at least one type I error.
  • Although an experiment suggests that there is a statistically reliable effect with, for example, a probability p < 0.001, the size of the effect may be negligible in practice and therefore not useful for policy makers.
  • Related to the preceding point, a science educator looking at the PISA 2009 (OECD, 2010) scores would notice that there was a statistically significant difference between boys (XB = 509) and girls (XG = 495) on the science scale (SDPOOL = 98). This would be taken as evidence for the claim "U.S. boys outperformed their female counterparts." Yet looking at the associated distributions of scores (Figure 1), we immediately realize that many girls outperform boys. There is a large overlap of treated and untreated individuals, so that for any given score, the likelihood that the person received treatment is similar to the likelihood that s/he did not.
  • Researchers do not take into account previous studies of the same design testing the same variables or theories.
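The first of these problems, the accumulation of type I errors across tests, can be sketched in a few lines of Python (our illustration, not code from the article). For N independent tests at level alpha the exact family-wise rate is 1 − (1 − alpha)^N; the additive bound N × alpha used later in the text is a cruder upper bound.

```python
# Sketch of the multiple-testing problem: the chance of at least one
# type I error grows with the number of tests run at the same alpha.
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) among n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests

for n in (1, 5, 24):
    print(f"{n:2d} tests: FWER = {familywise_error_rate(n):.3f}")
```

With 24 tests, as in the Bodzin study discussed below, the exact rate already exceeds 0.70 even though each individual test is held at 0.05.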
Because previous knowledge thereby is not taken into account, we cannot evaluate what additional evidence each study provides. Including prior knowledge is at the heart of Bayesian statistics (e.g., Howson & Urbach, 1989/2006). [Figure 1. A plot of two population distributions representing science scores of U.S. boys and girls based on the means and standard deviation reported by the 2009 PISA study (OECD, 2010).] There are many other reasons why a valid inference nevertheless is problematic. It is up to the researcher to exhibit and discuss the strength of a study, the validity of its evidence, and which audiences will draw what kind of benefit from the study results. To illustrate how science educators might want to think about the strength of evidence they produce, I provide an exemplary look at a recent study in earth science education (Bodzin, 2011). Bodzin suggests that the purpose of the study was to investigate the “extent [that] a [geospatial information technology (GIT)]-supported curriculum could help students at all ability levels … to understand [land use change (LUC)] concepts and enhance the spatial skills involved with aerial and RS imagery interpretation” (Bodzin, 2011, p. 293). That is, an explicit claim is made about a causal relationship between a curriculum and learning. Following the logic of argumentation outlined above, therefore, the author makes a claim that a particular antecedent (the GIT-supported curriculum) brings about a difference in achievement when the students are observed (tested) before and after the treatment. The study uses a simple pre-test (observation O1)/treatment (X)/post-test (O2) design, which has the structure O1 X O2 (Cook & Campbell, 1979). In this case, the authors of the standard reference book on quasi-experimental design suggest “we should usually not expect hard-headed causal inferences from the simple before-after design when it is used by itself” (p. 103).
Although the authors suggest that such a design may produce hypotheses worthy of further exploration, they express the hope “that persons considering the use of this design will hesitate before resorting to it” (p. 103). This is so because the difference in test scores (O2 − O1) could be due to maturation or other events in the life of the students (e.g., they learn certain mathematical concepts or concepts in logic). Because Bodzin does not rule out other reasonable alternatives, the design provides weak (little) to no evidence for a treatment effect, because there are many other possible causes that could have brought about the differences in achievement between the two observations—even though the statistical tests are significant and even if the effect sizes were large. If we accept for the moment that the study is exploratory, we may ask ourselves whether its evidence has any strength that warrants further study. We then have to choose the form of analysis. Traditionally, there is no question: the statistic would be one based on frequency distributions (e.g., Student's t). Within this frequency-based perspective, the evidence provided by Bodzin's study is not strong even though it might appear as such. A first problem with the results is that the reported means are not independent, because each overall mean reported in Bodzin's Tables 2–4 really is a weighted mean derived from the other pieces of information already available. That is, it is as if the author reported that three individuals had $2, $3, and $4, respectively, and also reported that they owned $9 together or that the mean amount was $3. The additional information is redundant rather than additional evidence; but reporting the redundant information makes it look like there is additional evidence. Statisticians tend to deal with this issue by lowering the degrees of freedom and thereby eliminating redundancy.
The study therefore violates some basic assumptions for statistical inference that would be part of the second type of validity. Moreover, the overall means in his Table 2 can be calculated from the scales reported in Tables 3 and 4. To draw any useful conclusions from the p values, however, the tests need to be independent. As presented, the study overestimates the evidence in favor of the treatment. A second major problem is the number of t tests conducted: a total of N = 24, which, given the content of bullet 1 above, tremendously increases the possibility of a type I error. That is, the additive bound on the experiment-wise error rate of a false positive actually exceeds 1 (24 × α = 24 × 0.05 = 1.2) and, therefore, would be set to p = 1 in statistical packages such as SAS.4 To hold the experiment-wise error rate at α = 0.05, tests could be adjusted using what is known as the Bonferroni procedure (or one of its alternatives).5 In this procedure, every test in an ensemble of N tests is conducted at a revised α-level of αnew = α/N so that the total, experiment-wise error still is less than α = 0.05. That is, instead of cut-offs at p < 0.05, p < 0.01, …, the new cut-offs for rejecting the null hypothesis would be at p < 0.0021, p < 0.00042, … and so on. Again, the reported tests are strongly biased in favor of the reported effects because they are conducted at error probabilities 24 times higher than acceptable. Another option would have been to use a MANOVA, that is, a multivariate analysis of variance in which multiple dependent measures are tested simultaneously. Only when this test suggests a significant difference would more conservative, adjusted t-tests be warranted. Even if these problems did not exist, further caution would be required because frequency-based statistics have some fundamental problems, even flaws.
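The Bonferroni procedure described above is simple enough to sketch directly (our illustration; the p-values passed in are made up for the example):

```python
# Bonferroni procedure: run each of N tests at alpha / N so the
# family-wise error rate stays at or below alpha.
def bonferroni_reject(p_values, alpha=0.05):
    """Return, per test, whether H0 is rejected at the adjusted cutoff alpha/N."""
    cutoff = alpha / len(p_values)
    return [p < cutoff for p in p_values]

# With N = 24 tests the adjusted cutoff is 0.05 / 24, roughly 0.0021,
# matching the text: only the first (p = 0.001) of these survives.
print(bonferroni_reject([0.001, 0.004, 0.04] + [0.5] * 21))
```

Note how a p-value of 0.04, nominally "significant" at 0.05, no longer counts once the ensemble of 24 tests is taken into account.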
In a recent article in the Journal of Personality and Social Psychology, a group of authors reanalyzes the results of a set of experimental studies to answer their rhetorical question of why psychologists must change the way they analyze their data (Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011). This study was designed as a critique of a series of studies on the psychological phenomenon of psi, all conducted by the same researcher (Bem, 2011). Here, the “term psi denotes anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms” (p. 407). It is a descriptive term that includes, among others, telepathy, clairvoyance, precognition, and premonition. The subject is a controversial one that—as recognized by the study's author—most psychologists reject outright, even though significant parts of the general population believe in parapsychological phenomena. The series of studies fulfilled all the criteria required by the logic of science for valid inference. The study suggests that there is overwhelming, cumulative evidence for the existence of certain psi-related phenomena. However, the critique shows that even though nine (of 10) experimental studies conducted by Bem produced statistically reliable results in favor of rejecting the null hypothesis (H0)—that is, H0 = there is no psi [precognition]—much of the evidence is only “anecdotal” in favor of either the null hypothesis (there is no psi) or its alternative (there is psi). To provide evidence for their counter claim, Wagenmakers, Wetzels et al.
(2011) use a simple Bayesian test that starts from unbiased prior probabilities and is not biased against the null hypothesis, as are the frequency-based statistics that Bem, following standard procedure, used in his study.6 Another investigation reanalyzes 855 studies in experimental psychology and suggests that 70% of the studies with 0.01 < p < 0.05 (i.e., a total of 132 studies) provide no more than anecdotal evidence for the effect of interest (Wetzels et al., 2011). That is, in a field that prides itself on the strength of its methodological approaches, a large number of studies that appear to support the alternative to the null hypothesis of no (treatment) effect actually provide evidence that is at best anecdotal. Bayesian statistics have been proposed as a way of overcoming many of the inherent problems with frequency-based statistics, not least because they allow researchers to quantify prior knowledge (e.g., Gurrin, Kurinczuk, & Burton, 2000). Bayesian statistics are of such a nature that they can be used to provide direct and explicit answers to questions that are usually posed by practitioners. This is so because Bayesian statistics asks what the probability p(H0|E) of the null hypothesis H0 is, given an event E, which simultaneously yields the probability of the alternative hypothesis, p(H1|E) = 1 − p(H0|E). That is, Bayesian statistics evaluates the weight of the evidence from a study in support of one or the other hypothesis.
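The posterior probability p(H0|E) follows mechanically from a Bayes factor and the prior odds (posterior odds = Bayes factor × prior odds). A minimal Python sketch, with function and argument names of our own choosing:

```python
# Posterior probability of H0 from a Bayes factor and prior probabilities.
# bf01 is the Bayes factor in favour of H0; pi0 is the prior P(H0),
# so pi1 = 1 - pi0 is the prior P(H1).
def posterior_p_h0(bf01: float, pi0: float = 0.5) -> float:
    """p(H0|E) = bf01 * pi0 / (bf01 * pi0 + pi1); p(H1|E) is its complement."""
    pi1 = 1 - pi0
    return bf01 * pi0 / (bf01 * pi0 + pi1)

print(posterior_p_h0(1.0))  # BF = 1, equal priors: data favour neither, p = 0.5
print(posterior_p_h0(3.0))  # data 3x more likely under H0: p(H0|E) = 0.75
```

Unlike a p-value, this quantity speaks directly to the practitioner's question, "how probable is the null hypothesis given what was observed?"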
An easy-to-use indicator of the strength of a statistical test is the Bayes factor (Rouder, Speckman, Sun, Morey, & Iverson, 2009).7 Its power derives from the fact that it is not biased—as are p values, effect sizes, and confidence intervals—in favor of the alternative hypothesis and therefore provides a measure of the quality of the evidence for or against claims.8 Tables that map calculated Bayes factors to qualitative expressions of the strength of evidence use a scale ranging from “decisive,” through “very strong,” “strong,” and “substantial,” to “anecdotal” for both the null and the alternative hypothesis (Table 1). Thus, a study that is statistically significant nevertheless may provide little more than anecdotal evidence for the hypothesis that there is an effect. If we assumed for the moment that all of Bodzin's tests are independent and calculated the Bayes factor based on the absence of prior knowledge (equal priors for null and alternative hypothesis), we would obtain the results in Table 1. These show that six of the tests conducted provide only anecdotal evidence in favor of the alternative hypothesis and four tests provide anecdotal evidence in favor of the null hypothesis. As the implementation of one-tailed tests shows, the author appeared to have had good reasons to anticipate positive treatment effects. Such prior beliefs may be used to adjust the statistics to account for prior knowledge. As soon as we assume that there is prior knowledge available in favor of larger effect sizes for the treatment, more of the tests become anecdotal evidence against the claims that the treatment applied by Bodzin caused the differences observed. Moreover, if we removed the overall test to avoid statistical dependence, as well as the overall tests for each subscale, then there would be only four decisive tests left, three of which are on the same (UHI) scale (Table 1)! Apart from one other test, the remaining evidence would be anecdotal only.
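A rough, self-contained way to turn a reported t statistic into a Bayes factor is the BIC approximation (Wagenmakers, 2007). This is a sketch under that approximation, not the exact JZS Bayes factor of Rouder et al. used in the analyses described above, but it illustrates the same point: a result just past p < .05 can still be merely anecdotal evidence.

```python
import math

# BIC-approximate Bayes factor for a one-sample t test with n observations
# (Wagenmakers, 2007). BF01 > 1 favours the null; BF01 < 1 favours H1.
def bf01_from_t(t: float, n: int) -> float:
    return math.sqrt(n) * (1 + t**2 / (n - 1)) ** (-n / 2)

# t = 2.1 with n = 30 is "significant" (p just under .05), yet BF01 sits
# well inside the 1/3..3 band usually labelled "anecdotal":
print(round(bf01_from_t(2.1, 30), 2))
# t = 0 with the same n: the data clearly favour the null instead:
print(round(bf01_from_t(0.0, 30), 2))
```

This mirrors the Wetzels et al. (2011) finding quoted above: many p-values between .01 and .05 translate into only anecdotal Bayesian evidence.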
The upshot of this ever-so-brief analysis is that the evidence in favor of a treatment effect in Bodzin's study is rather weak and at best anecdotal—apart from being subject to the serious threats to the validity of the experiment deriving from the failure to exclude alternative explanations. Even if all this were not problematic, there would still be the question of what the study says to science teachers and policy makers, an issue even for the best-constructed studies, such as PISA (OECD, 2010). Thus, as Figure 1 shows, because the overlap between the two distributions is so large—that is, the within-group variation (SD = 98) is large compared to the between-group variation (XBOYS − XGIRLS = 14)—we do not know whether a particular girl or group of girls can be said to be doing better or worse than boys. Similarly, if Figure 1 were to express the results of an experimental or quasi-experimental study, we would be unable to say whether a particular girl or group of girls had benefited from the treatment, because she/it achieved higher than some boys but lower than other boys (Ercikan & Roth, 2011). Frequency-based statistics therefore come with considerable limitations concerning the weight and interpretability of the evidence collected in a study. As a result, whether frequency-based statistics can provide useful recommendations to practitioners and policy-makers depends on the degree to which study findings apply to the relevant individual or subgroup of individuals. Science educators, as scholars in any other science, ought to strive to provide the strongest forms of evidence for the claims they make. For the evidence to be strong, the design of studies needs to rule out alternative explanations to the largest extent possible. This pertains to single (qualitative) case studies as much as to high-powered statistical work using the most advanced mathematical modeling techniques and experimental designs.
Moreover, because there are many problems with traditional statistics, some substantial, science educators ought to choose the strongest possible statistical methods available to them. In the face of a public debate about evidence-based decision making in educational reform, and in the face of efforts to make evidence-based reasoning itself a primary educational goal (e.g., Callan et al., 2009), science educators do not want to be the children left behind. We, science educators, owe it to ourselves to work together (authors, peer reviewers) to produce the strongest possible evidence in the construction of explanations.
1 More technically expressed, the p value a study reports is the probability for a certain effect to occur given the null hypothesis. The probabilities are given by the appropriate distribution, including the standard normal (z), Student's t, χ2, and F distributions, and correspond to the fraction of the total area under the distribution (i.e., 0.05, 0.01, …) covered by the tail.
2 In a historic-genetic form of explanation used by cultural-historical (activity) theorists frequently cited in the science education literature (e.g., Bakhtin, 1981; Leontyev, 1981; Vygotsky, 1927/1997), general laws are inferred from the observed historically (genetically, developmentally) related sequence of events even though there may only be one case. Here, “the challenge is to systematically interrogate the particular case by constituting it as a ‘particular instance of the possible’ … in order to extract general or invariant properties that can be uncovered only by such interrogation” (Bourdieu, 1992, p. 233).
3 This is so even in singular cases (e.g., criminologists are faced with the question “who done it?” and need to get the right person even though there are no precedents).
4 http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_multtest_sect014.htm
5 The procedure is sometimes critiqued for being too conservative.
6 A calculator for this statistic is available at http://pcl.missouri.edu/bayesfactor. The website also provides access to relevant articles.
7 Technically, the probability for the null hypothesis following the collection of data is given by p(H0|E) = BF·π0 / (BF·π0 + π1), where E denotes the data, BF is the Bayes factor (in favor of H0), and π0 and π1 are the prior probabilities of H0 and H1, respectively, with π1 = (1 − π0) (Gonen, Johnson, Lu, & Westfall, 2005).
8 One of the fundamental lessons beginners in statistics learn is that one “cannot prove or provide evidence for the null hypothesis.”

  • Research Article
  • 10.1016/j.ajodo.2015.03.015
Inference from a sample mean--Part 1.
  • Jun 1, 2015
  • American Journal of Orthodontics and Dentofacial Orthopedics
  • Nikolaos Pandis

  • Research Article
  • 10.54103/2282-0930/29432
Decision Rules in Frequentist and Bayesian Hypothesis Testing: P-Value and Bayes Factor
  • Sep 8, 2025
  • Epidemiology, Biostatistics, and Public Health
  • Giovanni Nicolao + 8 more

Introduction The P-value is a widely used tool in inferential statistics and represents the probability of obtaining a value equal to or more extreme than the one observed, assuming that the null hypothesis (H0) is true [1]. One of its main advantages is its intuitive interpretation: a smaller P-value indicates a lower compatibility of the observed results with the null hypothesis [2]. However, the P-value has important limitations that could lead to significant distortions in the interpretation of the results obtained [3]. The most important limitation is its sensitivity to sample size: as the sample size increases, the power of the test also increases. Consequently, even minor and perhaps clinically irrelevant effects can produce statistically significant P-values, while important effects might not be detected in smaller samples [1]. The use of a fixed significance threshold (typically 0.05) can promote a binary interpretation of the results (significant vs. non-significant), oversimplifying the researcher's decision-making process. This approach risks not fully capturing the degree of statistical evidence, thereby increasing the likelihood of assessment errors [4]. Another limitation is that the P-value does not provide information about the evidence in favor of an alternative hypothesis (H1): a small P-value may suggest that the data do not support the null hypothesis (H0), but it does not quantify, through a comparative approach, how much more likely the data are under the alternative hypothesis [5]. These limitations of the P-value have encouraged researchers to explore alternative approaches, such as the Bayes Factor (BF) [6]. The BF is a Bayesian tool used to compare the evidence in favor of two hypotheses by comparing the likelihood of the data under the null hypothesis with the likelihood of the data under the alternative hypothesis.
Therefore, unlike the P-value, the BF directly measures the probability of the data under each hypothesis, providing a quantitative comparison between H0 and H1 [7]. Among the advantages of the BF is its ability to provide a continuous measure of evidence, comparing the alternative hypothesis with the null hypothesis while also allowing the incorporation of prior information into the analyses. Its value can be interpreted using specific scales [8]. Objectives The objective of this work is to compare the P-value and the BF as statistical tools for hypothesis testing, in order to highlight their behaviors in different scenarios involving (i) sample size and (ii) effect size. Methods A simulation study was conducted with various scenarios constructed by combining sample size and effect size. The proposed simulation uses a t-test on the difference between the means of two independent groups as the endpoint. Nine distinct scenarios were generated, which include: (i) three levels of effect size, defined as the standardized difference between the means of the two groups, equal to 0.1, 0.2, and 0.5; and (ii) three different sample sizes, equal to 50, 100, and 150. A total of 5000 replications were performed, and the results are expressed in terms of medians of the P-value and BF [9]. The Bayesian results were obtained using the R package "BayesFactor". The default prior was applied, which is a Cauchy distribution centered on 0 and moderately informative. In the simulation, the default prior of the package was chosen for illustrative purposes, but the process of selecting a prior is not trivial and requires specific considerations related to the research context. Results The results of the study show that the Bayes Factor (BF) is less sensitive to sample size compared to the P-value when effect sizes are small (0.1 and 0.2).
It can also be observed that the P-value becomes statistically significant for sample sizes of 100 and 150 units with an effect size of 0.5, and its significance increases at a very high rate, compared to the BF, where the evidence in favor of H1 remains moderate. In other words, the P-value becomes extremely low in the presence of an effect size of 0.5 for a sample size of 150 units, whereas the BF remains more cautious, indicating only moderate evidence in favor of the alternative hypothesis. Conclusions The results reveal that the P-value is more sensitive to changes in sample size and effect size compared to the BF. Additionally, the BF provides a more nuanced approach to decision-making, addressing the binary nature of the P-value in rejecting the null hypothesis. The Bayesian alternative can be advantageous for researchers in the healthcare context, as it allows for the incorporation of informative priors that could enhance analysis results and reduce the likelihood of assessment errors. However, a significant challenge of using the BF lies in the choice of the prior distribution, which can significantly impact the final results of the analyses.
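The simulation design described above can be re-created as a toy sketch in pure standard-library Python (our own illustration: one replication per scenario rather than the paper's 5000, a per-group sample size, a normal approximation to the p-value, and a BIC-approximate Bayes factor in place of the Cauchy-prior BayesFactor package).

```python
import math
import random
import statistics

random.seed(1)

def two_sample_t(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def bic_bf01(t, n_total):
    """BIC-approximate Bayes factor in favour of H0 for a two-sample t test."""
    return math.sqrt(n_total) * (1 + t**2 / (n_total - 2)) ** (-n_total / 2)

for d in (0.1, 0.5):        # standardized mean difference (effect size)
    for n in (50, 150):     # per-group sample size (an assumption on our part)
        x = [random.gauss(d, 1) for _ in range(n)]
        y = [random.gauss(0, 1) for _ in range(n)]
        t = two_sample_t(x, y)
        # normal approximation to the two-sided p-value (df is large here)
        p = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
        print(f"d={d:>3} n={n:>3}: t={t:6.2f}  p={p:.3f}  BF01={bic_bf01(t, 2 * n):7.3f}")
```

Even this crude sketch shows the qualitative contrast the abstract reports: as n grows with a real effect, the p-value collapses quickly while the Bayes factor moves more gradually.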

  • Research Article
  • Cited by 7
  • 10.1016/s0378-3758(95)00198-0
Convergence of posterior odds
  • Nov 1, 1996
  • Journal of Statistical Planning and Inference
  • Richard A Levine + 1 more

  • Front Matter
  • Cited by 11
  • 10.1016/j.ajodo.2021.10.008
The self-fulfilling prophecy of post-hoc power calculations
  • Jan 28, 2022
  • American Journal of Orthodontics and Dentofacial Orthopedics
  • Christos Christogiannis + 3 more

  • Research Article
  • 10.21926/obm.icm.2402023
The Acceptability and Impact of a Video on Compassion Focused Therapy as a Psycho-Educative Tool to Deepen Awareness around Voice-Hearing
  • Apr 25, 2024
  • OBM Integrative and Complementary Medicine
  • Tara Hickey + 3 more

In the compassion focused therapy (CFT) model of voice-hearing, a distressing relationship with voices is thought to be influenced by evolved threat-protection patterns, which are activated and attuned by socially threatening experiences, such as being harmed by others, as well as being shamed, stigmatised, invalidated, and excluded. Therefore, the CFT approach is particularly interested in the role of voice-hearers’ relationships with others and self, as well as their social context of family/friends, professionals/services, and the wider community. This article reports on the impact of a 5-minute film, ‘Compassion for Voices’, which aimed to engage a general public audience with the compassionate approach to relating with voices, with potential as a therapeutic, educational, and de-stigmatising tool. One hundred and thirty-seven people responded to an anonymous online public feedback survey asking about the perceived impact of this film, amongst whom were 20 voice-hearers, 30 family/friends of voice-hearers, and 87 who work with voice-hearers. Quantitative data were gathered from responders’ perceived impact ratings (yes/no) in several different domains, and qualitative feedback data were analysed using content analysis by an independent research team. Over 98% of total responders thought the film has, or could potentially have, an impact on people’s health and welfare, and within the subsamples of both family/friends and the people who directly work with voice-hearers, this was 100%. The qualitative data revealed main impact themes around knowledge and education, changes of attitudes or approaches to voice-hearing, and validation of people’s lived experience. Although there are limitations to the online survey method, and therefore caution around what conclusions can be drawn, this study demonstrated a clear value and perceived impact among the sample who responded.
This offers support for the use of video tools for social and community interventions, which is very much in keeping with the theoretically- and empirically- supported aims of CFT.

  • Research Article
  • Cited by 34
  • 10.1016/s0025-7125(03)00071-3
Homeopathy
  • Jan 1, 2002
  • Medical Clinics of North America
  • Woodson C Merrell + 1 more

  • Book Chapter
  • Cited by 9
  • 10.1093/oso/9780199214655.003.0016
Approximating Interval Hypothesis: p-values and Bayes Factors
  • Jul 19, 2007
  • Judith Rousseau

In this paper we study Bayes factors and special p-values in their capacity to approximate interval null hypotheses (or, more generally, neighbourhoods of point null hypotheses) by point null or possibly parametric families in the case of the goodness of fit test of a parametric family. We prove that when the number of observations is large, Bayes factors for point null hypotheses can approximate Bayes factors for interval null hypotheses for extremely small intervals. We also interpret the significance of calibrating goodness of fit tests using a p-value in terms of the width of neighbourhoods of the point null hypothesis. Finally, we study the consistency of Bayes factors for goodness of fit tests of parametric families, which enables us to shed light on the behaviour of the Bayes factors.

  • Discussion
  • Cited by 2
  • 10.1016/s0140-6736(00)05055-8
Research funds for complementary medicine
  • Jun 1, 2001
  • The Lancet
  • Han Y Chen + 2 more

  • Front Matter
  • Cited by 5
  • 10.1016/j.jtcvs.2020.03.156
Guidelines for improving the use and presentation of P values
  • Apr 30, 2020
  • The Journal of Thoracic and Cardiovascular Surgery
  • Steven J Staffa + 1 more

  • Front Matter
  • Cited by 42
  • 10.1111/jan.14283
The earnestness of being important: Reporting non-significant statistical results.
  • Dec 13, 2019
  • Journal of Advanced Nursing
  • Denis C Visentin + 2 more

  • Research Article
  • Cited by 10
  • 10.1044/2019_jslhr-h-19-0182
An Analysis of Nonsignificant Results in Audiology Using Bayes Factors.
  • Dec 5, 2019
  • Journal of Speech, Language, and Hearing Research
  • Christopher R Brydges + 1 more

Purpose Null hypothesis significance testing is commonly used in audiology research to determine the presence of an effect. Knowledge of study outcomes, including nonsignificant findings, is important for evidence-based practice. Nonsignificant p values obtained from null hypothesis significance testing cannot differentiate between true null effects or underpowered studies. Bayes factors (BFs) are a statistical technique that can distinguish between conclusive and inconclusive nonsignificant results, and quantify the strength of evidence in favor of 1 hypothesis over another. This study aimed to investigate the prevalence of BFs in nonsignificant results in audiology research and the strength of evidence in favor of the null hypothesis in these results. Method Nonsignificant results mentioned in abstracts of articles published in 2018 volumes of 4 prominent audiology journals were extracted (N = 108) and categorized based on whether BFs were calculated. BFs were calculated from nonsignificant t tests within this sample to determine how frequently the null hypothesis was strongly supported. Results Nonsignificant results were not directly tested with BFs in any study. Bayesian re-analysis of 93 nonsignificant t tests found that only 40.86% of findings provided moderate evidence in favor of the null hypothesis, and none provided strong evidence. Conclusion BFs are underutilized in audiology research, and a large proportion of null findings were deemed inconclusive when re-analyzed with BFs. Researchers are encouraged to use BFs to test the validity and strength of evidence of nonsignificant results and ensure that sufficient sample sizes are used so that conclusive findings (significant or not) are observed more frequently. Supplemental Material https://osf.io/b4kc7/.
