On the use of receiver operating characteristic curve analysis to determine the most appropriate p value significance threshold

Farrokh Habibzadeh

doi:10.1186/s12967-023-04827-8

Abstract

Backgroundp value is the most common statistic reported in scientific research articles. Choosing the conventional threshold of 0.05 commonly used for the p value in research articles, is unfounded. Many researchers have tried to provide a reasonable threshold for the p value; some proposed a lower threshold, eg, 0.005. However, none of the proposals has gained universal acceptance. Using the analogy between the diagnostic tests with continuous results and statistical inference tests of hypothesis, I wish to present a method to calculate the most appropriate p value significance threshold using the receiver operating characteristic curve (ROC) analysis.ResultsAs with diagnostic tests where the most appropriate cut-off values are different depending on the situation, there is no unique cut-off for the p significance threshold. Unlike the previous proposals, which mostly suggest lowering the threshold to a fixed value (eg, from 0.05 to 0.005), the most appropriate p significance threshold proposed here, in most instances, is much less than the conventional cut-off of 0.05 and varies from study to study and from statistical test to test, even within a single study. The proposed method provides the minimum weighted sum of type I and type II errors.ConclusionsGiven the perplexity involved in using the frequentist statistics in a correct way (dealing with different p significance thresholds, even in a single study), it seems that the p value is no longer a proper statistic to be used in our research; it should be replaced by alternative methods, eg, Bayesian methods.

Full Text