Abstract
Interpreting randomized clinical trials (RCTs) and their clinical relevance is challenging when P values are either marginally above or below the P = .05 threshold. The objective was to use the concept of the reverse fragility index (RFI) to provide a measure of confidence in the neutrality of RCT results when assessed from the clinical perspective. In this cross-sectional study, a MEDLINE search was conducted for RCTs published from January 1, 2013, to December 31, 2018, in JAMA, the New England Journal of Medicine (NEJM), and The Lancet. Eligible studies were phase 3 and 4 trials with 1:1 randomization and statistically nonsignificant binary primary end points. Data analysis was performed from August 1, 2019, to August 31, 2019. Exposures were single-center vs multicenter enrollment, total number of events, private vs government funding, placebo vs active control, and time-to-event vs frequency data. The primary outcome was the median RFI with interquartile range (IQR) at the P = .05 threshold. Secondary outcomes were the number of RCTs in which the number of participants lost to follow-up was greater than the RFI; the median RFI with IQR at different P value thresholds; the median reverse fragility quotient with IQR; and the correlation between sample sizes, number of events, and P values of the RCT and RFI. Of the 167 RCTs included, 76 (46%) were published in the NEJM, 50 (30%) in JAMA, and 41 (24%) in The Lancet. The median (IQR) sample size was 970 (470-3427) participants, and the median (IQR) number of events was 251 (105-570). The median (IQR) RFI at the P = .05 threshold was 8 (5-13). Fifty-seven RCTs (34%) had an RFI of 5 or lower, and in 68 RCTs (41%) the number of participants lost to follow-up was greater than the RFI. Trials with P values ranging from P = .06 to P = .10 had a median (IQR) RFI of 3 (2-4).
Median (IQR) RFIs did not differ significantly between single-center and multicenter enrollment (5 [4-13] vs 8 [5-13]; P = .41), private and government-funded studies (9 [5-13] vs 8 [5-13]; P = .34), or time-to-event primary end points and frequency data (9 [5-14] vs 7 [4-13]; P = .43). The median (IQR) RFI at the P = .01 threshold was 12 (7-19) and at the P = .005 threshold was 14 (9-21). This cross-sectional study found that a relatively small number of events (median of 8) had to change to move the primary end point of an RCT from nonsignificant to statistically significant. These findings emphasize the nuance required when interpreting trial results that did not meet prespecified significance thresholds.
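To make the RFI concrete, the following is a minimal sketch of how such an index could be computed for one two-arm trial with a binary end point. It is an illustration under stated assumptions, not the authors' method: the text does not name the statistical test or the rule for changing events, so the sketch assumes a two-sided Fisher exact test and converts events to nonevents, one at a time, in the arm with the lower event rate. The reverse fragility quotient is assumed here to be the RFI divided by the total sample size. All function names and the trial counts in the usage lines are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every possible table that is no more probable than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def reverse_fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Fewest event-to-nonevent changes that push P below alpha.

    Assumption: changes are made only in the arm with the lower event rate.
    Returns None if that arm reaches zero events without reaching significance.
    """
    arms = sorted([(events_a, n_a), (events_b, n_b)], key=lambda arm: arm[0] / arm[1])
    (low, n_low), (high, n_high) = arms

    def p_value(low_events):
        return fisher_exact_two_sided(
            low_events, n_low - low_events, high, n_high - high
        )

    if p_value(low) < alpha:
        raise ValueError("result is already statistically significant")
    for flips in range(1, low + 1):
        if p_value(low - flips) < alpha:
            return flips
    return None


def reverse_fragility_quotient(rfi, n_a, n_b):
    """Assumed definition: RFI divided by the total number of participants."""
    return rfi / (n_a + n_b)


# Hypothetical trial: 20/200 events vs 28/200 events, 12 lost to follow-up.
rfi = reverse_fragility_index(20, 200, 28, 200)
rfq = reverse_fragility_quotient(rfi, 200, 200)
lost_to_follow_up = 12
print(rfi, rfq, lost_to_follow_up > rfi)
```

The last printed value mirrors the secondary outcome described above: when more participants were lost to follow-up than the RFI, their unobserved outcomes alone could plausibly have changed the conclusion. Passing alpha=0.01 or alpha=0.005 gives the index at the stricter thresholds.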
Highlights
Interpreting randomized clinical trial (RCT) results and their clinical relevance when P values are marginally above or below the threshold of P = .05 is challenging.1 Although the clinical relevance may not be different, a P value marginally below the P = .05 threshold is usually accepted as a favorable finding in a trial, and a P value above the P = .05 threshold is considered an unfavorable result.2 Efficacy of an intervention should be evaluated comprehensively on the basis of effect size measures, such as relative risk reduction or number needed to treat, accompanied by P values and 95% CIs, but clinical research continues to emphasize the prespecified threshold of P = .05 when interpreting results.
Secondary outcomes were the number of RCTs in which the number of participants lost to follow-up was greater than the reverse fragility index (RFI); the median RFI with interquartile range (IQR) at different P value thresholds; the median reverse fragility quotient with IQR; and the correlation between sample sizes, number of events, and P values of the RCT and RFI.
Fifty-seven RCTs (34%) had an RFI of 5 or lower, and in 68 RCTs (41%) the number of participants lost to follow-up was greater than the RFI.
Summary
Efficacy of an intervention should be evaluated comprehensively on the basis of effect size measures, such as relative risk reduction or number needed to treat, accompanied by P values and 95% CIs, but clinical research continues to emphasize the prespecified threshold of P = .05 when interpreting results. Such reliance on P values invites the risk of a type II error (ie, nonrejection of a false null hypothesis, which is known as a false-negative or β error), especially in the presence of fewer events, small sample sizes, and/or limited follow-up times. It is critical to evaluate the robustness of null trial results in cases in which the clinical consequences of a type II error are more important than those of a type I error (ie, rejection of a true null hypothesis, which is known as a false-positive or α error), such as in disease states with high mortality and limited therapeutic options and with an acceptable intervention safety profile.