Abstract
In this study, we evaluated interrater agreement statistics (IRAS) for use in research on low base rate clinical diagnoses or observed behaviors. Establishing and reporting adequate interrater agreement is essential in such studies. Yet the most commonly applied agreement statistic, Cohen's κ, has a well-known sensitivity to base rates that substantially penalizes interrater agreement when behaviors or diagnoses are very uncommon, a prevalent and frustrating concern in this literature. We performed Monte Carlo simulations to evaluate the performance of five alternatives to κ (Van Eerdewegh's V, Yule's Y, Holley and Guilford's G, Scott's π, and Gwet's AC₁), alongside κ itself. The simulations investigated the robustness of these IRAS under conditions common in clinical research, varying the behavior or diagnosis base rate, rater bias, observed interrater agreement, and sample size. When the base rate was 0.5, the IRAS provided similar estimates, particularly with unbiased raters. G was the least sensitive of the IRAS to base rates. These results encourage the use of the G statistic, given its consistent performance across the simulation conditions. We recommend separately reporting the rates of agreement on the presence and absence of a behavior or diagnosis alongside G as an index of chance-corrected overall agreement.
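For orientation, the following is a minimal Python sketch (ours, not the authors') computing most of the statistics under study from a 2×2 table of two raters' binary judgments. The function name and example counts are illustrative; Van Eerdewegh's V is omitted because its definition is not given in the abstract. The formulas used are the standard closed forms for κ, π, AC₁, G, and Y, plus the usual proportions of specific agreement on presence and absence that the abstract recommends reporting alongside G.

```python
import math

def agreement_stats(a, b, c, d):
    """Agreement statistics for two raters' binary judgments, given
    2x2 counts: a = both positive, b = rater 1 positive only,
    c = rater 2 positive only, d = both negative."""
    n = a + b + c + d
    po = (a + d) / n                          # observed proportion of agreement
    p1, p2 = (a + b) / n, (a + c) / n         # each rater's positive rate
    pbar = (p1 + p2) / 2                      # mean positive rate across raters

    # The chance-agreement term is what distinguishes these statistics:
    pe_kappa = p1 * p2 + (1 - p1) * (1 - p2)  # Cohen: each rater's own margins
    pe_pi = pbar ** 2 + (1 - pbar) ** 2       # Scott: pooled margins
    pe_ac1 = 2 * pbar * (1 - pbar)            # Gwet: "random rating" chance term

    return {
        "kappa": (po - pe_kappa) / (1 - pe_kappa),
        "pi": (po - pe_pi) / (1 - pe_pi),
        "AC1": (po - pe_ac1) / (1 - pe_ac1),
        "G": 2 * po - 1,  # Holley & Guilford: chance agreement fixed at 0.5
        "Y": (math.sqrt(a * d) - math.sqrt(b * c))
             / (math.sqrt(a * d) + math.sqrt(b * c)),  # Yule's colligation
        # Proportions of specific agreement on presence and absence,
        # the quantities the abstract recommends reporting alongside G:
        "positive_agreement": 2 * a / (2 * a + b + c),
        "negative_agreement": 2 * d / (2 * d + b + c),
    }

# Hypothetical low base rate example with high raw agreement (92%):
# kappa falls to ~0.29 while G remains 0.84, illustrating the
# base-rate penalty the abstract describes.
print(agreement_stats(a=2, b=4, c=4, d=90))
```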