Abstract
Background: Smartphones have made it possible for patients to digitally report symptoms before physical primary care visits. Using machine learning (ML), these data offer an opportunity to support decisions about the appropriate level of care (triage).

Objective: The purpose of this study was to explore the interrater reliability between human physicians and an automated ML-based triage method.

Methods: After testing several models, a naïve Bayes triage model was created using data from digital medical histories, capable of classifying digital medical history reports as either in need of urgent physical examination or not in need of urgent physical examination. The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 primary care physicians (PCPs). Reliability between raters was measured using both Cohen κ (adjusted for chance agreement) and percentage agreement (not adjusted for chance agreement).

Results: Interrater reliability as measured by Cohen κ was 0.17 when comparing the majority vote of the reference group with the model. Agreement was 74% (138/186) for cases judged not in need of urgent physical examination and 42% (38/90) for cases judged to be in need of urgent physical examination. No specific features linked to the model’s triage decision could be identified. Between physicians within the panel, Cohen κ was 0.2. Intrarater reliability when 1 physician retriaged 50 reports resulted in Cohen κ of 0.55.

Conclusions: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care.
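The difference between the two agreement measures named in the Methods can be illustrated with a minimal sketch. It assumes that the denominators 186 and 90 are the reference panel's majority-vote judgments, so that the reported counts form a 2×2 table over 276 reports; that reading is an assumption, not something the abstract states. The function name and variable names are illustrative only.

```python
def agreement_2x2(both_neg, ref_neg_model_pos, ref_pos_model_neg, both_pos):
    """Return (percentage agreement, Cohen kappa) for a 2x2 rater table."""
    n = both_neg + ref_neg_model_pos + ref_pos_model_neg + both_pos
    # Observed agreement: share of reports where both raters give the same label.
    p_observed = (both_neg + both_pos) / n
    # Chance agreement: product of the raters' marginal rates, summed over labels.
    ref_neg = both_neg + ref_neg_model_pos
    ref_pos = ref_pos_model_neg + both_pos
    model_neg = both_neg + ref_pos_model_neg
    model_pos = ref_neg_model_pos + both_pos
    p_expected = (ref_neg * model_neg + ref_pos * model_pos) / n**2
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return p_observed, kappa


# Assumed table: 138 of 186 "not urgent" and 38 of 90 "urgent" reference
# judgments matched by the model (counts taken from the Results).
p_observed, kappa = agreement_2x2(138, 186 - 138, 90 - 38, 38)
print(f"percentage agreement: {p_observed:.1%}")
print(f"Cohen kappa: {kappa:.2f}")  # rounds to 0.17 under this assumed table
```

Under this assumed table, raw agreement is about 64%, yet κ is low because two raters who both label most reports "not urgent" would already agree often by chance.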
Highlights
Health care digitalization has the potential to mitigate increasing primary care workloads [1,2]
To reduce primary care physician (PCP) workload and to ensure patients are directed to the appropriate level of care, nurse-led telephone triage is commonly used [6,7]
We evaluated the majority vote of the dichotomized responses of individual classifiers and employed a cross-validation scheme to estimate generalization properties
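The majority-vote and cross-validation steps mentioned in the last highlight could look roughly like the sketch below. The source does not specify the classifiers, the dichotomization threshold, or the number of folds, so the 0.5 threshold, the contiguous fold layout, and all function names here are hypothetical choices made for illustration.

```python
from collections import Counter


def dichotomize(score, threshold=0.5):
    """Map a continuous score to 1 (urgent) or 0 (not urgent); threshold is assumed."""
    return 1 if score >= threshold else 0


def majority_vote(labels):
    """Return the label chosen by most voters, or None when the top two tie."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def k_fold_indices(n_items, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    fold_sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        yield train, test
        start += size


# Hypothetical scores from three individual classifiers for one report.
votes = [dichotomize(s) for s in (0.8, 0.3, 0.6)]
panel_decision = majority_vote(votes)
print(votes, panel_decision)
```

Each fold serves once as held-out test data while the remaining folds are used for fitting, which is what allows an estimate of how the voting scheme generalizes to unseen reports.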
Summary
Health care digitalization has the potential to mitigate increasing primary care workloads [1,2]. Time-constrained primary care physicians (PCPs) interrupt patient queries within the first 30 seconds of consultations [3], contributing to inadequate gathering of medical histories [4,5]. To reduce PCP workload and to ensure patients are directed to the appropriate level of care, nurse-led telephone triage is commonly used [6,7]. Methods: The model was tested on 300 digital medical history reports and classification was compared with the majority vote of an expert panel of 5 PCPs. Results: Interrater reliability as measured by Cohen κ was 0.17 when comparing the majority vote of the reference group with the model. Conclusions: Low interrater and intrarater agreement in triage decisions among PCPs limits the possibility to use human decisions as a reference for ML to automate triage in primary care.