Claims-based algorithms for common chronic conditions were efficiently constructed using machine learning methods.

Konan Hara, Ryo Ikesu, Ung-Il Chung, Thomas Svensson, Akiko Kishi Svensson, Yasuki Kobayashi, Yuki Ito, Jun Tomio

doi:10.1371/journal.pone.0254394

Abstract

Identification of medical conditions using claims data is generally conducted with algorithms based on subject-matter knowledge. However, these claims-based algorithms (CBAs) are highly dependent on the researchers’ level of knowledge and are not necessarily optimized for the target conditions. We investigated whether machine learning methods can supplement researchers’ knowledge of target conditions in building CBAs. We conducted a retrospective cohort study using a claims database combined with annual health check-up results from employees’ health insurance programs in Japan for fiscal years 2016–17 (study population for hypertension, N = 631,289; diabetes, N = 152,368; dyslipidemia, N = 614,434). We constructed CBAs with logistic regression, k-nearest neighbor, support vector machine, penalized logistic regression, tree-based models, and neural networks to identify patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia. We then compared their association measures on a completely held-out test set (25% of the study population). In the test cohorts of 157,822, 38,092, and 153,608 enrollees for hypertension, diabetes, and dyslipidemia, 25.4%, 8.4%, and 38.7%, respectively, had a diagnosis of the corresponding condition. The areas under the receiver operating characteristic curve (AUCs) of logistic regression with/without subject-matter knowledge about the target condition were .923/.921 for hypertension, .957/.938 for diabetes, and .739/.747 for dyslipidemia. The logistic lasso, logistic elastic-net, and tree-based methods yielded AUCs comparable to those of logistic regression with subject-matter knowledge: .923–.931 for hypertension, .958–.966 for diabetes, and .747–.773 for dyslipidemia. We found that machine learning methods can attain AUCs comparable to those of the conventional knowledge-based method in building CBAs.
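The comparison design described above (fit several classifiers on a training portion, then compare their AUCs on a 25% hold-out test set) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors’ pipeline: the data are simulated 0/1 claim-code indicators, every helper name (`simulate`, `fit_logistic`, `holdout_split`, and so on) is hypothetical, and only three of the compared approaches are mimicked. Logistic regression restricted to hand-picked codes stands in for “subject-matter knowledge”, the second model uses all codes, and the third adds an L1 (lasso) penalty fitted by proximal gradient descent.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def simulate(n=800, n_codes=12, seed=1):
    """Hypothetical data: one row per enrollee, one 0/1 column per claim code.
    Only codes 0-2 drive the synthetic gold-standard label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [1 if rng.random() < 0.2 else 0 for _ in range(n_codes)]
        logit = -2.0 + 3.0 * x[0] + 2.0 * x[1] + 1.5 * x[2]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y

def holdout_split(X, y, test_frac=0.25, seed=0):
    """Shuffle once and set aside test_frac of enrollees; they are never used in fitting."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(y) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return (([X[i] for i in train], [y[i] for i in train]),
            ([X[i] for i in test], [y[i] for i in test]))

def fit_logistic(X, y, l1=0.0, lr=1.0, epochs=200):
    """Full-batch gradient descent on the mean log-loss. If l1 > 0, each step is
    followed by soft-thresholding of the code weights (proximal gradient for the
    lasso); the intercept is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j, xj in enumerate(xi):
                grad_w[j] += r * xj
        b -= lr * grad_b / n
        for j in range(p):
            step = w[j] - lr * grad_w[j] / n
            w[j] = math.copysign(max(abs(step) - lr * l1, 0.0), step)
    return w, b

def predict(w, b, X):
    """Predicted probability of the condition for each enrollee."""
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]

def auc(scores, labels):
    """Mann-Whitney AUC: share of (case, non-case) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    X, y = simulate()
    (X_tr, y_tr), (X_te, y_te) = holdout_split(X, y)
    known = [0, 1, 2]  # hypothetical "subject-matter knowledge": hand-picked codes
    pick = lambda M: [[row[j] for j in known] for row in M]
    models = {
        "knowledge-based LR": (fit_logistic(pick(X_tr), y_tr), pick(X_te)),
        "LR, all codes": (fit_logistic(X_tr, y_tr), X_te),
        "lasso LR, all codes": (fit_logistic(X_tr, y_tr, l1=0.02), X_te),
    }
    for name, ((w, b), X_eval) in models.items():
        print(f"{name}: hold-out AUC = {auc(predict(w, b, X_eval), y_te):.3f}")
```

The test set is carved off before any model sees the data, so every AUC is computed on enrollees that played no part in fitting. At the scale of the study, a library implementation (for example, scikit-learn’s `LogisticRegression` with an L1 or elastic-net penalty) would be the practical choice; the pure-Python version only makes each step explicit.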

Highlights

A growing body of studies using medical and pharmacy claims data has been conducted in various fields of health research [1,2,3,4,5,6,7]
Ninety percent of the ICD-10 codes that appeared in the dataset were observed in fewer than 1% of enrollees, and more than half of the World Health Organization Anatomical Therapeutic Chemical (WHO-ATC) codes that appeared in the dataset were observed in fewer than 5% of enrollees
Using health check-up results as the source of the gold standard, we reported the association measures of claims-based algorithms (CBAs) derived from machine learning methods without condition-specific variable selection for identifying patients with three common chronic conditions: hypertension, diabetes, and dyslipidemia
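The two data-handling ideas in these highlights, code sparsity and validation of a CBA against a check-up-based gold standard, can be made concrete with a small sketch. It assumes that the “association measures” include the usual 2×2 validation quantities (sensitivity, specificity, PPV, NPV), which is an assumption here. The enrollee records, the gold-standard flags, and the single-rule CBA are all hypothetical; `I10` (ICD-10, essential hypertension) and `C09AA05` (ATC, ramipril) are used only as illustrative codes.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Validation:
    sensitivity: float  # flagged among gold-standard cases
    specificity: float  # not flagged among gold-standard non-cases
    ppv: float          # gold-standard cases among flagged enrollees
    npv: float          # gold-standard non-cases among unflagged enrollees

def rare_code_share(claims, threshold):
    """Fraction of distinct codes seen in fewer than `threshold` (a proportion) of enrollees."""
    counts = Counter(code for codes in claims for code in set(codes))
    return sum(1 for k in counts.values() if k / len(claims) < threshold) / len(counts)

def code_rule(claims, target_codes):
    """Hypothetical CBA: flag an enrollee if any of their claims carries a target code."""
    return [bool(set(codes) & target_codes) for codes in claims]

def validate(flags, gold):
    """2x2 validation of CBA flags against a gold-standard label (e.g. derived from check-ups)."""
    pairs = list(zip(flags, gold))
    tp = sum(f and g for f, g in pairs)
    fp = sum(f and not g for f, g in pairs)
    fn = sum(g and not f for f, g in pairs)
    tn = sum(not f and not g for f, g in pairs)
    return Validation(tp / (tp + fn), tn / (tn + fp), tp / (tp + fp), tn / (tn + fn))

if __name__ == "__main__":
    # Illustrative enrollees: the codes on their claims, plus a gold-standard flag.
    claims = [["I10"], ["E11", "C09AA05"], [], ["I10", "E78"], ["J06"], ["E78"], ["K21"]]
    gold = [True, True, False, False, False, True, False]
    print(rare_code_share(claims, 0.2))  # 4 of 6 codes appear in only 1 of 7 enrollees
    print(validate(code_rule(claims, {"I10", "C09AA05"}), gold))
```

A hand-written rule like `code_rule` corresponds to the knowledge-based approach; the machine learning CBAs in the study instead learn a scoring function over all available codes, which is why the long tail of rare codes shown by `rare_code_share` matters for model fitting.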

Summary

Introduction

A growing body of studies using medical and pharmacy claims data has been conducted in various fields of health research [1,2,3,4,5,6,7]. Despite their large volume of information and highly standardized format, claims data are frequently criticized for potential imprecision in identifying medical conditions, mainly because they are issued primarily for reimbursement purposes [8,9,10,11,12]. To address these concerns, many studies have proposed a claims-based algorithm (CBA) for identifying patients with a target condition and have computed association measures to assess the usability of the algorithm [9, 10, 13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41]. This poses challenges to the use of administrative data in the transition from the ICD-9 to the ICD-10 coding scheme in the United States [42, 43].




Journal: PloS one
Publication Date: Sep 27, 2021
License type: CC BY 4.0

Similar Papers

Deposit type discrimination based on trace elements in sphalerite
Yu-Miao Meng ... Songning Meng
Ore Geology Reviews | VOL. 165
13 Jan 2024

Machine Learning Methods Based on CT Features Differentiate G1/G2 From G3 Pancreatic Neuroendocrine Tumors
Hai-Yan Chen ... Guo-Liang Shao
Academic radiology | VOL. 31
04 Dec 2023

Deep Learning and Machine Learning with Grid Search to Predict Later Occurrence of Breast Cancer Metastasis Using Clinical Data.
Xia Jiang ... Chuhan Xu
Journal of Clinical Medicine | VOL. 11
29 Sep 2022

A Comparative Analysis of Machine Learning Models for Prediction of Chronic Kidney Disease
Nariman Khalil ... Mohamed Eassa
Sustainable Machine Intelligence Journal | VOL. 5
25 Oct 2023
