Abstract

Introduction
The Discharge Abstract Database (DAD) associates ICD-10-CA diagnosis codes with inpatient care episodes at acute-care facilities. Human coders assign the codes based on chart review. Coding guidelines mandate coding of major and fatal conditions but make coding of secondary conditions optional, which results in undercoding of many conditions.
Objectives and Approach
This research evaluates machine learning approaches for identifying and completing records with missing codes, to improve data quality. The study used the Alberta Hospital DAD for 2013-14. We assumed that the existing ICD-10-CA codes in the DAD are correct and used them as training examples. Several ML classifiers, including logistic regression and random forest, were used to develop models that estimate the probability that a code applies to a record, using existing codes and demographic information. A set of 3300 chart-review records served as the reference standard. We focused on hypertension-related codes. The validity of the raw diagnosis codes in the DAD served as the baseline.
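As a rough illustration of the approach, the sketch below fits a small logistic regression in plain Python, using the presence of an existing hypertension code as the training label. The feature names, the toy records, and the helper functions (`fit_logistic`, `predict_proba`) are all hypothetical and are not taken from the study, which does not specify its features or implementation.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the bias term; w[1:] pair up with the features in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights (bias first) by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Hypothetical toy features per record:
# [has_diabetes_code, has_kidney_disease_code, age_75_plus]
X = [
    [1, 1, 1], [1, 0, 1], [0, 1, 1], [1, 1, 0],
    [0, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0],
]
# Label: 1 if the record already carries a hypertension code
# (existing codes are assumed correct, as in the study design).
y = [1, 1, 1, 1, 0, 0, 0, 0]

weights = fit_logistic(X, y)
p_high = predict_proba(weights, [1, 1, 1])  # record with all risk indicators
p_low = predict_proba(weights, [0, 0, 0])   # record with none
```

The fitted model assigns a higher probability to the record with related codes and older age than to the record with none; a real pipeline would use many more features and a held-out evaluation set.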
Results
A record is deemed to have a missing hypertension diagnosis code if the predicted probability is high but the coders did not assign the code. In the baseline, the original hypertension codes have high PPV (ranging from 0.902 for the age group 35-54 to 1.000 for the age group 18-34) but low sensitivity (ranging from 0.200 for the age group 18-34 to 0.565 for the age group 75+). The most successful models tested so far improved sensitivity by 2-6% while maintaining the PPV. The improvement is generally larger for the younger age groups. Initial experiments indicate that greater improvements in sensitivity may be possible for other conditions, such as peptic ulcer disease and cerebrovascular disease.
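The flagging rule and the two validity measures can be sketched as follows. The threshold of 0.9 is an assumption (the abstract says only that the predicted probability is "high"), and the records are made-up toy values, not study data. Sensitivity is TP / (TP + FN) and PPV is TP / (TP + FP), both measured against the chart-review reference.

```python
def sensitivity_ppv(predicted, reference):
    """Return (sensitivity, PPV) of predicted flags against the reference."""
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sens, ppv

def augment_codes(coded, probs, threshold=0.9):
    """Keep every existing code; add one where the model is confident.

    The threshold value is an assumption, not taken from the study.
    """
    return [c or (p >= threshold) for c, p in zip(coded, probs)]

# Hypothetical toy records (not study data).
coded = [True, True, False, False, False, False, True, False]      # coder-assigned
probs = [0.95, 0.80, 0.97, 0.92, 0.30, 0.10, 0.60, 0.93]           # model output
reference = [True, True, True, True, True, False, True, False]     # chart review

baseline = sensitivity_ppv(coded, reference)
augmented = sensitivity_ppv(augment_codes(coded, probs), reference)
```

On this toy data the baseline has sensitivity 0.5 and PPV 1.0, while the augmented codes reach sensitivity 5/6 at PPV 5/6: the last record is flagged incorrectly, which shows how a looser threshold trades PPV for sensitivity.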
Conclusion/Implications
Machine learning approaches can be useful and cost-effective for improving data quality in the DAD. While the improvements in sensitivity relative to the baseline are modest at present, further experiments with different models and feature sets are warranted. Experiments with other conditions may also be fruitful.
