Abstract

Predicting diagnostic codes from the clinical notes written in electronic health records (EHRs) is a challenging task, usually formulated as multi-label classification. The large label set, the hierarchical dependencies among codes, and the imbalanced data make this prediction task extremely hard. Most existing work builds a binary predictor for each label independently, ignoring the dependencies between labels. To address this problem, we propose a two-stage framework that improves automatic ICD coding by capturing label correlations. Specifically, we train a label set distribution estimator to rescore the probability of each label set candidate generated by a base predictor. This paper is the first attempt at learning the label set distribution as a reranking module for ICD coding. In experiments on the benchmark MIMIC datasets, our framework improves upon the best-performing predictors for medical code prediction.
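The two-stage idea in the abstract (generate label set candidates, then rescore them with a label set model) can be illustrated with a small sketch. Everything here is an assumption for illustration, not the paper's method: the candidate generator (toggling the labels whose marginals are closest to 0.5), the helper names `generate_candidates` and `rerank`, the hypothetical codes "a" to "d", and the pairwise `toy_set_scorer`, which merely stands in for the trained label set distribution estimator. The base predictor is represented only by its per-label marginal probabilities, and marginals of exactly 0 or 1 are not handled.

```python
import itertools
import math
from typing import Callable, Dict, FrozenSet, List, Tuple

LabelSet = FrozenSet[str]
Scored = Tuple[LabelSet, float]


def independent_log_prob(label_set: LabelSet, marginals: Dict[str, float]) -> float:
    """Log-probability of a label set when each label is an independent Bernoulli."""
    return sum(
        math.log(p) if label in label_set else math.log(1.0 - p)
        for label, p in marginals.items()
    )


def generate_candidates(
    marginals: Dict[str, float], num_uncertain: int = 3, top_k: int = 5
) -> List[Scored]:
    """Stage 1 (hypothetical): propose label set candidates from a base predictor.

    Labels with p >= 0.5 form the default set; every combination of the
    `num_uncertain` labels closest to 0.5 is toggled to create alternatives.
    """
    default = {label for label, p in marginals.items() if p >= 0.5}
    uncertain = sorted(marginals, key=lambda label: abs(marginals[label] - 0.5))[:num_uncertain]
    candidates = set()
    for r in range(len(uncertain) + 1):
        for flips in itertools.combinations(uncertain, r):
            candidates.add(frozenset(default.symmetric_difference(flips)))
    scored = [(c, independent_log_prob(c, marginals)) for c in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def rerank(
    candidates: List[Scored], set_scorer: Callable[[LabelSet], float], weight: float = 1.0
) -> LabelSet:
    """Stage 2: add a weighted label set score to each base score and keep the best."""
    best, _ = max(candidates, key=lambda item: item[1] + weight * set_scorer(item[0]))
    return best


# Toy stand-in for a learned label set distribution estimator:
# it rewards sets in which the hypothetical codes "a" and "c" appear together.
PAIR_BONUS = {frozenset({"a", "c"}): 1.0}


def toy_set_scorer(label_set: LabelSet) -> float:
    return sum(bonus for pair, bonus in PAIR_BONUS.items() if pair <= label_set)


marginals = {"a": 0.9, "b": 0.55, "c": 0.45, "d": 0.1}
candidates = generate_candidates(marginals, num_uncertain=2)
print(sorted(candidates[0][0]))  # best set under independence -> ['a', 'b']
print(sorted(rerank(candidates, toy_set_scorer)))  # after rescoring -> ['a', 'b', 'c']
```

In this toy run the independent base predictor prefers {a, b}, while the rescoring step selects {a, b, c} because the stand-in scorer rewards a and c occurring together. A learned estimator would supply such correlations from data rather than from a hand-written table.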

Highlights

  • Clinical notes from electronic health records (EHRs) are free-form text generated by clinicians during patient visits

  • Cheng et al. (2010) generalized classifier chains (CC) to probabilistic classifier chains (PCC), which estimate the joint probability of the labels and provide a proper probabilistic interpretation of CC

  • The results on the MIMIC-3 and MIMIC-2 datasets are shown in Table 1 and Table 2, respectively
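The chain-rule factorization that PCC relies on, P(y | x) = P(y_1 | x) · P(y_2 | x, y_1) · ... · P(y_m | x, y_1, ..., y_{m-1}), can be sketched in a few lines. The two conditional models below are hypothetical toy functions rather than trained classifiers, and `chain_joint_prob` and `most_probable_label_vector` are illustrative names. Exhaustive search over all 2^m label vectors is used only because m = 2 here; it does not scale to the thousands of ICD codes in MIMIC.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Labels = Tuple[int, ...]
# Conditional model for one link of the chain: returns P(y_j = 1 | x, y_1..y_{j-1}).
Conditional = Callable[[Sequence[float], Labels], float]


def chain_joint_prob(x: Sequence[float], labels: Labels, conditionals: List[Conditional]) -> float:
    """P(y | x) as the product of the per-link conditionals (product rule)."""
    prob = 1.0
    for j, cond in enumerate(conditionals):
        p_one = cond(x, labels[:j])  # condition on the labels earlier in the chain
        prob *= p_one if labels[j] == 1 else 1.0 - p_one
    return prob


def most_probable_label_vector(x: Sequence[float], conditionals: List[Conditional]) -> Labels:
    """Exhaustive search over all 2^m label vectors; only feasible for small m."""
    m = len(conditionals)
    return max(
        itertools.product((0, 1), repeat=m),
        key=lambda y: chain_joint_prob(x, y, conditionals),
    )


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical two-label chain: label 0 depends on x, label 1 mostly follows label 0.
conditionals: List[Conditional] = [
    lambda x, prev: sigmoid(x[0]),
    lambda x, prev: 0.9 if prev[0] == 1 else 0.2,
]

x = [0.0]
for y in itertools.product((0, 1), repeat=2):
    print(y, round(chain_joint_prob(x, y, conditionals), 3))
print(most_probable_label_vector(x, conditionals))  # -> (1, 1)
```

With x = [0.0] the first label is equally likely either way, but because the second label strongly follows the first, the joint mode is (1, 1) with probability 0.45, a dependency that independent per-label classifiers cannot express.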


Summary

Introduction

Clinical notes from electronic health records (EHRs) are free-form text generated by clinicians during patient visits. Prior work on neural models mostly treated automatic ICD coding as a multi-label classification problem. These models typically employ a shared text encoder and build one binary classifier for each label on top of it. Some prior work considered the hierarchical dependencies between ICD codes, either by using a hierarchical SVM (Perotte et al., 2013) or by introducing new loss terms that leverage the ICD structure (Tsai et al., 2019). However, these approaches borrowed the dependency structure from domain experts and did not consider the label correlations present in the data.
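The per-label formulation described above (a shared encoder with one binary classifier per code) can be sketched as follows. This is a toy illustration, not any specific published model: the averaged-embedding encoder, the `encode` and `predict_codes` helpers, the vocabulary, and the `code_x`/`code_y` heads are all hypothetical, and a real system would use a learned neural encoder over the full note. The point is that each code is thresholded on its own probability, so no decision takes the other predicted codes into account.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Head = Tuple[Vector, float]  # (weights, bias) of one per-label logistic classifier


def encode(tokens: Sequence[str], embeddings: Dict[str, Vector], dim: int) -> Vector:
    """Shared encoder stand-in: average the embeddings of known tokens."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    vec = [0.0] * dim
    for emb in known:
        for i, value in enumerate(emb):
            vec[i] += value / len(known)
    return vec


def predict_codes(
    tokens: Sequence[str],
    embeddings: Dict[str, Vector],
    heads: Dict[str, Head],
    threshold: float = 0.5,
) -> List[str]:
    """One binary classifier per code on top of the shared representation."""
    dim = len(next(iter(embeddings.values())))
    h = encode(tokens, embeddings, dim)
    predicted = []
    for code, (weights, bias) in heads.items():
        logit = sum(w * v for w, v in zip(weights, h)) + bias
        prob = 1.0 / (1.0 + math.exp(-logit))
        # Each decision looks only at its own probability, never at other codes.
        if prob >= threshold:
            predicted.append(code)
    return predicted


# Hypothetical vocabulary, embeddings and code heads for illustration only.
EMBEDDINGS = {"fever": [1.0, 0.0], "cough": [0.0, 1.0]}
HEADS = {"code_x": ([2.0, 0.0], -0.5), "code_y": ([0.0, 2.0], -0.5)}

print(predict_codes(["fever", "fever"], EMBEDDINGS, HEADS))  # -> ['code_x']
print(predict_codes(["fever", "cough"], EMBEDDINGS, HEADS))  # -> ['code_x', 'code_y']
```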

