Abstract

Diagnosis codes serve as a billing mechanism in the Electronic Health Record and can also benefit decision support systems, which assist coders by suggesting a relevant subset of candidate codes to choose from. Because of the large set of possible labels and the length of patient records, automatic ICD code assignment is considered a challenging task within the field of multi-label classification. This paper introduces a baseline for automatic ICD code assignment using Support Vector Machines (SVM) and FastText, with Unified Medical Language System (UMLS) Metathesaurus mappings incorporated into the word embedding models. Training data is obtained from the Medical Information Mart for Intensive Care (MIMIC-III) database and extended with 'is-a' relationships from the ICD-9 hierarchy. FastText is evaluated with different label count estimations, of which an approach based on label cardinality yields an F1-score of 62.2%. FastText achieves high recall and notable performance improvements over previous models. Reported values are obtained through 10-fold cross-validation.
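
The abstract does not spell out how the label count is estimated from label cardinality. One plausible reading, offered here only as a minimal sketch under that assumption, is to take the average number of codes per training record and keep that many top-ranked predictions for each new record. The function names, the example codes, and the scores below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def label_cardinality(label_sets: Sequence[Sequence[str]]) -> float:
    """Average number of distinct labels per training record."""
    if not label_sets:
        raise ValueError("label_sets must not be empty")
    return sum(len(set(labels)) for labels in label_sets) / len(label_sets)


def top_k_by_cardinality(scores: Dict[str, float], cardinality: float) -> List[str]:
    """Keep the k highest-scoring codes, with k derived from the cardinality.

    Assumption: k is the cardinality rounded to an integer, with a floor of 1
    so that every record receives at least one code.
    """
    k = max(1, round(cardinality))
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [code for code, _ in ranked[:k]]


# Hypothetical training labels: three records with 3, 1, and 2 codes.
train_labels = [
    ["428.0", "401.9", "250.00"],
    ["401.9"],
    ["428.0", "584.9"],
]
cardinality = label_cardinality(train_labels)

# Hypothetical classifier scores for one unseen record.
scores = {"428.0": 0.61, "401.9": 0.25, "584.9": 0.09, "250.00": 0.05}
predicted = top_k_by_cardinality(scores, cardinality)
```

With these made-up inputs the cardinality is 2.0, so the two highest-scoring codes are kept. A fixed count of this kind tends to favor recall on records with few true codes and to cap it on records with many, which is one reason to compare several label count estimations.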
