Abstract

Automatic International Classification of Diseases (ICD) coding uses a computer program to classify diseases according to rules of etiology and clinical presentation and to represent them as codes, which are widely used to support medical reimbursement and the reporting of patient health status. With the application of machine learning and deep learning, the accuracy of automatic ICD coding methods has improved considerably. However, these gains have come with problems: the text encoders in the models are insufficiently pre-trained, and computational complexity has grown along with prediction accuracy. In this work we propose an approach called TF-GCN to address these problems. First, a transformer-based model extracts features from both the clinical records and the ICD codes, yielding a more accurate and concise feature representation. Second, the node features, the document features, and the relationships between them in the clinical records are input to a graph convolutional network (GCN) for training. Third, a pseudo-labeling attention mechanism is added to remove the noise generated during feature extraction. Finally, the features of the clinical records are compared with the features of the ICD codes by similarity to obtain the classification results. This design both reduces computational redundancy and yields more accurate classification features. On the real-world MIMIC-III dataset, we compare TF-GCN with 11 automatic ICD coding methods to validate its performance. The experimental results show that the proposed method outperforms these methods on the standard evaluation metrics Mif (0.589), MiAUC (0.989), and P@8 (0.758).
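
The final similarity-matching step and the P@k metric can be illustrated with a minimal sketch. The abstract does not specify the similarity function, so cosine similarity is assumed here; the function names, the two-dimensional vectors, and the placeholder code labels (`code_a`, `code_b`, `code_c`) are hypothetical and stand in for the learned record and ICD code features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_codes(doc_vec, code_vecs):
    """Return code labels sorted by decreasing similarity to the record vector."""
    scores = {code: cosine(doc_vec, vec) for code, vec in code_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(ranked, gold, k):
    """Fraction of the top-k predicted codes that appear in the gold label set."""
    return sum(1 for code in ranked[:k] if code in gold) / k


# Toy data: hypothetical features, not taken from the paper.
doc_vec = [1.0, 0.0]
code_vecs = {
    "code_a": [1.0, 0.1],
    "code_b": [0.0, 1.0],
    "code_c": [0.7, 0.7],
}
gold = {"code_a", "code_b"}

ranked = rank_codes(doc_vec, code_vecs)
p_at_2 = precision_at_k(ranked, gold, 2)
```

With these toy vectors, `code_a` ranks first and `code_b` last, so one of the top two predictions is in the gold set. The reported P@8 applies the same idea to the eight highest-scoring codes per record.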

