A hierarchical method to automatically encode Chinese diagnoses through semantic similarity estimation.

Wenxin Ning, Runtong Zhang, Ming Yu

doi:10.1186/s12911-016-0269-4

Open Access

https://doi.org/10.1186/s12911-016-0269-4

Abstract

Background: The accumulation of medical documents in China has increased rapidly in recent years. We focus on developing a method that automatically assigns ICD-10 codes to Chinese diagnoses from electronic medical records to support the medical coding process in Chinese hospitals.

Methods: We propose two encoding methods: one that directly determines the desired code (flat method), and one that hierarchically determines the most suitable code until the desired code is obtained (hierarchical method). Both methods are based on instances from the standard diagnostic library, a gold standard dataset in China. For the first time, semantic similarity estimation between Chinese words is applied in the biomedical domain, with the successful implementation of knowledge-based and distributional approaches. Characteristics of the Chinese language are considered in implementing distributional semantics. We test our methods against 16,330 coding instances from our partner hospital.

Results: The hierarchical method outperforms the flat method in terms of accuracy and time complexity. Representing distributional semantics using Chinese characters can achieve performance comparable to the use of Chinese words. The diagnoses in the test set can be encoded automatically with a micro-averaged precision of 92.57%, recall of 89.63%, and F-score of 91.08%. A sharp decrease in encoding performance is observed without semantic similarity estimation.

Conclusion: The hierarchical nature of ICD-10 codes can enhance the performance of automated code assignment. Semantic similarity estimation is shown to be indispensable in dealing with Chinese medical text. The proposed method can greatly reduce the workload and improve the efficiency of the code assignment process in Chinese hospitals.

Electronic supplementary material: The online version of this article (doi:10.1186/s12911-016-0269-4) contains supplementary material, which is available to authorized users.
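The micro-averaged figures quoted in the abstract can be related to one another with a short calculation. The sketch below is illustrative only: the function names and the toy per-code counts are assumptions, not taken from the paper. It pools true positives, false positives, and false negatives over all codes before computing precision and recall, and it checks that the harmonic mean of the reported precision and recall is consistent with the reported F-score.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_prf(counts):
    """Micro-averaged precision, recall, and F-score.

    counts: list of (tp, fp, fn) tuples, one per code. Micro-averaging
    pools the counts over all codes before dividing.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_score(precision, recall)


# Consistency check on the figures reported in the abstract (in percent).
reported_f = f_score(92.57, 89.63)
print(round(reported_f, 2))

# Hypothetical per-code counts, used only to exercise micro_prf.
toy_counts = [(8, 1, 2), (1, 1, 0)]
print(micro_prf(toy_counts))
```

Because the counts are pooled, frequent codes dominate a micro-averaged score, which is the usual reason for reporting it on a test set with a skewed code distribution.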

Highlights

The accumulation of medical documents in China has increased rapidly in recent years.
The current study focuses on International Classification of Diseases (ICD)-10 code assignment to Chinese diagnoses because the coding process in China is primarily based on the diagnostic statements from the problem list in the electronic medical record (EMR).
Example-based model: given the lack of mature tools and a complete knowledge base for Chinese medical language processing (MLP), automated code assignment to Chinese diagnoses cannot be achieved through the MLP approach.

Summary

Introduction

The accumulation of medical documents in China has increased rapidly in recent years. We focus on developing a method that automatically assigns ICD-10 codes to Chinese diagnoses from electronic medical records to support the medical coding process in Chinese hospitals. The electronic medical record (EMR) is a rich source of clinical information for medical study and other applications related to quality of healthcare, clinical decision support, and reliable information flow among individuals and departments involved in patient care [1].

In ICD-10, the category (3-digit) code A00 pertains to the condition “Cholera”, and its subcategory (4-digit) code A00.0 pertains to the specific condition “Cholera due to Vibrio cholerae 01, biovar cholerae”. Code assignment is usually performed at the subcategory level, which contains over 10,000 unique ICD-10 codes.

Ning et al. BMC Medical Informatics and Decision Making (2016) 16:30
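The category and subcategory levels described above suggest how a flat and a hierarchical lookup might differ. The following is a minimal sketch under stated assumptions, not the authors' implementation: the tiny library of English ICD-10 titles, the function names, and the token-overlap (Jaccard) similarity are all hypothetical stand-ins. The paper works on Chinese diagnoses and uses semantic similarity estimation, which this toy measure does not reproduce.

```python
# Toy library (illustrative subset): ICD-10 subcategory code -> title.
LIBRARY = {
    "A00.0": "cholera due to vibrio cholerae 01 biovar cholerae",
    "A00.1": "cholera due to vibrio cholerae 01 biovar eltor",
    "A00.9": "cholera unspecified",
    "A01.0": "typhoid fever",
    "A01.1": "paratyphoid fever a",
}
# Category (3-digit) code -> title.
CATEGORIES = {
    "A00": "cholera",
    "A01": "typhoid and paratyphoid fevers",
}


def similarity(a, b):
    """Token-level Jaccard overlap; a stand-in for semantic similarity."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def category_of(code):
    """Return the 3-digit category of a subcategory code ('A00.0' -> 'A00')."""
    return code.split(".")[0]


def encode_flat(diagnosis):
    """Flat lookup: compare the diagnosis against every subcategory title."""
    return max(LIBRARY, key=lambda c: similarity(diagnosis, LIBRARY[c]))


def encode_hierarchical(diagnosis):
    """Hierarchical lookup: pick a category first, then search only within it."""
    best_cat = max(CATEGORIES, key=lambda c: similarity(diagnosis, CATEGORIES[c]))
    candidates = [c for c in LIBRARY if category_of(c) == best_cat]
    return max(candidates, key=lambda c: similarity(diagnosis, LIBRARY[c]))


query = "cholera biovar eltor"
print(encode_flat(query), encode_hierarchical(query))
```

In this sketch the hierarchical lookup compares the diagnosis against the category titles first and then only against the subcategories of the chosen category, so it performs fewer comparisons than the flat lookup once the library is large.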

Methods

Results

Conclusion