Abstract

Background: Laboratory indicator test results in electronic health records have been used in many clinical big data analyses. However, the same laboratory examination item (i.e., lab indicator) is often recorded under different names in Chinese, owing to inconsistent translation and to the differing naming habits of hospitals, which distorts analysis results.

Methods: We propose a framework that combines a recall model with a binary classification model, which reduces the alignment scale and improves the accuracy of lab indicator normalization. To reduce the alignment scale, tf-idf is used for candidate selection. To ensure the accuracy of the output, we use the enhanced sequential inference model for binary classification. Active learning is applied with a proposed selection strategy to reduce annotation cost.

Results: Because our indicator standardization method focuses on inconsistency among Chinese indicator names, we run our experiments on data from the Shanghai Hospital Development Center (SHDC), selecting clinical data from 8 hospitals. The method achieves an F1-score of 92.08% in the final binary classification. For active learning, the proposed strategy performs better than a random baseline and, with only 43% of the training data, outperforms the model trained on the full data. A case study on heart failure clinical analysis, conducted on a sub-dataset collected from SHDC, shows that the proposed method is practical and performs well in application.

Conclusion: This work demonstrates that the proposed structure can be applied effectively to lab indicator normalization, and that active learning is suitable for reducing the cost of this task. The method is also valuable for data cleaning, data mining, text extraction, and entity alignment.

Highlights

  • Laboratory indicator test results in electronic health records have been used in many clinical big data analyses

  • We propose an effective recall-and-classification structure based on active learning to standardize the lab indicators in the Shanghai Hospital Development Center (SHDC)

  • To reduce the alignment scale, we test several classic text matching methods and use tf-idf to recall a candidate set for non-standard indicators
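
The recall step in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the character-bigram tokenization, the smoothed idf formula, the helper names (`char_ngrams`, `build_idf`, `recall_candidates`), and the English stand-in indicator names are all assumptions made for illustration, since the source gives no such details. The idea is only that tf-idf similarity narrows each non-standard name to a few standard candidates, which a binary classifier would then accept or reject.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split a name into character n-grams (assumed tokenization, not from the paper)."""
    text = text.lower().replace(" ", "")
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_idf(docs, n=2):
    """Smoothed inverse document frequency computed over the standard names."""
    df = Counter()
    for doc in docs:
        df.update(set(char_ngrams(doc, n)))
    total = len(docs)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(text, idf, n=2):
    """L2-normalized sparse tf-idf vector; n-grams unseen in the standard names are dropped."""
    counts = Counter(char_ngrams(text, n))
    vec = {g: c * idf[g] for g, c in counts.items() if g in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {g: v / norm for g, v in vec.items()} if norm else {}


def cosine(a, b):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(g, 0.0) for g, v in a.items())


def recall_candidates(query, standard_names, k=3, n=2):
    """Return the k standard names most similar to a non-standard query name."""
    idf = build_idf(standard_names, n)
    q = tfidf_vector(query, idf, n)
    scored = [(cosine(q, tfidf_vector(s, idf, n)), s) for s in standard_names]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(name, score) for score, name in scored[:k]]


# Hypothetical standard vocabulary and query (English stand-ins for the Chinese names).
STANDARD = [
    "aspartate aminotransferase",
    "alanine aminotransferase",
    "serum creatinine",
    "hemoglobin",
]
candidates = recall_candidates("aspartate transaminase", STANDARD, k=2)
for name, score in candidates:
    print(f"{name}\t{score:.3f}")
```

Each recalled pair (non-standard name, candidate) would then be passed to the binary classifier, so the classifier scores only `k` pairs per indicator instead of one pair for every standard name.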


Summary

Introduction

Electronic health records (EHRs) have been applied to many clinical data analyses, such as prognostic analysis and decision support. In EHRs, laboratory indicator test results are considered important factors. ” (Aspartate aminotransferase, AST) can be regarded as a diagnostic. Many identical indicators are presented under different names in Chinese. There may be two main reasons. The first is a translation problem: aspartate aminotransferase can be translated as “

