Data governance and Gensini score automatic calculation for coronary angiography with deep-learning-based natural language extraction.

Feng Li,Yi Chen,Hongzeng Xu,Wei Nie,Feng Chen,Li Wang,Mingfeng Jiang

doi:10.3934/mbe.2024180

Abstract

With the widespread adoption of electronic health records, the amount of stored medical data has been increasing. Clinical data, often in the form of semi-structured or unstructured electronic medical records (EMRs), contains rich patient information. However, due to the use of natural language by physicians when composing these records, the effectiveness of traditional methods such as dictionaries, rule matching, and machine learning in the extraction of information from these unstructured texts falls short of clinical standards. In this paper, a novel deep-learning-based natural language extraction method is proposed to overcome current shortcomings in data governance and Gensini score automatic calculation in coronary angiography. A pre-trained model called bidirectional encoder representation from transformers (BERT) with strong text feature representation capabilities is employed as the feature representation layer. It is combined with bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) models to extract both global and local features from the text. The study included an evaluation of the model on a dataset from a hospital in China and it was compared with another model to validate its practical advantages. Hence, the BiLSTM-CRF model was employed to automatically extract relevant coronary angiogram information from EMR texts. The achieved F1 score was 91.19, which is approximately 0.87 higher than the BERT-BiLSTM-CRF model.

Full Text