Abstract
Background: Liver cancer imposes a substantial disease burden in China. As one of the primary diagnostic tools for detecting liver cancer, dynamic contrast-enhanced computed tomography provides detailed evidence for diagnosis that is recorded in free-text radiology reports.

Objective: The aim of our study was to automatically identify evidence for liver cancer diagnosis by applying a deep learning model and rule-based natural language processing (NLP) methods.

Methods: We propose a pretrained, fine-tuned BERT (Bidirectional Encoder Representations from Transformers)-based BiLSTM-CRF (Bidirectional Long Short-Term Memory–Conditional Random Field) model to recognize phrases describing APHE (hyperintense enhancement in the arterial phase) and PDPH (hypointensity in the portal and delayed phases). To identify further essential diagnostic evidence, we used traditional rule-based NLP methods to extract radiological features. APHE, PDPH, and the other extracted radiological features were then used to build a computer-aided liver cancer diagnosis framework based on a random forest classifier.

Results: The BERT-BiLSTM-CRF model predicted the phrases of APHE and PDPH with F1 scores of 98.40% and 90.67%, respectively. The diagnosis model using the combined features achieved higher performance (F1 score, 88.55%) than models using only APHE and PDPH (84.88%) or only the other extracted radiological features (83.52%). APHE and PDPH were the top 2 most important features for liver cancer diagnosis.

Conclusions: This work is a comprehensive NLP study identifying evidence for the diagnosis of liver cancer from Chinese radiology reports, considering both clinical knowledge and radiology findings. The BERT-based deep learning method achieved state-of-the-art performance in extracting diagnostic evidence, and its high performance demonstrates the feasibility of the BERT-BiLSTM-CRF model for information extraction from Chinese radiology reports. Our findings suggest that this deep learning–based method for automatically identifying diagnostic evidence can be extended to other types of Chinese clinical text.
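As a minimal sketch of the rule-based extraction step described in the Methods, the pattern set below is hypothetical: the paper's actual rules target Chinese report text and are not given here, so these simplified English regular expressions are illustrative only.

```python
import re

# Hypothetical, simplified patterns for two radiological findings.
# [^.]* keeps a match within a single sentence of the report.
PATTERNS = {
    "APHE": re.compile(r"hyper\w*[^.]*arterial phase"
                       r"|arterial phase[^.]*hyper\w*", re.I),
    "PDPH": re.compile(r"(washout|hypointens\w*)[^.]*portal[^.]*delayed"
                       r"|portal[^.]*delayed[^.]*(washout|hypointens\w*)", re.I),
}

def extract_findings(report: str) -> dict:
    """Flag which radiological findings each pattern detects in a report."""
    return {name: bool(pat.search(report)) for name, pat in PATTERNS.items()}

report = ("The lesion shows hyperintense enhancement in the arterial phase "
          "and appears hypointense in the portal and delayed phases.")
print(extract_findings(report))  # -> {'APHE': True, 'PDPH': True}
```

The resulting binary flags are the kind of features that, together with the model-extracted APHE/PDPH phrases, could feed a downstream classifier such as the random forest described above.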
Highlights
Over the past decades, electronic health records (EHRs) from millions of patients have become massive sources of valuable clinical data.
The high performance demonstrates the feasibility of the BERT-BiLSTM-CRF (bidirectional encoder representations from transformers–bidirectional long short-term memory–conditional random field) model in information extraction from Chinese radiology reports.
We extracted the features of APHE (hyperintense enhancement in the arterial phase) and PDPH (hypointensity in the portal and delayed phases) using 3 different models, namely, CRF, BiLSTM-CRF, and BERT-BiLSTM-CRF.
Summary
Electronic health records (EHRs) from millions of patients have become massive sources of valuable clinical data. Machine learning–based algorithms, especially deep learning algorithms, have been applied effectively to analyze patient data and have shown promising results, thereby advancing medical research and better informing clinical decision making through the secondary use of EHRs [1,2]. Natural language processing (NLP) technologies can extract meaningful information from clinical texts, facilitating their application. The use of machine learning methods for data mining of EHRs can derive previously unknown clinical insights and be applied powerfully in clinical decision making and computer-aided diagnosis of diseases [3,4]. Deep learning methods have brought improvements to various clinical applications, especially text classification, named-entity recognition (NER), relation extraction, and question answering [7,8]. As one of the primary diagnostic tools for detecting liver cancer, dynamic contrast-enhanced computed tomography provides detailed evidence for diagnosis that is recorded in free-text radiology reports.
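Sequence taggers such as the (BERT-)BiLSTM-CRF models used here are typically trained on per-token BIO labels. As a minimal sketch, the helper below converts annotated phrase spans into BIO tags; the tokens, span indices, and label names are illustrative, not taken from the paper's data set.

```python
def bio_tags(tokens, spans):
    """Convert (start, end, label) token-index spans (end exclusive)
    into per-token BIO tags, the standard labeling scheme for
    training NER models such as BiLSTM-CRF."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"            # first token of the phrase
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"            # continuation tokens
    return tags

# Illustrative English tokens; the paper's reports are in Chinese.
tokens = ["lesion", "shows", "arterial", "phase", "hyperenhancement"]
spans = [(2, 5, "APHE")]
print(bio_tags(tokens, spans))
# -> ['O', 'O', 'B-APHE', 'I-APHE', 'I-APHE']
```

Pairs of token sequences and BIO tag sequences in this form are what the CRF layer scores during training and decoding.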