Bidirectional Encoder Representations Research Articles

The integration of deep learning into radiology has the potential to enhance diagnostic processes, yet its acceptance in clinical practice remains limited due to various challenges. This study aimed to develop and evaluate a fine-tuned large language model (LLM), based on Llama 3-8B, to automate the generation of accurate and concise conclusions in magnetic resonance imaging (MRI) and computed tomography (CT) radiology reports, thereby assisting radiologists and improving reporting efficiency. A dataset comprising 15,000 radiology reports was collected from the University of Medicine and Pharmacy of Craiova's Imaging Center, covering a diverse range of MRI and CT examinations made by four experienced radiologists. The Llama 3-8B model was fine-tuned using transfer-learning techniques, incorporating parameter quantization to 4-bit precision and low-rank adaptation (LoRA) with a rank of 16 to optimize computational efficiency on consumer-grade GPUs. The model was trained over five epochs using an NVIDIA RTX 3090 GPU, with intermediary checkpoints saved for monitoring. Performance was evaluated quantitatively using Bidirectional Encoder Representations from Transformers Score (BERTScore), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), and Metric for Evaluation of Translation with Explicit Ordering (METEOR) metrics on a held-out test set. Additionally, a qualitative assessment was conducted, involving 13 independent radiologists who participated in a Turing-like test and provided ratings for the AI-generated conclusions. The fine-tuned model demonstrated strong quantitative performance, achieving a BERTScore F1 of 0.8054, a ROUGE-1 F1 of 0.4998, a ROUGE-L F1 of 0.4628, and a METEOR score of 0.4282. 
In the human evaluation, the artificial intelligence (AI)-generated conclusions were preferred over human-written ones in approximately 21.8% of cases, indicating that the model's outputs were competitive with those of experienced radiologists. The average rating of the AI-generated conclusions was 3.65 out of 5, reflecting a generally favorable assessment. Notably, the model maintained its consistency across various types of reports and demonstrated the ability to generalize to unseen data. The fine-tuned Llama 3-8B model effectively generates accurate and coherent conclusions for MRI and CT radiology reports. By automating the conclusion-writing process, this approach can assist radiologists in reducing their workload and enhancing report consistency, potentially addressing some barriers to the adoption of deep learning in clinical practice. The positive evaluations from independent radiologists underscore the model's potential utility. While the model demonstrated strong performance, limitations such as dataset bias, limited sample diversity, a lack of clinical judgment, and the need for large computational resources require further refinement and real-world validation. Future work should explore the integration of such models into clinical workflows, address ethical and legal considerations, and extend this approach to generate complete radiology reports.
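As a rough illustration of what the reported ROUGE-1 F1 measures, the sketch below computes unigram-overlap F1 between a reference conclusion and a generated one. It is a simplified stand-in, not the study's evaluation code: it assumes lowercasing and whitespace tokenization with no stemming, and the function name `rouge1_f1` and both example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference text and a candidate text."""
    # Assumption: lowercase + whitespace tokenization, no stemming.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a word counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Hypothetical conclusions, invented for illustration only.
reference = "no acute intracranial hemorrhage"
candidate = "no intracranial hemorrhage seen"
print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.4f}")  # prints ROUGE-1 F1: 0.7500
```

Three of the four words in each sentence match, so precision and recall are both 0.75. Published ROUGE implementations typically add their own tokenization and optional stemming, so their scores can differ from this sketch.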

Named entity recognition (NER) models are essential for extracting structured information from unstructured medical texts by identifying entities such as diseases, treatments, and conditions, enhancing clinical decision-making and research. Innovations in machine learning, particularly those involving Bidirectional Encoder Representations From Transformers (BERT)-based deep learning and large language models, have significantly advanced NER capabilities. However, their performance varies across medical datasets due to the complexity and diversity of medical terminology. Previous studies have often focused on overall performance, neglecting specific challenges in medical contexts and the impact of macrofactors like lexical composition on prediction accuracy. These gaps hinder the development of optimized NER models for medical applications. This study aimed to evaluate the performance of various NER models in the context of medical text analysis, focusing on how complex medical terminology affects entity recognition accuracy. Additionally, we explored the influence of macrofactors on model performance, seeking to provide insights for refining NER models and enhancing their reliability for medical applications. This study comprehensively evaluated 7 NER models (hidden Markov models, conditional random fields, BERT for Biomedical Text Mining, Big Transformer Models for Efficient Long-Sequence Attention, Decoding-enhanced BERT with Disentangled Attention, Robustly Optimized BERT Pretraining Approach, and Gemma) across 3 medical datasets: Revised Joint Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA), BioCreative V CDR, and Anatomical Entity Mention (AnatEM). The evaluation focused on prediction accuracy, resource use (eg, central processing unit and graphics processing unit use), and the impact of fine-tuning hyperparameters.
The macrofactors affecting model performance were also screened using the multilevel factor elimination algorithm. The fine-tuned BERT for Biomedical Text Mining, with balanced resource use, achieved the highest prediction accuracy on the Revised JNLPBA and AnatEM datasets, with microaverage (AVG_MICRO) scores of 0.932 and 0.8494, respectively, highlighting its proficiency in identifying medical entities. Gemma, fine-tuned using the low-rank adaptation technique, achieved the highest accuracy on the BioCreative V CDR dataset with an AVG_MICRO score of 0.9962 but exhibited variability across the other datasets (AVG_MICRO scores of 0.9088 on the Revised JNLPBA and 0.8029 on AnatEM), indicating a need for further optimization. In addition, our analysis revealed that 2 macrofactors, entity phrase length and the number of entity words in each entity phrase, significantly influenced model performance. This study highlights the essential role of NER models in medical informatics and the need for model optimization via precise data targeting and fine-tuning. The insights from this study may help improve clinical decision-making and facilitate the creation of more sophisticated and effective medical NER models.
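To make the microaverage concrete, here is a minimal sketch that assumes AVG_MICRO denotes micro-averaged F1: true positives, false positives, and false negatives are pooled across all entity types before a single F1 is computed. The abstract does not spell out the definition, so this reading is an assumption, and the function name `micro_f1`, the entity types, and all counts below are hypothetical.

```python
def micro_f1(counts: dict) -> float:
    """Micro-averaged F1 from per-entity-type tp/fp/fn counts."""
    # Pool the counts over every entity type first, then compute one F1.
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    denominator = 2 * tp + fp + fn
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there is nothing to score.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical per-type counts, invented for illustration only.
example_counts = {
    "Disease": {"tp": 8, "fp": 2, "fn": 1},
    "Chemical": {"tp": 2, "fp": 0, "fn": 3},
}
print(f"micro-F1: {micro_f1(example_counts):.4f}")  # prints micro-F1: 0.7692
```

Because the counts are pooled, frequent entity types dominate the score, which is one reason a micro-averaged figure can look high even when rarer entity types are recognized poorly.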

Articles published on Bidirectional Encoder Representations

Fine-Tuned Bidirectional Encoder Representations From Transformers Versus ChatGPT for Text-Based Outpatient Department Recommendation: Comparative Study.

Can the sentiment of the official media predict the return volatility of the Chinese crude oil futures?

GPT-Driven Radiology Report Generation with Fine-Tuned Llama 3.

Design of agricultural question answering information extraction method based on improved BILSTM algorithm

Sequential lexicon enhanced bidirectional encoder representations from transformers: Chinese named entity recognition using sequential lexicon enhanced BERT

Evaluating Medical Entity Recognition in Health Care: Entity Model Quantitative Study.

Zero-Shot Learning for Accurate Project Duration Prediction in Crowdsourcing Software Development

Domain knowledge-powered attention for air traffic management hazardous events classification

Investigating the agenda of global warming on Twitter: A machine learning approach

Applications of BERT in sentimental analysis

Evaluating the Performance of Five Classifiers for Twitter Sentiment Analysis Using Bag of Words

Joint intent detection and slot filling with syntactic and semantic features using multichannel CNN-BiLSTM

Revealing the impact of social circumstances on the selection of cancer therapy through natural language processing of social work notes.

Use of SNOMED CT in Large Language Models: Scoping Review.

Public Sentiment Analysis of the Israel-Palestine Conflict on Social Media Using BERT

Hybrid Lexicon and Transformer-Based Sentiment Analysis of Student Feedback for Faculty Evaluation: A Speech-to-Text Approach

CADefender: Detection of unknown malicious AutoLISP computer-aided design files using designated feature extraction and machine learning methods

Text mining approach for feature extraction and cartilage disease grade classification using knee MRI radiology reports

Technological Advancements in Menstrual Health: The Role of Generative Pre-Trained Transformer and Bees Algorithm

A COMPREHENSIVE STUDY OF MACHINE LEARNING APPROACHES FOR CUSTOMER SENTIMENT ANALYSIS IN BANKING SECTOR
