Abstract
The utility of machine learning, specifically large language models (LLMs), in the medical field has gained considerable attention. However, few studies have examined the application of LLMs to generating custom subspecialty radiology impressions. The primary objective of this study is to evaluate and compare the performance of multiple LLMs in generating specialized, accurate, and clinically useful radiology impressions for degenerative cervical spine MRI reports. The study employed a comparative analysis of multiple LLMs, including OpenAI's ChatGPT-3.5 and GPT-4 (OpenAI, San Francisco, CA), Anthropic's Claude 2 (Anthropic PBC, San Francisco, CA), Google's Bard (Google Inc., Mountain View, CA), and Meta's Llama 2 (Meta Platforms, Inc., Menlo Park, CA). The evaluation was performed during January-February 2024. The models were evaluated using a few-shot learning approach on a dataset of 50 synthetically generated MRI reports, 10 of which served as examples. The performance metrics evaluated were diagnostic accuracy, stylistic accuracy, and redundancy. While Claude 2 maintained consistently high performance across 40 cases, GPT-4 required midway re-training to improve its declining scores. Both Claude 2 and GPT-4 demonstrated the ability to generate structured impressions, but Claude 2's specialized summarization capabilities gave it an edge in maintaining accuracy without continuous feedback. The other LLMs performed poorly. The findings of this study suggest that LLMs can be a valuable tool in automating the generation of radiology impressions. Claude 2, in particular, exhibited promising results, indicating its potential for clinical implementation. However, the study also points to the need for further research, especially in optimizing model performance and evaluating real-world applicability.