Dear Editor, To offer a fresh perspective, I set out to gauge the abilities of a multimodal large language model (LLM) on sample questions from the European Diploma in Breast Imaging (EDBI) examination, an initiative of the European Society of Breast Imaging. LLMs are expanding what is possible in radiology, from interpreting text and medical images to generating reports (Bhayana, 2024). Generative Pre-trained Transformer 4 (GPT-4) has notably passed a national mammography board exam by a clear margin (Almeida et al., 2024). As one of the latest multimodal LLMs, GPT-4 can answer questions requiring both lower-order and higher-order thinking. Three written sample questions, each of which could have more than one correct choice, were evaluated. Incorrect answers carried no negative marking (https://www.eusobi.org/european-diploma-in-breast-imaging-edbi/). The scoring system was adapted from the European Diploma in Radiology scoring guidelines (https://www.myebr.org/edir-scoring-faqs). Responses were obtained from Google Gemini, GPT-3.5, and GPT-4 in March 2024. With each question assigned a value of 1 point, GPT-4 reached an accuracy of 78%, GPT-3.5 achieved 50%, and Google Gemini scored 22.2%. GPT-4's strong performance on these EDBI sample questions underscores its potential to aid clinical decision-making. Future studies may assess its performance on questions requiring medical image analysis, such as mammography, breast ultrasound, or breast magnetic resonance imaging.
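The scoring described above (1 point per question, several correct choices possible, no negative marking) can be illustrated with a minimal Python sketch. The partial-credit rule used here, the fraction of correct options selected, is an assumption for illustration and not necessarily the rule in the adapted guidelines; the function names, answer key, and responses are hypothetical and do not reproduce the reported results.

```python
from fractions import Fraction


def question_score(selected, correct):
    """Score one multiple-response question on a 0-to-1 scale.

    Assumed rule (not taken from the letter): credit equals the share of
    correct options that were selected. Wrongly selected options are
    ignored, which mirrors the absence of negative marking.
    """
    correct = set(correct)
    if not correct:
        raise ValueError("a question needs at least one correct option")
    hits = len(set(selected) & correct)
    return Fraction(hits, len(correct))


def accuracy(responses, answer_key):
    """Mean score across questions, each worth 1 point."""
    total = sum(question_score(s, c) for s, c in zip(responses, answer_key))
    return float(total / len(answer_key))


# Hypothetical answer key and model responses for three questions.
answer_key = [{"A", "C", "D"}, {"B", "E"}, {"A", "B", "C"}]
responses = [{"A", "C"}, {"B", "E"}, {"A", "D"}]

print(f"{accuracy(responses, answer_key):.1%}")
```

With only three questions, a single option changes the overall accuracy by several percentage points, so percentages from such a small sample are best read alongside the raw scores.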