Abstract

INTRODUCTION
Large language models (LLMs), such as ChatGPT by OpenAI, exist at the intersection of machine learning and natural language processing. They have grown rapidly in recent years and are poised to transform healthcare. The ability of these models to process visual data has also advanced, with GPT-4 being the first model to publicly process visual data and GPT-4o adding further capabilities. There has been significant interest in applying GPT-4o to healthcare image processing.

OBJECTIVE
This study sought to examine the ability of GPT-4o to determine a final diagnosis from a single MRI image of an intracranial neoplasm. We then compared its performance with that of GPT-4 and of humans, including neurosurgery residents, fellows, and attendings.

METHODS
GPT-4 and GPT-4o were each given a standard prompt with an accompanying image, requesting the most likely diagnosis for 20 brain MRI images: 5 gliomas, 5 meningiomas, 5 pituitary tumors, and 5 sham non-tumor images. The models' responses were compared with those from a survey of neurosurgery attendings, fellows, senior residents, and junior residents.

RESULTS
GPT-4 supplied the correct diagnosis in just 40% of cases, while GPT-4o correctly identified the pathology in 70% of cases. Neurosurgeons gave correct responses in 92% of cases, with senior neurosurgery residents faring the best (albeit with a small sample size).

CONCLUSION
GPT-4 and GPT-4o underperformed neurosurgery residents, fellows, and attendings in correctly identifying CNS malignancies. Still, with no additional training, GPT-4o yielded significantly better results than GPT-4, showing rapid advancement of the LLM over its predecessor in identifying intracranial neoplasms from single-image MRIs.