Abstract

Introduction/Objective: Recent advances in artificial intelligence (AI), particularly large language models such as ChatGPT-4Vision (GPT-4V), have expanded the potential for automated interpretation of medical images. This study evaluates the diagnostic accuracy of GPT-4V in the histopathological analysis of neurodegenerative diseases and compares its performance with that of traditional convolutional neural networks (CNNs).

Methods/Case Report: We used 1515 histopathological images, including hematoxylin and eosin (H&E) staining and tau immunohistochemistry, from patients with various neurodegenerative diseases, such as Alzheimer’s disease, progressive supranuclear palsy, and corticobasal degeneration. GPT-4V was evaluated with multi-step prompts to determine how textual context influences image interpretation. In the quantitative analysis, we compared the diagnostic accuracy of GPT-4V and the CNN-based model YOLOv8 in classifying three tau lesions: astrocytic plaques, neuritic plaques, and tufted astrocytes. The analysis used both zero-shot and few-shot learning and evaluated each model’s ability to distinguish these lesions using ten hold-out test images and 490 training images per lesion.

Results: GPT-4V accurately recognized Lewy bodies in all cases, but showed limited accuracy in identifying neurofibrillary tangles (20%) and Pick bodies (0%) on H&E slides. Notably, GPT-4V often suggested Alzheimer’s disease as a potential diagnosis, relying more on contextual cues, such as the prevalence of certain diseases, than on direct image analysis. In the tau immunohistochemical analysis, GPT-4V’s diagnoses were strongly influenced by additional information about staining and brain region. However, few-shot learning markedly improved GPT-4V’s accuracy, from 40% with zero-shot learning to 90% with few-shot learning, with the best accuracy reached at 20 shots. This compared favorably with YOLOv8, which also achieved 90% accuracy but required 100-shot learning to do so.

Conclusion: While GPT-4V faces challenges in independently interpreting histopathological images, few-shot learning significantly improves its accuracy. This approach is particularly promising for neuropathology, where acquiring extensive labeled datasets is challenging. The findings highlight the need for ongoing refinement of AI applications in pathology, with a balanced integration of textual and visual data to optimize diagnostic efficacy and reliability.
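
The zero-shot versus few-shot comparison described in the Methods can be illustrated with a small evaluation harness. The following is only a sketch under stated assumptions, not the authors' pipeline: `classify` is a hypothetical callable standing in for either model (a GPT-4V prompt or a trained YOLOv8 classifier), "k-shot" is assumed to mean k labeled examples per lesion class, and the image identifiers are placeholders.

```python
import random
from collections import defaultdict

# The three tau lesion classes named in the abstract.
LESIONS = ["astrocytic plaque", "neuritic plaque", "tufted astrocyte"]


def sample_shots(train_pool, k, rng):
    """Pick k labeled examples per lesion class from the training pool.

    train_pool is a list of (image_id, label) pairs. Treating "k-shot" as
    k examples per class is an assumption made for this sketch.
    """
    by_label = defaultdict(list)
    for image_id, label in train_pool:
        by_label[label].append(image_id)
    return {label: rng.sample(by_label[label], k) for label in LESIONS}


def accuracy(predictions, truths):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(p == t for p, t in zip(predictions, truths))
    return correct / len(truths)


def evaluate(classify, train_pool, test_set, k, seed=0):
    """Score a classifier on the hold-out set with k shots (k=0 is zero-shot).

    classify(image_id, shots) is a hypothetical stand-in for the model call;
    it receives the sampled examples and returns a predicted lesion label.
    """
    rng = random.Random(seed)
    shots = sample_shots(train_pool, k, rng) if k > 0 else {}
    predictions = [classify(image_id, shots) for image_id, _ in test_set]
    return accuracy(predictions, [label for _, label in test_set])
```

Calling `evaluate` with increasing values of `k` on the same hold-out set would trace how accuracy changes with the number of shots, which is the kind of comparison the abstract reports for GPT-4V and YOLOv8.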