BackgroundLarge language models (LLMs) offer opportunities to enhance radiological applications, but their performance in handling complex tasks remains insufficiently investigatedPurposeTo evaluate the performance of LLMs integrated with Contrast-enhanced Ultrasound Liver Imaging Reporting and Data System (CEUS LI-RADS) in diagnosing small (≤20mm) hepatocellular carcinoma (sHCC) in high-risk patients.Materials and MethodsFrom November 2014 to December 2023, high-risk HCC patients with untreated small (≤20mm) focal liver lesions (sFLLs), were included in this retrospective study. ChatGPT-4.0, ChatGPT-4o, ChatGPT-4o mini, and Google Gemini were integrated with imaging features from structured CEUS LI-RADS reports to assess their diagnostic performance for sHCC. The diagnostic efficacy of LLMs for small HCC were compared using McNemar test.ResultsThe final population consisted of 403 high-risk patients (52 years ± 11, 323 men). ChatGPT-4.0 and ChatGPT-4o demonstrated substantial to almost perfect intra-agreement for CEUS LI-RADS categorization (κ values: 0.76-1.0 and 0.7-0.94, respectively), outperforming ChatGPT-4o mini (κ values: 0.51-0.72) and Google Gemini (κ values: -0.04-0.47). ChatGPT-4.0 had higher sensitivity in detecting sHCC than ChatGPT-4o (83%-89% vs. 70%-78%, p < 0.02) with comparable specificity (76%-90% vs. 83%-86%, p > 0.05). Compared to human readers, ChatGPT-4.0 showed superior sensitivity (83%-89% vs. 63%-78%, p < 0.004) and comparable specificity (76%-90% vs. 90%-95%, p > 0.05) in diagnosing sHCC.ConclusionLLM integrated with CEUS LI-RADS offers potential tool in diagnosing sHCC for high-risk patients. ChatGPT-4.0 demonstrated satisfactory consistency in CEUS LI-RADS categorization, offering higher sensitivity in diagnosing sHCC while maintaining comparable specificity to that of human readers.
Read full abstract