Purpose: Accurate risk stratification of thyroid nodules is essential for optimal patient management. This study aimed to assess the suitability of ChatGPT for risk stratification of thyroid nodules using a text-based evaluation.

Methods: A dataset was compiled comprising 50 anonymized clinical reports and associated risk assessments for thyroid nodules. The Chat Generative Pre-trained Transformer (ChatGPT) was used to classify sonographic patterns in accordance with the Thyroid Imaging Reporting and Data System (TI-RADS). The model's performance was assessed using various criteria, including sensitivity, specificity, and accuracy. A comparative analysis was conducted, evaluating the model against investigator-based risk stratification as well as histology.

Results: With an overall agreement rate of 42 % in comparison with examiner-based evaluation (TI-RADS 1–5), the results show that ChatGPT has moderate potential for predicting the risk of malignancy in thyroid nodules using text-based reports. The chatbot model achieved a sensitivity of 86.7 %, a specificity of 10.7 %, and an overall accuracy of 68 % when distinguishing between low-risk (TI-RADS 2 and 3) and high-risk (TI-RADS 4 and 5) categories. Interrater reliability was calculated with a Cohen's kappa of 0.686.

Conclusion: This study highlights the potential of ChatGPT in assisting clinicians with risk stratification of thyroid nodules. The results suggest that ChatGPT can facilitate personalized treatment decisions, although the agreement rate is still low. Further research and validation studies are necessary to establish the clinical applicability and generalizability of ChatGPT in routine practice. The integration of ChatGPT into clinical workflows has the potential to enhance thyroid nodule risk assessment and improve patient care.
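
For readers less familiar with the metrics named above, the following is a minimal sketch of how sensitivity, specificity, accuracy, and Cohen's kappa are conventionally computed from a 2×2 table of high-risk versus low-risk calls. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration; they are not the study's data and are not intended to reproduce its reported values.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard 2x2 agreement metrics.

    tp: reference high-risk, model high-risk
    fn: reference high-risk, model low-risk
    fp: reference low-risk,  model high-risk
    tn: reference low-risk,  model low-risk
    Assumes every denominator below is non-zero.
    """
    total = tp + fn + fp + tn

    sensitivity = tp / (tp + fn)   # share of reference high-risk cases detected
    specificity = tn / (tn + fp)   # share of reference low-risk cases detected
    accuracy = (tp + tn) / total   # observed agreement, p_o

    # Agreement expected by chance, p_e, from the marginal totals.
    p_high = ((tp + fn) / total) * ((tp + fp) / total)
    p_low = ((fp + tn) / total) * ((fn + tn) / total)
    p_e = p_high + p_low

    # Cohen's kappa: agreement beyond chance, scaled to its maximum.
    kappa = (accuracy - p_e) / (1 - p_e)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "kappa": kappa,
    }


# Hypothetical counts for 50 reports (illustration only, not study data).
example = binary_metrics(tp=20, fn=5, fp=10, tn=15)
```

Because kappa corrects the observed agreement for the agreement expected from the marginal totals alone, it can diverge noticeably from raw accuracy when one risk category dominates.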