Abstract

Background

While ChatGPT has demonstrated impressive abilities in solving clinical vignettes and medical questions, studies assessing ChatGPT on real patient data remain scarce. Because real-world cases add complexity, ChatGPT's utility in guiding treatment must be tested on such data to better assess its accuracy and dependability. In this study, we compared a rural cardiologist's medication recommendations with those of GPT-4 for patients seen at lab review appointments.

Methodology

We reviewed the lab review appointments of 40 patients with hypertension, recording their age, sex, medical conditions, medications and dosages, and current and past lab values. The cardiologist's medication recommendations (decreasing a dose, increasing a dose, stopping a medication, or adding one) from the most recent lab visit, if any, were recorded for each patient. The data collected for each patient were entered into GPT-4 using a set prompt, and the model's resulting medication recommendations were recorded.

Results

Of the 40 patients, 95% had conflicting overall recommendations between the physician and GPT-4, and only 10.2% of the specific medication recommendations matched between the two. Cohen's kappa coefficient was -0.0127, indicating no agreement between the cardiologist and GPT-4 on overall medication changes for a patient. Possible reasons for this discrepancy include differing optimal lab value ranges, a lack of holistic analysis by GPT-4, and the need to provide further supplementary information to the model.

Conclusions

The study findings showed a significant difference between the cardiologist's medication recommendations and those of GPT-4. Future research should continue to test GPT-4 in clinical settings to validate its abilities in the real world, where more intricacies and challenges exist.
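
The reported kappa of -0.0127 sits essentially at chance level. As a quick illustration of how this statistic is computed, the sketch below implements Cohen's kappa in Python; the per-patient labels are hypothetical placeholders, not the study's data.

```python
# A minimal sketch of the agreement statistic reported above. Cohen's
# kappa compares observed rater agreement with the agreement expected
# by chance: kappa = (p_o - p_e) / (1 - p_e). The per-patient labels
# below are hypothetical placeholders, not the study's actual data.
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    n = len(rater_a)
    # p_o: fraction of patients where both raters gave the same label
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement from each rater's marginal label frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# 1 = recommend a medication change, 0 = no change (hypothetical labels)
physician = [1, 0, 1, 1, 0, 1, 0, 0]
gpt4      = [0, 1, 1, 0, 1, 0, 1, 0]
print(cohen_kappa(physician, gpt4))  # negative, i.e. worse than chance
```

A kappa near zero, as in the study, means the two raters agree no more often than random labeling would predict; a negative value indicates systematic disagreement.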
