Introduction: With the growing development and adoption of artificial intelligence in healthcare and other sectors of society, various user-friendly and engaging tools to support research have emerged, including chatbots, most notably ChatGPT. Objective: To investigate the performance of ChatGPT as an assistant to medical coders using the ICD-10-CM/PCS. Methodology: We conducted a six-month prospective exploratory study between 2023 and 2024. A total of 150 clinical cases coded using the ICD-10-CM/PCS, extracted from technical coding books, were systematically randomized. All cases were translated into Portuguese (the native language of the authors) and English (the original language of the ICD-10-CM/PCS). The clinical cases varied in complexity with respect to the number of diagnoses and procedures and the nature of the clinical information. Each case was entered into the 2023 free version of ChatGPT. The coding returned by ChatGPT was analyzed by a senior medical auditor/coder and compared with the expected results. Results: For correct codes, ChatGPT performed approximately 29 percentage points better on diagnoses than on procedures. The accuracy rate for codes was similar across the two languages, at 31.0% and 31.9%. The error rate in procedure codes was almost four times that in diagnostic codes. Missing information was slightly more than twice as frequent in diagnoses as in procedures. Additionally, there was a statistically significant excess of codes unrelated to the clinical information; this excess was higher in procedures and nearly identical in the two languages studied. Conclusion: Given the ease of access to these tools, this investigation is intended to raise awareness: ChatGPT can assist the medical coder in directed research, but it does not replace the coder's technical validation in this process. Further development of the tool is therefore necessary to increase the quality and reliability of its results.