Objectives: To assess the utility and performance of the large language model ChatGPT 4.0 with respect to accuracy, completeness, and its potential as a time-saving tool for pathologists and laboratory directors.

Methods: A deidentified database of questions previously sent by health care providers to pathology residents was used as a source of general knowledge-type pathology questions. These questions were submitted to the large language model, and the responses were graded by subject matter experts in different pathology subspecialties. The grading criteria assessed accuracy, completeness, and the potential time savings in helping the pathologist craft a response.

Results: Most of the answers (85%) were judged to require less than 5 minutes of additional work before use. Accuracy and completeness for the 61 questions were high: 98% of responses were rated at least "completely or mostly accurate," and 82% of responses were rated as "containing all relevant information." Of the respondents, 97% stated that the response would have "zero or near-zero potential for medical harm," and all thought the tool had the potential to save time in constructing answers to health care providers' queries. Performance was similar in Anatomic Pathology (AP) and Clinical Pathology (CP), with the one exception that some relevant information was excluded in 46% of AP answers vs only 10% of CP answers (P < .01).

Conclusions: ChatGPT 4.0 gave responses that were predominantly accurate and complete for general knowledge-type pathology questions. With further research, and with review of its responses by a pathologist or laboratorian, the tool could serve as an aid in answering questions from health care providers.