Abstract
This study evaluated the effectiveness of two large language models, ChatGPT-4 and Claude 3, in improving the accuracy of senior and junior sonologists' responses to examination questions. A senior and a junior sonologist were given a practice exam. After answering the questions, they reviewed the responses and explanations provided by ChatGPT-4 and Claude 3. Accuracy and scores before and after incorporating the models' input were analyzed to compare the models' effectiveness. No statistically significant differences were found between the two models' response scores for any section (all p>0.05). For the junior sonologist, both ChatGPT-4 (p=0.039) and Claude 3 (p=0.039) significantly improved scores in basic knowledge. The responses provided by ChatGPT-4 also significantly improved scores in relevant professional knowledge (p=0.038), though its explanations did not (p=0.077). Across all exam sections, both models' responses and explanations significantly improved the junior sonologist's scores (all p<0.05). For the senior sonologist, both ChatGPT-4's responses (p=0.022) and explanations (p=0.034) improved scores in basic knowledge, as did Claude 3's explanations (p=0.003). Across all sections, Claude 3's explanations significantly improved scores (p=0.041). ChatGPT-4 and Claude 3 significantly improved the sonologists' examination performance, particularly in basic knowledge.