ChatGPT for Sample-Size Calculation in Sports Medicine and Exercise Sciences: A Cautionary Note

Jabeur Methnani, Helmi Ben Saad, Imed Latiri, Ismail Dergaa, Karim Chamari

doi:10.1123/ijspp.2023-0109

Abstract

This study aimed to investigate the accuracy of ChatGPT (Chat Generative Pre-trained Transformer), a large language model, in calculating sample size for sport-science and sports-medicine research studies. We analyzed 4 published papers (ie, examples 1–4) from 3 sport-science and sports-medicine journals, covering various study designs and approaches to sample-size calculation: 3 randomized controlled trials and 1 survey paper. We provided ChatGPT with all necessary data, such as means, percentages, SDs, normal deviates (Zα/2 and Z1−β), and study design. The prompt from 1 example was subsequently reused to assess the reproducibility of ChatGPT's response. ChatGPT correctly calculated the sample size for 1 randomized controlled trial but failed in the remaining 3 examples, including misidentifying the formula in the survey-paper example. After further interaction with ChatGPT, the correct sample size was obtained for the survey paper. Intriguingly, when the prompt from example 3 was reused, ChatGPT provided a completely different sample size from its initial response. While artificial-intelligence tools hold great promise, they may produce errors and inconsistencies in sample-size calculations even when given all the necessary, correct information. As artificial-intelligence technology continues to advance and learn from human feedback, sample-size calculation and other research tasks may improve. For now, however, scientists should exercise caution when using these tools. Future studies should assess more advanced and powerful versions of this tool (ie, ChatGPT-4).
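For orientation, the kind of calculation the abstract describes can be done deterministically from the inputs it lists (SD, a difference or a percentage, and the normal deviates Zα/2 and Z1−β). The following is a minimal Python sketch of two common textbook formulas, assuming (a) two independent groups compared on a mean, with equal SDs and equal group sizes, and (b) a survey estimating a single proportion within a chosen margin of error. These are not necessarily the exact formulas used in examples 1–4, and the function names are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist


def z_values(alpha=0.05, power=0.80):
    """Return the normal deviates Z_{alpha/2} (two-sided) and Z_{1-beta}."""
    std_normal = NormalDist()  # standard normal: mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return z_alpha, z_beta


def n_per_group_two_means(sd, delta, alpha=0.05, power=0.80):
    """Hypothetical helper: n per group for comparing two independent means.

    Assumes equal SDs and equal group sizes:
        n = 2 * (Z_{alpha/2} + Z_{1-beta})^2 * sd^2 / delta^2
    No adjustment for dropout is made; the result is rounded up.
    """
    z_alpha, z_beta = z_values(alpha, power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


def n_survey_proportion(p, margin, alpha=0.05):
    """Hypothetical helper: n for estimating one proportion in a survey.

    Uses the classic formula n = Z_{alpha/2}^2 * p * (1 - p) / d^2,
    with no finite-population correction; the result is rounded up.
    """
    z_alpha, _ = z_values(alpha)
    return math.ceil(z_alpha ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    # Illustrative inputs only, not data from the analyzed papers.
    print(n_per_group_two_means(sd=10, delta=5))        # 63 per group
    print(n_survey_proportion(p=0.5, margin=0.05))      # 385 respondents
```

Because such a calculation is deterministic, rerunning it with the same inputs always returns the same sample size, which gives a simple reference against which a chatbot's answer can be checked. Real studies often add further steps (eg, inflating n for expected dropout), which this sketch deliberately omits.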
