Background
The rise of ChatGPT-4’s Data Analyst tool opens a new frontier for biostatistical computation. This study evaluates the reliability of the ChatGPT-4 Data Analyst tool, and its improvement over time, by comparing it with R in performing biostatistical analyses of liver surgery patients.

Methods
Using data from LiverGroup.org, we conducted the comparative study between October 2023 and March 2024. The variables analyzed with R and with ChatGPT-4 Data Analyst included age, sex, hospital stay duration, income group, and mortality. Analyses in ChatGPT-4 were performed with two prompting methods: a holistic prompt, in which all analyses were requested at once, and segmented prompts, in which each test was requested one at a time. After the analyses, figures were requested from ChatGPT-4 and compared with those produced by R.

Results
Descriptive statistics, including N (%), standard deviation, and the 25th–75th percentile range, were consistent between the March 2024 version of ChatGPT-4 and R; the October 2023 analysis showed minor variation with the holistic approach. ChatGPT-4’s inferential results were inconsistent in October 2023, whereas in March 2024 they were accurate for crosstabulations and for the p-values of the Kruskal–Wallis, Wilcoxon rank-sum, t-test, Pearson’s chi-squared, and Fisher’s exact tests. The March 2024 version of ChatGPT-4 also warned the user of possible inaccuracies in certain tests (Mann–Whitney U test: p-value for hospital stay vs. mortality; Levene’s test: p-value for age vs. mortality; Fisher’s exact test: 95% CI of the odds ratio for gender vs. mortality). The survival curve and box-and-whisker plot generated by ChatGPT-4 in March 2024 matched those generated by R, except for the confidence interval of the survival curve.

Conclusions
ChatGPT-4 is now accurate enough in certain biostatistical analyses that it can replace established statistical software such as R for some purposes. Artificial intelligence tools show significant promise but should still be used alongside traditional methods to ensure precision in complex analyses. The scientific community needs to reach a consensus on the use of these tools.
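To illustrate the kind of cross-checking this comparison involves, the Python sketch below computes reference p-values with SciPy for the test types named in the abstract. It then flags any chatbot-reported value that is missing or that differs from the reference by more than a tolerance. This is not the authors' pipeline, since the study used R. The variable pairings, the function names `reference_pvalues` and `flag_discrepancies`, the 1e-3 tolerance, and the toy data are all hypothetical. Test settings must also match between tools. For example, R's `chisq.test` applies Yates' continuity correction to 2×2 tables by default, whereas this sketch uses `correction=False`.

```python
import numpy as np
from scipy import stats


def reference_pvalues(age, stay, died, sex, income):
    """Reference p-values for the test types named in the abstract.

    The variable pairings below are illustrative assumptions, not the
    study's actual protocol.
    """
    age = np.asarray(age, dtype=float)
    stay = np.asarray(stay, dtype=float)
    died = np.asarray(died, dtype=bool)
    sex, income = np.asarray(sex), np.asarray(income)

    # 2x2 contingency table: sex (rows, sorted) x mortality (died, survived)
    table = np.array([[np.sum((sex == s) & died), np.sum((sex == s) & ~died)]
                      for s in np.unique(sex)])

    return {
        # Welch t-test: age in non-survivors vs. survivors
        "t_test_age_vs_mortality":
            stats.ttest_ind(age[died], age[~died], equal_var=False).pvalue,
        # Levene's test (SciPy default center="median") for equal age variances
        "levene_age_vs_mortality": stats.levene(age[died], age[~died]).pvalue,
        # Wilcoxon rank-sum test, equivalent to the Mann-Whitney U test
        "wilcoxon_stay_vs_mortality": stats.mannwhitneyu(
            stay[died], stay[~died], alternative="two-sided").pvalue,
        # Kruskal-Wallis: hospital stay across income groups
        "kruskal_stay_vs_income": stats.kruskal(
            *[stay[income == g] for g in np.unique(income)]).pvalue,
        # Pearson chi-squared without continuity correction; element [1] is the p-value
        "chi2_sex_vs_mortality": stats.chi2_contingency(table, correction=False)[1],
        # Fisher's exact test (two-sided); element [1] is the p-value
        "fisher_sex_vs_mortality": stats.fisher_exact(table)[1],
    }


def flag_discrepancies(reference, reported, tol=1e-3):
    """Names of tests whose reported p-value is missing or off by more than tol."""
    return [name for name, p in reference.items()
            if name not in reported or abs(reported[name] - p) > tol]
```

In use, `reported` would hold the p-values transcribed from the chatbot's output, keyed by the same hypothetical test names. An empty list from `flag_discrepancies` means every value agreed with the reference within the tolerance.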