Objective: With the rapid advancement of artificial intelligence (AI) technologies, models like Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly being evaluated for their potential applications in healthcare. The Prescribing Safety Assessment (PSA) is a standardised test that evaluates the prescribing competence of junior physicians in the UK. This study aims to assess ChatGPT's ability to pass the PSA and its performance across the different exam sections.

Methodology: ChatGPT (version GPT-4) was tested on four official PSA practice papers, each containing 30 questions. Each paper was attempted in three independent trials, and answers were marked using the official PSA mark schemes. Performance was measured by calculating overall percentage scores and comparing them with the pass mark provided for each practice paper. Performance in each subsection was also analysed to identify strengths and weaknesses.

Results: ChatGPT achieved mean scores of 257/300 (85.67%), 236/300 (78.67%), 199/300 (66.33%), and 233/300 (77.67%) across the four papers, surpassing the pass mark in every case where one was available. ChatGPT performed well in sections requiring factual recall, such as "Adverse Drug Reactions", scoring 63/72 (87.50%), and "Communicating Information", scoring 64/72 (88.89%). However, it struggled in "Data Interpretation", scoring 32/72 (44.44%), with performance varying across trials, which indicates limitations in handling more complex clinical reasoning tasks.

Conclusion: While ChatGPT demonstrated strong potential in passing the PSA and excelled in sections requiring factual knowledge, its limitations in data interpretation highlight current gaps in AI's ability to fully replicate human clinical judgement. ChatGPT shows promise in supporting safe prescribing, particularly in areas prone to human error, such as drug interactions and communicating correct information.
However, due to its variability in more complex reasoning tasks, ChatGPT is not yet ready to replace human prescribers and should instead serve as a supplemental tool in clinical practice.
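The percentage scores above follow directly from raw marks divided by available marks. The following is a minimal Python sketch that recomputes them and checks a percentage against a pass mark. It assumes each paper's total is the 300 marks shown in the results, and the function names `percentage` and `meets_pass_mark` are hypothetical. No actual PSA pass marks are encoded, because the abstract does not report them.

```python
# Minimal sketch (hypothetical helper names): recompute percentage scores
# from raw marks and compare a percentage against a pass mark.
# No real PSA pass marks are included; callers must supply them.


def percentage(marks: int, available: int) -> float:
    """Return marks as a percentage of available marks, rounded to 2 d.p."""
    if available <= 0:
        raise ValueError("available marks must be positive")
    return round(100 * marks / available, 2)


def meets_pass_mark(score_pct: float, pass_mark_pct: float) -> bool:
    """True if a percentage score meets or exceeds the pass mark percentage."""
    return score_pct >= pass_mark_pct


# Raw marks reported for the four practice papers, each out of 300.
paper_marks = [257, 236, 199, 233]

if __name__ == "__main__":
    for i, marks in enumerate(paper_marks, start=1):
        print(f"Paper {i}: {marks}/300 = {percentage(marks, 300)}%")
```

For example, `percentage(257, 300)` returns 85.67 and `percentage(32, 72)` returns 44.44, matching the reported figures for the first paper and the "Data Interpretation" section.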