ABSTRACT A quantitative and qualitative performance analysis of ChatGPT-3.0, a large language model, is carried out on three important and highly competitive examinations held in India: the Civil Services Examination (CSE, prelims), the Graduate Aptitude Test in Engineering (GATE), and the Joint Entrance Examination (JEE). These examinations cover general knowledge, current affairs, history, geography, Indian polity, economics, mathematics, physics, chemistry, engineering, and technology at the undergraduate and graduate levels. The Accuracy, Concordance, and Insight (ACI) criteria are used to analyze the performance of ChatGPT. ChatGPT passed the CSE without much specialized training or reinforcement; however, it underperformed on the GATE and JEE. Overall, the average accuracy rate of ChatGPT is 48.71%, with a concordance of 44.45% across all explanations. For accurate answers, however, the concordance of the explanations is found to be 91.87%, with a high level of insight provided in the explanations. Moreover, the average accuracy of ChatGPT improves to 77.69% after training. The results suggest that large language models have great potential to assist with education technology and to act as an instructor for preparing technical, aptitude, and general studies topics for competitive examinations. Drawing on these findings, limitations of the present study and possible directions for future research are outlined.