Medical students need a solid foundation of knowledge to become physicians. The clerkship is often considered the first transition point in their training, and clerkship performance is essential for their development. We aimed to identify subjects that predict clerkship performance, so that medical students can learn more efficiently and perform well in the clerkship. This cohort study collected background and academic data from medical students who graduated between 2011 and 2019. Prediction models were developed with machine learning techniques to identify the features that most influence the prediction of pre-clerkship and clerkship performance. After data collection and preprocessing, several machine learning models were trained and validated using 10-fold cross-validation. Thirteen subjects from the pre-med stage and 10 from the basic medical science stage had an area under the ROC curve (AUC) >0.7 for either pre-clerkship or clerkship performance. Within each subject category, the following subjects predicted clerkship performance in the top tertile: medical humanities and sociology (social science); chemistry and physician scientist-related training (basic science); and pharmacology, immunology-microbiology, and histology (basic medical science). A random forest model predicted clerkship performance with 95% accuracy and an AUC of 0.88. Clerkship performance could be predicted from selected subjects or from combinations of subject categories in the pre-med and basic medical science stages. These findings may help students understand how subjects and categories in the medical program relate to clerkship performance, and thereby improve their preparedness for the clerkship.
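To make the AUC >0.7 screening criterion concrete, the following is a minimal sketch of how a single subject's grades could be scored against a top-tertile clerkship outcome. It is not the study's pipeline (which used random forest and other models with 10-fold cross-validation); the function names `top_tertile_labels` and `auc` and all grade values are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def top_tertile_labels(scores: Sequence[float]) -> List[int]:
    """Label a student 1 if their score falls in the top third, else 0."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, len(scores) // 3)
    cutoff = ranked[n_top - 1]  # lowest score that still counts as top tertile
    return [1 if s >= cutoff else 0 for s in scores]


def auc(labels: Sequence[int], predictor: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case has a
    higher predictor value than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [p for p, y in zip(predictor, labels) if y == 1]
    neg = [p for p, y in zip(predictor, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: nine students' clerkship scores and their grades in
# one basic medical science subject (values invented for this sketch).
clerkship = [92, 88, 85, 80, 78, 75, 70, 68, 65]
pharmacology = [90, 85, 70, 88, 72, 60, 65, 74, 58]

labels = top_tertile_labels(clerkship)
subject_auc = auc(labels, pharmacology)
print(f"AUC = {subject_auc:.3f}, passes 0.7 threshold: {subject_auc > 0.7}")
```

In this toy data the subject would pass the >0.7 screen. A real analysis would estimate the AUC on held-out folds rather than on the same students used to fit a model.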