Abstract

As medical education shifts to a competency-based curriculum, practical examinations (PEs) remain an effective but resource-intensive method of evaluating anatomy students. The short-answer format of PEs requires evaluators familiar with the content to mark the exams. Moreover, the increasing transition to online anatomy courses could deprive students of the PE practice they would receive during in-person sessions. Because typical PE answers are technical and close-ended, and grading is usually a binary ‘correct’ or ‘incorrect’ classification with no partial credit, it was hypothesized that the limited lexicon would allow for accurate automated grading using artificial intelligence techniques such as natural language processing (NLP) and decision trees (DTs). This research was conducted as a first step towards building an intelligent tutoring system for anatomy students. The study used the winter semester online PE results (n = 371) from McMaster University Faculty of Health Sciences’ anatomy and physiology course as the data set. For each of the 54 questions, a 10-fold cross-validation process was used in which 90% of the answers (the training set) trained a DT. After common words unrelated to correctness (“the”, “a”, “an”, etc.) were removed, each DT was composed of the unique words that appeared in student answers, arranged in a tree-like structure of nodes. Each node carries an associated word and a correct/incorrect classification label and splits into sub-nodes (creating the tree-like structure). The remaining 10% of the answers (the testing set) were marked by the generated DTs by traversing the tree from the top-most node; the classification label of the final node became the grade for the student's answer. Accuracy for each question was calculated as the number of correct classifications by the algorithm divided by the total number of answers. When the answers marked by the DTs were compared to those marked by staff and faculty, the DTs achieved an average accuracy of 94.49% in grading every non-blank student answer across all 54 questions. Accuracy was negatively correlated with the number of unique words in the set of answers (-0.71, p < 0.07), consistent with the initial hypothesis. Because features such as spellchecking were excluded from the algorithm to reduce the number of variables, the current results may underestimate the effectiveness of automated PE grading by DTs. The accuracy attained suggests that machine learning techniques such as NLP and DTs may reduce the workload of manual PE grading for instructional staff and mark a step towards developing an intelligent online PE tutoring system for anatomy.
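For illustration, the following minimal sketch (not the authors' implementation) shows how such per-question grading could be assembled with scikit-learn. The `grade_question` helper and its inputs are hypothetical: `answers` is a list of free-text student answers to one PE question and `labels` holds the staff-assigned 0/1 (incorrect/correct) grades. Each internal tree node tests the presence of one word, and held-out answers are graded by traversing the tree to a leaf's correct/incorrect label.

```python
# A minimal sketch, assuming scikit-learn and hypothetical inputs
# (answers: list of answer strings; labels: 0/1 staff grades).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def grade_question(answers, labels, n_splits=10, seed=0):
    """Mean grading accuracy over n_splits folds for one PE question.
    Assumes each class has at least n_splits answers."""
    y = np.asarray(labels)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accuracies = []
    for train_idx, test_idx in folds.split(answers, y):
        # Bag-of-words over the unique words in the training answers;
        # common words unrelated to correctness ("the", "a", "an", ...)
        # are dropped via the built-in English stop-word list.
        vectorizer = CountVectorizer(stop_words="english", binary=True)
        X_train = vectorizer.fit_transform([answers[i] for i in train_idx])
        X_test = vectorizer.transform([answers[i] for i in test_idx])

        # 90% of the answers train the tree: each internal node tests the
        # presence of one word; leaves carry correct/incorrect labels.
        tree = DecisionTreeClassifier(random_state=seed)
        tree.fit(X_train, y[train_idx])

        # The held-out 10% is "marked" by traversing the tree; accuracy is
        # the fraction of answers graded the same as staff and faculty.
        predictions = tree.predict(X_test)
        accuracies.append(float((predictions == y[test_idx]).mean()))
    return float(np.mean(accuracies))
```

Calling `grade_question(answers, labels)` for each of the 54 questions and averaging would yield the kind of per-question accuracy figure reported above; spellchecking is deliberately omitted here, mirroring the study's design.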
