Background: Institutions rely on student evaluations of teaching (SET) to ascertain teaching quality. Manual review of narrative comments can identify faculty with teaching concerns but is resource- and time-intensive. Objective: To determine whether natural language processing (NLP) of SET comments completed by learners on clinical rotations can identify teaching quality concerns. Methods: Single-institution retrospective cohort analysis of SETs (n = 11,850) from clinical rotations between July 1, 2017, and June 30, 2018. The performance of three NLP dictionaries created by the research team was compared with that of an off-the-shelf Sentiment Dictionary. Results: The Expert Dictionary had an accuracy of 0.90, a precision of 0.62, and a recall of 0.50. The Qualifier Dictionary had lower accuracy (0.65) and precision (0.16) but higher recall (0.67). The Text Mining Dictionary had an accuracy of 0.78 and a recall of 0.24. The Sentiment plus Qualifier Dictionary had good accuracy (0.86) and recall (0.77) with a precision of 0.37. Conclusions: NLP methods can identify teaching quality concerns with good accuracy and reasonable recall but relatively low precision. An existing, free NLP sentiment-analysis dictionary can perform nearly as well as dictionaries requiring expert coding or manual creation.
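The general approach described above, flagging comments by matching them against a curated dictionary of terms and then scoring the flags with accuracy, precision, and recall, can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the dictionary terms, comments, and labels are invented for demonstration.

```python
# Minimal sketch of dictionary-based flagging of SET comments for
# teaching-quality concerns. The dictionary terms and example data below
# are hypothetical, not taken from the study.

QUALIFIER_DICTIONARY = {"disorganized", "dismissive", "unprofessional",
                        "late", "unprepared"}  # invented example terms

def flag_comment(comment: str) -> bool:
    """Flag a comment as a possible concern if any dictionary term appears."""
    words = set(comment.lower().split())
    return bool(words & QUALIFIER_DICTIONARY)

def score(predicted: list[bool], actual: list[bool]) -> dict[str, float]:
    """Compute accuracy, precision, and recall of flags against labels."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    tn = sum((not p) and (not a) for p, a in zip(predicted, actual))
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Toy illustration with invented comments (not study data).
comments = [
    "Attending was disorganized and often late",  # labeled concern
    "Excellent teacher, clear explanations",      # no concern
    "Very supportive and well prepared",          # no concern
    "Felt dismissive of questions",               # labeled concern
]
labels = [True, False, False, True]
flags = [flag_comment(c) for c in comments]
print(score(flags, labels))
```

A real dictionary method would add normalization (stemming, negation handling, multi-word phrases), which is where much of the precision/recall trade-off reported in the abstract arises: broad qualifier terms catch more true concerns (higher recall) at the cost of more false positives (lower precision).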