Abstract

Reviewing residency application narrative components is time intensive and has contributed to nearly half of applications not receiving holistic review. The authors developed a natural language processing (NLP)-based tool to automate review of applicants' narrative experience entries and predict interview invitation. Experience entries (n = 188,500) were extracted from 6,403 residency applications across 3 application cycles (2017-2019) at 1 internal medicine program, combined at the applicant level, and paired with the interview invitation decision (n = 1,224 invitations). NLP identified important words (or word pairs) with term frequency-inverse document frequency, which were used to predict interview invitation using logistic regression with L1 regularization. Terms remaining in the model were analyzed thematically. Logistic regression models were also built using structured application data and a combination of NLP and structured data. Model performance was evaluated on never-before-seen data using area under the receiver operating characteristic and precision-recall curves (AUROC, AUPRC). The NLP model had an AUROC of 0.80 (vs chance decision of 0.50) and AUPRC of 0.49 (vs chance decision of 0.19), showing moderate predictive strength. Phrases indicating active leadership, research, or work in social justice and health disparities were associated with interview invitation. The model's detection of these key selection factors demonstrated face validity. Adding structured data to the model significantly improved prediction (AUROC 0.92, AUPRC 0.73), as expected given reliance on such metrics for interview invitation. This model represents a first step in using NLP-based artificial intelligence tools to promote holistic residency application review. The authors are assessing the practical utility of using this model to identify applicants screened out using traditional metrics. Generalizability must be determined through model retraining and evaluation at other programs. Work is ongoing to thwart model "gaming," improve prediction, and remove unwanted biases introduced during model training.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call