Abstract
Background: This study aimed to use natural language processing to predict the presence of intra-abdominal injury using unstructured data from electronic medical records.

Methods: This was a random-sample retrospective observational cohort study leveraging unstructured data from injured patients taken to one of 9 acute care hospitals in an integrated health system between 2015 and 2021. Patients with International Classification of Diseases External Cause of Morbidity codes were identified. History and physical, consult, progress, and radiology report text from the first 8 hours of care were abstracted. Annotator dyads independently annotated encounters' text files to establish ground truth regarding whether intra-abdominal injury occurred. Features were extracted from text using natural language processing techniques, bag of words, and principal component analysis. We tested logistic regression, random forests, and gradient boosting machine to determine accuracy, recall, and precision of natural language processing to predict intra-abdominal injury.

Results: A random sample of 7,000 patient encounters of 177,127 was annotated. Only 2,951 had sufficient information to determine whether an intra-abdominal injury was present. Among those, 84 (2.9%) had an intra-abdominal injury. The concordance between annotators was 0.989. Logistic regression of features identified with bag of words and principal component analysis had the best predictive ability, with an area under the receiver operating characteristic curve of 0.9, recall of 0.73, and precision of 0.17. Text features with greatest importance included "abdomen," "pelvis," "spleen," and "hematoma."

Conclusion: Natural language processing could be a screening decision support tool, which, if paired with human clinical assessment, can maximize precision of intra-abdominal injury identification.
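
To make the reported recall and precision concrete, the following minimal Python sketch converts them into approximate confusion-matrix counts. It assumes, purely for illustration, that the reported recall (0.73) and precision (0.17) apply to all 2,951 annotated encounters and their 84 injuries. The abstract does not state the evaluation split, and the metrics are rounded, so the resulting counts are rough and hypothetical. The function name `implied_counts` is an illustrative choice and does not come from the study.

```python
def implied_counts(n_total, n_positive, recall, precision):
    """Back out approximate confusion-matrix counts from summary metrics.

    ASSUMPTION: recall and precision were measured on a cohort with
    n_total encounters and n_positive true injuries. The abstract does
    not confirm this, so treat the output as illustrative only.
    """
    # recall = TP / (TP + FN), and TP + FN is the number of true injuries
    tp = round(recall * n_positive)
    fn = n_positive - tp
    # precision = TP / (TP + FP), so TP / precision is the number flagged
    flagged = round(tp / precision)
    fp = flagged - tp
    # every remaining encounter is a true negative
    tn = n_total - n_positive - fp
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn, "flagged": flagged}


if __name__ == "__main__":
    counts = implied_counts(n_total=2951, n_positive=84,
                            recall=0.73, precision=0.17)
    for name, value in counts.items():
        print(f"{name}: {value}")
```

Under these assumptions, roughly 61 of the 84 injuries would be flagged along with about 298 false alarms. This illustrates why the abstract frames the model as a screening aid to be paired with clinical assessment rather than as a standalone classifier.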