Abstract

INTRODUCTION: Surgical research demands the development of clinical registries, often through time-intensive manual chart review. Natural language processing (NLP) may accelerate registry development, and an ideal automatic registry (autoregistry) algorithm would be highly accurate while requiring minimal manual data annotation. NLP approaches, including bespoke Regular Expression (RegEx) classifiers and Large Language Models (LLMs), possess distinct strengths and weaknesses and have not been compared in the setting of autoregistry development.

METHODS: We used an institutional data lake to retrieve 31,502 neurosurgical operative notes. A standardized set of spinal procedures was chosen for inclusion in the autoregistry, and 200 manually annotated notes were used for training and testing. RegEx classifiers were engineered to retrieve procedural information from unprocessed notes. A family of 110-million-parameter BERT models, including LLMs pre-trained on clinical text, was fine-tuned for the same tasks. We also tested an open-source 7-billion-parameter LLM chatbot, Vicuna, without fine-tuning.

RESULTS: The RegEx classifiers identified spinal procedures and associated vertebral levels in nearly 99% of operative notes. Fine-tuned LLMs identified common procedures (e.g., spinal fusion and laminectomy) with greater than 95% accuracy but performed poorly for rarer procedures (e.g., XLIF, corpectomy) and for vertebral body identification. Qualitative evaluation of the Vicuna chatbot showed potential for the same tasks following iteratively refined prompting.

CONCLUSIONS: The goal of autoregistry development is to minimize time- and labor-intensive manual chart review. We found that the fine-tuned LLMs could not match the accuracy and efficiency of the RegEx classifiers. However, LLMs may be well suited to expanding existing clinical databases that provide a robust training set. Further work combining NLP approaches will attempt to develop a pipeline for autoregistry development from natural language (plain English) queries.
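
To make the rule-based approach concrete, the following Python sketch shows how a RegEx classifier could flag spinal procedures and vertebral levels in raw operative note text. The procedure keywords and patterns are illustrative assumptions only and do not reproduce the study's actual classifiers.

# Minimal sketch of a rule-based (RegEx) procedure classifier.
# Patterns and procedure names below are hypothetical examples.
import re

PROCEDURE_PATTERNS = {
    "laminectomy": re.compile(r"\blaminectom(?:y|ies)\b", re.IGNORECASE),
    "spinal_fusion": re.compile(r"\b(?:arthrodesis|fusion)\b", re.IGNORECASE),
    "corpectomy": re.compile(r"\bcorpectom(?:y|ies)\b", re.IGNORECASE),
}

# Vertebral levels such as C5, T12, or ranges like L4-L5
LEVEL_PATTERN = re.compile(r"\b([CTLS])(\d{1,2})(?:\s*-\s*([CTLS])?(\d{1,2}))?\b", re.IGNORECASE)

def classify_note(note_text: str) -> dict:
    """Return the procedures and vertebral levels mentioned in one operative note."""
    procedures = [name for name, pat in PROCEDURE_PATTERNS.items() if pat.search(note_text)]
    levels = [m.group(0) for m in LEVEL_PATTERN.finditer(note_text)]
    return {"procedures": procedures, "levels": levels}

# Example usage on a short synthetic note
print(classify_note("L4-L5 laminectomy and posterior fusion were performed."))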
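
The transformer baseline can be sketched with the Hugging Face transformers library. The checkpoint name, label scheme, and toy training examples below are assumptions standing in for the study's clinically pre-trained BERT models and the 200 annotated notes.

# Minimal fine-tuning sketch for a ~110-million-parameter BERT-family model,
# framed here as binary classification ("note documents a spinal fusion").
# The model name and training data are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"  # an example clinically pre-trained checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

texts = ["Posterior L4-L5 fusion was performed.", "Craniotomy for tumor resection."]
labels = [1, 0]  # 1 = spinal fusion documented, 0 = not

class NoteDataset(torch.utils.data.Dataset):
    def __init__(self, texts, labels):
        self.enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
        self.labels = labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: v[idx] for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3, per_device_train_batch_size=8),
    train_dataset=NoteDataset(texts, labels),
)
trainer.train()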
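
For the chatbot evaluation, prompting a model such as Vicuna can be illustrated with a simple extraction template. The wording below is hypothetical and does not reproduce the study's iteratively refined prompts.

# Illustrative zero-shot extraction prompt for an open-source chat LLM (assumed wording).
PROMPT_TEMPLATE = """You are reviewing a neurosurgical operative note.
List every spinal procedure performed (e.g., laminectomy, fusion, corpectomy)
and the vertebral levels involved. Answer as JSON with keys "procedures" and "levels".

Operative note:
{note}
"""

def build_prompt(note_text: str) -> str:
    """Fill the template with a single operative note before sending it to the chatbot."""
    return PROMPT_TEMPLATE.format(note=note_text)

print(build_prompt("A C5-C6 anterior cervical discectomy and fusion was performed."))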
