BACKGROUND CONTEXT: Secure institutional large language models (LLMs) could reduce the burden of non-interpretative tasks for radiologists.

PURPOSE: To assess the utility of a secure institutional LLM for MRI spine request form enhancement and auto-protocoling.

STUDY DESIGN/SETTING: Retrospective study conducted from December 2023 to February 2024, including patients with clinical entries accessible on the electronic medical record (EMR).

PATIENT SAMPLE: Overall, 250 spine MRI request forms were analyzed from 218 patients (mean age = 55.9 years ± 18.9 [SD]; 108 women) across the cervical (n=56/250, 22.4%), thoracic (n=13/250, 5.2%), lumbar (n=166/250, 66.4%), and whole (n=15/250, 6.0%) spine. Of these, 60/250 (24.0%) required contrast and 41/250 (16.4%) had prior spine surgery/instrumentation.

OUTCOME MEASURES: Primary: adequacy of clinical information on clinician and LLM-augmented request forms, rated on a four-point scale. Secondary: correct MRI protocol suggestion by the LLM and by two first-year board-certified radiologists (Rad4 and Rad5), compared to a consensus reference standard.

METHODS: A secure institutional LLM (Claude 2.0) used a majority decision prompt (out of six runs) to enhance the clinical information on clinician request forms using the EMR and to suggest the appropriate MRI protocol. The adequacy of clinical information on the clinician and LLM-augmented request forms was rated independently by three musculoskeletal radiologists (Rad1: 10 years of experience; Rad2: 12 years of experience; Rad3: 10 years of experience). The same radiologists provided a consensus reference standard for the correct protocol, which was compared to the protocol suggested by the LLM and by two first-year board-certified radiologists (Rad4 and Rad5). Overall agreement (Fleiss kappas for inter-rater agreement, or percentage agreement with the reference standard, each with 95% CIs) was reported where appropriate.
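The majority decision over six LLM runs described in the methods can be illustrated with a minimal sketch. The abstract does not describe the implementation, so the function name, the protocol label strings, and the handling of ties (returned as undecided) are assumptions made for illustration only.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def majority_protocol(runs: Sequence[str]) -> Tuple[Optional[str], float]:
    """Return the most frequent protocol label and the fraction of runs agreeing with it.

    Hypothetical helper: if two labels tie for first place, no label is
    returned (None), because the abstract does not say how ties were resolved.
    """
    if not runs:
        raise ValueError("at least one run is required")
    ranked = Counter(runs).most_common()
    top_label, top_count = ranked[0]
    agreement = top_count / len(runs)
    # A tie for first place means there is no majority decision.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None, agreement
    return top_label, agreement


# Illustrative labels only; they are not taken from the study.
six_runs = ["lumbar, no contrast"] * 5 + ["lumbar, contrast"]
label, agreement = majority_protocol(six_runs)
print(label, f"{agreement:.0%}")
```

Under this sketch, a case counts as fully consistent when the agreement fraction equals 1.0, that is, when all six runs return the same protocol.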
RESULTS: LLM-augmented forms were rated by Rads 1–3 as having adequate clinical information in 93.6–96.0% of cases, compared to 46.8–58.8% of the clinician request forms (p<0.01). Substantial interobserver agreement was observed, with kappas of 0.71 (95% CI: 0.67–0.76) for original forms and 0.66 (95% CI: 0.61–0.72) for LLM-enhanced requests. Rads 1–3 showed almost perfect agreement on protocol decisions, with kappas of 0.99 (95% CI: 0.94–1.0) for spine region selection, 0.93 (95% CI: 0.86–1.0) for contrast necessity, and 0.93 (95% CI: 0.86–0.99) for recognition of prior spine surgery. Compared to the consensus reference standard, the LLM suggested the correct protocol in 78.4% (196/250, p<0.01) of cases, which was inferior to Rad4 (90.0%, p<0.01) and Rad5 (89.2%, p<0.01). The secure LLM performed best at identifying spinal instrumentation, doing so in 39/41 (95.1%) cases and outperforming Rad4 (61.0%) and Rad5 (41.5%) (both p<0.01). The secure LLM was highly consistent, with 227/250 cases (90.8%) showing 100% (6/6 runs) agreement.

CONCLUSIONS: Enhancing spine MRI request forms with a secure institutional LLM improved the adequacy of clinical information. The LLM also suggested the correct protocol in 78.4% of cases, which could optimize the MRI workflow.
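The percentage agreement figures above are simple proportions of cases matching the reference standard. The sketch below recomputes the LLM's protocol agreement from the reported counts and attaches a 95% confidence interval. The abstract does not state which interval method was used, so the Wilson score interval here is an assumption, and the function name is hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 for roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the abstract: 196 correct protocols out of 250 requests.
correct, total = 196, 250
low, high = wilson_interval(correct, total)
print(f"agreement = {correct / total:.1%}, 95% CI = {low:.1%} to {high:.1%}")
```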