Introduction: Cardiovascular radiology reports contain valuable diagnostic information linked to images, but their unstructured text format makes feature extraction difficult at scale. Large language models (LLMs) allow for feature extraction where string parsing alone is insufficient, but they require careful prompting to produce accurate results. Hypothesis: We hypothesized that a systematic prompting approach using LLMs can expedite the extraction of features from unstructured text in transesophageal echocardiography (TEE) reports. Methods: The data consisted of 7106 intraoperative TEE reports, 600 of which were manually reviewed to obtain pre- and post-intervention ground truth values for left ventricular ejection fraction (LVEF), right ventricular systolic function (RVSF), and tricuspid regurgitation (TR). Each report was paired with an imaging study consisting of 50-200 clips. For each feature considered, 100 of the 600 labeled reports were used to engineer a Llama-2 prompt that maximized feature extraction accuracy. Results: We found that multiple, shorter prompts yielded higher accuracy than fewer, longer prompts. Additionally, when mapping semantic information onto a numerical scale, prompt engineering in combination with string parsing (Figure 1) gave the best results. When evaluated on the 500 labeled reports withheld for testing, the finalized prompts had accuracies of 94.1%, 94.8%, and 91.3% for LVEF, RVSF, and TR, respectively. Using this strategy, 5000 intraoperative TEE reports were analyzed and used to train and evaluate a regression model for LVEF estimation from TEE clips (Figure 2). Conclusion: We have shown that prompt engineering with Llama-2 can accurately extract features from unstructured TEE reports. As an extension of these methods, automated feature prediction from echocardiograms could be used to create rapid, low-cost, and accessible cardiac assessments.
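The combination of short, single-purpose prompts and string parsing described in the Results could look roughly like the following Python sketch. It is not the study's code: the prompt wording, the `build_prompt` and `parse_lvef` names, and the qualitative-to-numeric values in `QUALITATIVE_LVEF` are all hypothetical, and the call to Llama-2 itself is omitted.

```python
import re
from typing import Optional

# Hypothetical mapping from qualitative descriptors to LVEF values (percent).
# Illustrative numbers only; they are not taken from the study.
# More specific phrases come first so "mildly reduced" is checked before "normal".
QUALITATIVE_LVEF = {
    "severely reduced": 25.0,
    "moderately reduced": 35.0,
    "mildly reduced": 45.0,
    "normal": 60.0,
}

# "55-60%" or "35 to 40 %": a numeric range, percent sign optional.
_RANGE = re.compile(r"(\d{1,3}(?:\.\d+)?)\s*(?:-|to)\s*(\d{1,3}(?:\.\d+)?)\s*%?")
# "40%": a single percentage.
_SINGLE = re.compile(r"(\d{1,3}(?:\.\d+)?)\s*%")


def build_prompt(report: str, timing: str) -> str:
    """Build one short prompt for one feature at one time point.

    `timing` is "pre" or "post"; the wording here is an assumption.
    """
    return (
        f"Report:\n{report}\n\n"
        f"Question: What is the {timing}-intervention left ventricular "
        "ejection fraction? Answer with a number, a range, or a short phrase."
    )


def parse_lvef(answer: str) -> Optional[float]:
    """Turn a free-text model answer into a numeric LVEF, or None."""
    text = answer.lower()

    # 1. Numeric range: return the midpoint if it is a plausible percentage.
    match = _RANGE.search(text)
    if match:
        low, high = float(match.group(1)), float(match.group(2))
        if 0 <= low <= high <= 100:
            return (low + high) / 2

    # 2. Single percentage.
    match = _SINGLE.search(text)
    if match:
        value = float(match.group(1))
        if 0 <= value <= 100:
            return value

    # 3. Qualitative phrase mapped onto the numeric scale.
    for phrase, value in QUALITATIVE_LVEF.items():
        if phrase in text:
            return value

    # Nothing recognizable: leave the value missing rather than guess.
    return None
```

In this sketch the model only has to locate the relevant phrase in the report, and the deterministic parser handles the conversion to a number, so an answer such as "The LVEF is 55-60%." is reduced to the range midpoint while "Mildly reduced systolic function" falls back to the lookup table.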