Optimal Large Language Model Characteristics to Balance Accuracy and Energy Use for Sustainable Medical Applications.

Florence X Doo, Dharmam Savani, Adway Kanhere, Ruth C Carlos, Anupam Joshi, Paul H Yi, Vishwa S Parekh

doi:10.1148/radiol.240320

Abstract

Background: Large language models (LLMs) for medical applications use unknown amounts of energy, which contribute to the overall carbon footprint of the health care system.

Purpose: To investigate the tradeoffs between accuracy and energy use when using different LLM types and sizes for medical applications.

Materials and Methods: This retrospective study evaluated five different billion (B)-parameter sizes of two open-source LLMs (Meta's Llama 2, a general-purpose model, and LMSYS Org's Vicuna 1.5, a specialized fine-tuned model) using chest radiograph reports from the National Library of Medicine's Indiana University Chest X-ray Collection. Reports with missing demographic information and missing or blank files were excluded. Models were run on local compute clusters with visual computing graphics processing units. A single-task prompt explained clinical terminology and instructed each model to confirm the presence or absence of each of the 13 CheXpert disease labels. Energy use (in kilowatt-hours) was measured using an open-source tool. Accuracy was assessed against the 13 CheXpert reference standard labels for diagnostic findings on chest radiographs, with overall accuracy defined as the mean of the individual accuracies of all 13 labels. Efficiency ratios (accuracy per kilowatt-hour) were calculated for each model type and size.

Results: A total of 3665 chest radiograph reports were evaluated. The Vicuna 1.5 7B and 13B models had higher efficiency ratios (737.28 and 331.40, respectively) and higher overall labeling accuracy (93.83% [3438.69 of 3665 reports] and 93.65% [3432.38 of 3665 reports], respectively) than the Llama 2 models (7B: efficiency ratio of 13.39, accuracy of 7.91% [289.76 of 3665 reports]; 13B: efficiency ratio of 40.90, accuracy of 74.08% [2715.15 of 3665 reports]; 70B: efficiency ratio of 22.30, accuracy of 92.70% [3397.38 of 3665 reports]). Vicuna 1.5 7B had the highest efficiency ratio (737.28 vs 13.39 for Llama 2 7B). The larger Llama 2 70B model used more than seven times the energy of its 7B counterpart (4.16 kWh vs 0.59 kWh) with low overall accuracy, resulting in an efficiency ratio of only 22.30.

Conclusion: Smaller fine-tuned LLMs were more sustainable than larger general-purpose LLMs, using less energy without compromising accuracy, highlighting the importance of LLM selection for medical applications.

© RSNA, 2024
Supplemental material is available for this article.
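The efficiency ratio described in the abstract (accuracy per kilowatt-hour) can be illustrated with a short Python sketch. It assumes the ratio is overall accuracy in percent divided by energy use in kWh, which appears consistent with the two models for which the abstract reports both values (Llama 2 7B and 70B). The function names `efficiency_ratio` and `overall_accuracy` are hypothetical and not taken from the study.

```python
from statistics import mean


def overall_accuracy(label_accuracies):
    """Mean of the per-label accuracies (the abstract uses 13 CheXpert labels)."""
    return mean(label_accuracies)


def efficiency_ratio(accuracy_percent, energy_kwh):
    """Accuracy (in percent) per kilowatt-hour; assumed form of the ratio."""
    if energy_kwh <= 0:
        raise ValueError("energy_kwh must be positive")
    return accuracy_percent / energy_kwh


# Values quoted in the abstract: (overall accuracy in %, energy in kWh).
# Only these two models have both numbers stated explicitly.
reported = {
    "Llama 2 7B": (7.91, 0.59),
    "Llama 2 70B": (92.70, 4.16),
}

for model, (accuracy, energy) in reported.items():
    print(f"{model}: {efficiency_ratio(accuracy, energy):.2f} accuracy points per kWh")

# Energy of the 70B model relative to the 7B model.
energy_multiple = reported["Llama 2 70B"][1] / reported["Llama 2 7B"][1]
print(f"70B vs 7B energy use: {energy_multiple:.2f}x")
```

Under this assumption the computed ratios come out at roughly 13.4 and 22.3, close to the reported 13.39 and 22.30. The small differences presumably reflect rounding of the published accuracy and energy figures.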
