SPeC: A soft prompt-based calibration on performance variability of large language model in clinical notes summarization

Yu-Neng Chuang, Ruixiang Tang, Xiaoqian Jiang, Xia Hu

doi:10.1016/j.jbi.2024.104606

Abstract

Electronic health records (EHRs) store an extensive array of patient information, encompassing medical histories, diagnoses, treatments, and test outcomes. These records are crucial for enabling healthcare providers to make well-informed decisions regarding patient care. Summarizing clinical notes further assists healthcare professionals in pinpointing potential health risks and making better-informed decisions. This process contributes to reducing errors and enhancing patient outcomes by ensuring providers have access to the most pertinent and current patient data. Recent research has shown that incorporating instruction prompts with large language models (LLMs) substantially boosts the efficacy of summarization tasks. However, we show that this approach also leads to increased performance variance, resulting in significantly distinct summaries even when instruction prompts share similar meanings. To tackle this challenge, we introduce a model-agnostic Soft Prompt-Based Calibration (SPeC) pipeline that employs soft prompts to lower variance while preserving the advantages of prompt-based summarization. Experimental findings on multiple clinical note tasks and LLMs indicate that our method not only bolsters performance but also effectively regulates variance across different LLMs, providing a more consistent and reliable approach to summarizing critical medical information.

Full Text