What are the Desired Characteristics of Calibration Sets? Identifying Correlates on Long Form Scientific Summarization.

Griffin Adams, Anna Ostropolets, Jake Smith, Yuan-Jyue Chen, Noémie Elhadad, Tristan Naumann, Budhaditya Deb, Bichlien H Nguyen, Yingce Xia, Shufang Xie

doi:10.18653/v1/2023.acl-long.587

Open Access

Abstract

Summarization models often generate text that is poorly calibrated to quality metrics because they are trained to maximize the likelihood of a single reference (MLE). To address this, recent work has added a calibration step, which exposes a model to its own ranked outputs to improve relevance or, in a separate line of work, contrasts positive and negative sets to improve faithfulness. While effective, much of this work has focused on how to generate and optimize these sets. Less is known about why one setup is more effective than another. In this work, we uncover the underlying characteristics of effective sets. For each training instance, we form a large, diverse pool of candidates and systematically vary the subsets used for calibration fine-tuning. Each selection strategy targets distinct aspects of the sets, such as lexical diversity or the size of the gap between positives and negatives. On three diverse scientific long-form summarization datasets (spanning biomedical, clinical, and chemical domains), we find, among other results, that faithfulness calibration is optimal when the negative sets are extractive and more likely to be generated, whereas for relevance calibration, the metric margin between candidates should be maximized and surprise (the disagreement between model-defined and metric-defined candidate rankings) should be minimized. Code to create, select, and optimize calibration sets is available at https://github.com/griff4692/calibrating-summaries.
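To make the two relevance-calibration quantities named in the abstract concrete, the sketch below computes a metric margin and a surprise score for one candidate set. The abstract does not give exact formulas, so both definitions here are assumptions for illustration only: the margin is taken as the average gap between adjacent candidates when sorted by metric score, and surprise as the fraction of candidate pairs that the model and the metric order differently. The function names and the example scores are hypothetical and are not taken from the authors' repository.

```python
from itertools import combinations


def metric_margin(metric_scores):
    """Average gap between adjacent candidates after sorting by metric score.

    Assumed definition for illustration; the paper may define margin differently.
    """
    ordered = sorted(metric_scores, reverse=True)
    gaps = [hi - lo for hi, lo in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def surprise(model_scores, metric_scores):
    """Fraction of candidate pairs ranked in opposite order by model and metric.

    Assumed definition (pairwise discordance); ties count as agreement.
    """
    pairs = list(combinations(range(len(model_scores)), 2))
    discordant = sum(
        1
        for i, j in pairs
        if (model_scores[i] - model_scores[j])
        * (metric_scores[i] - metric_scores[j]) < 0
    )
    return discordant / len(pairs)


# Hypothetical scores for four candidate summaries of one document.
example_metric = [0.9, 0.6, 0.3, 0.1]      # e.g. a relevance metric per candidate
example_model = [-1.0, -2.0, -1.5, -3.0]   # e.g. model log-likelihood per candidate

print(metric_margin(example_metric))
print(surprise(example_model, example_metric))
```

Under these assumed definitions, a selection strategy following the abstract's finding would prefer candidate subsets with a larger `metric_margin` and a smaller `surprise`.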

Journal: Proceedings of the conference. Association for Computational Linguistics. Meeting
Publication Date: Jan 1, 2023
Citations: 1
License type: cc-by
