Abstract

Summaries are expected to convey the maximum amount of information in the fewest words. Summary assessment tools such as Rouge, METEOR, and BLEU fail to capture the factual consistency of summaries with their source documents and also disregard synonymy between expressions. Information-coverage is a measure of the amount of important information retained in the summary. We propose the ICE and ICE-T metrics, which employ pre-trained word embeddings, Part-Of-Speech (POS)-based keyword extraction, cosine similarity, source length, and target length to gauge the information-coverage of automatically generated summaries. We propose five POS-based keyword sampling techniques (NN, NN-VB, NN-JJ, NN-VB-JJ, and NN-VB-JJ-CD) for efficient retrieval of the predominant information in the source text. Experiments show that: (1) ICE correlates very strongly with human judgments under NN-VB-JJ-CD sampling (Pearson 0.989, Spearman rank 0.987, and Kendall rank 0.919) and NN-VB-JJ sampling (Pearson 0.978, Spearman rank 0.979, and Kendall rank 0.899); (2) ICE-T correlates strongly with human judgments under NN-VB-JJ-CD sampling (Pearson 0.904, Spearman rank 0.846, and Kendall rank 0.702) and NN-VB-JJ sampling (Pearson 0.877 and Spearman rank 0.832). ICE and ICE-T are reliable information-coverage estimation tools, as they align far more closely with human evaluations than Rouge (Rouge-1, Rouge-2, Rouge-L, Rouge-WE) and BLEU.
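To make the pipeline concrete, the sketch below illustrates POS-based keyword sampling followed by embedding cosine-similarity matching, in the spirit of ICE. The function names, the averaging of best-match similarities, and the use of spaCy vectors are illustrative assumptions, not the authors' exact formulation; the source-length and target-length terms of ICE/ICE-T are omitted here.

```python
# Minimal sketch: POS-based keyword sampling + embedding cosine similarity.
# Assumptions (not from the paper): spaCy's "en_core_web_md" vectors,
# the helper names, and averaging each source keyword's best match.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # provides Penn POS tags and word vectors

# Penn Treebank tag prefixes for the five sampling schemes named in the abstract
SAMPLING_SCHEMES = {
    "NN": ("NN",),
    "NN-VB": ("NN", "VB"),
    "NN-JJ": ("NN", "JJ"),
    "NN-VB-JJ": ("NN", "VB", "JJ"),
    "NN-VB-JJ-CD": ("NN", "VB", "JJ", "CD"),
}

def extract_keywords(text, scheme="NN-VB-JJ-CD"):
    """Keep tokens whose Penn tag starts with one of the scheme's prefixes."""
    prefixes = SAMPLING_SCHEMES[scheme]
    return [tok for tok in nlp(text)
            if tok.has_vector and tok.tag_.startswith(prefixes)]

def coverage_score(source, summary, scheme="NN-VB-JJ-CD"):
    """Average best cosine similarity of each source keyword against the
    summary keywords (an assumed stand-in for the ICE formulation)."""
    src_kw = extract_keywords(source, scheme)
    sum_kw = extract_keywords(summary, scheme)
    if not src_kw or not sum_kw:
        return 0.0
    sims = []
    for s in src_kw:
        best = max(
            float(np.dot(s.vector, t.vector)) /
            (np.linalg.norm(s.vector) * np.linalg.norm(t.vector) + 1e-8)
            for t in sum_kw
        )
        sims.append(best)
    return float(np.mean(sims))
```

A wider sampling scheme such as NN-VB-JJ-CD retains numerals and modifiers alongside nouns and verbs, which is one plausible reason the abstract reports its strongest correlations for that scheme.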
