Abstract

Summaries are expected to convey the maximum amount of information in the fewest words. Summary assessment tools such as Rouge, METEOR, and BLEU fail to capture the factual consistency of summaries with their source documents and also disregard synonymy between expressions. Information-coverage is a measure of the amount of important information retained in the summary. We propose the ICE and ICE-T metrics, which employ pre-trained word embeddings, Part-Of-Speech (POS)-based keyword extraction, cosine similarity, source length, and target length to gauge the information-coverage of automatically generated summaries. We propose five POS-based keyword sampling techniques (NN, NN-VB, NN-JJ, NN-VB-JJ, and NN-VB-JJ-CD) for efficient retrieval of the predominant information in the source text. Experiments show that: (1) ICE correlates very strongly with human judgments under NN-VB-JJ-CD sampling (Pearson 0.989, Spearman rank 0.987, and Kendall rank 0.919) and NN-VB-JJ sampling (Pearson 0.978, Spearman rank 0.979, and Kendall rank 0.899); (2) ICE-T correlates strongly with human judgments under NN-VB-JJ-CD sampling (Pearson 0.904, Spearman rank 0.846, and Kendall rank 0.702) and NN-VB-JJ sampling (Pearson 0.877 and Spearman rank 0.832). ICE and ICE-T are reliable information-coverage estimation tools, as they align far more closely with human evaluations than Rouge (Rouge-1, Rouge-2, Rouge-L, Rouge-WE) and BLEU.
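To make the pipeline concrete, the sketch below illustrates POS-based keyword sampling followed by embedding cosine-similarity matching, in the spirit of ICE. The function names, the averaging of best-match similarities, and the use of spaCy vectors are illustrative assumptions, not the authors' exact formulation; the source-length and target-length terms of ICE/ICE-T are omitted here.

```python
# Minimal sketch: POS-based keyword sampling + embedding cosine similarity.
# Assumptions (not from the paper): spaCy's "en_core_web_md" vectors,
# the helper names, and averaging each source keyword's best match.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # provides Penn POS tags and word vectors

# Penn Treebank tag prefixes for the five sampling schemes named in the abstract
SAMPLING_SCHEMES = {
    "NN": ("NN",),
    "NN-VB": ("NN", "VB"),
    "NN-JJ": ("NN", "JJ"),
    "NN-VB-JJ": ("NN", "VB", "JJ"),
    "NN-VB-JJ-CD": ("NN", "VB", "JJ", "CD"),
}

def extract_keywords(text, scheme="NN-VB-JJ-CD"):
    """Keep tokens whose Penn tag starts with one of the scheme's prefixes."""
    prefixes = SAMPLING_SCHEMES[scheme]
    return [tok for tok in nlp(text)
            if tok.has_vector and tok.tag_.startswith(prefixes)]

def coverage_score(source, summary, scheme="NN-VB-JJ-CD"):
    """Average best cosine similarity of each source keyword against the
    summary keywords (an assumed stand-in for the ICE formulation)."""
    src_kw = extract_keywords(source, scheme)
    sum_kw = extract_keywords(summary, scheme)
    if not src_kw or not sum_kw:
        return 0.0
    sims = []
    for s in src_kw:
        best = max(
            float(np.dot(s.vector, t.vector)) /
            (np.linalg.norm(s.vector) * np.linalg.norm(t.vector) + 1e-8)
            for t in sum_kw
        )
        sims.append(best)
    return float(np.mean(sims))
```

A wider sampling scheme such as NN-VB-JJ-CD retains numerals and modifiers alongside nouns and verbs, which is one plausible reason the abstract reports its strongest correlations for that scheme.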
