Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation

Ran Zhang, Jihed Ouni, Steffen Eger

doi:10.1162/coli_a_00519

Abstract

While summarization has been extensively researched in natural language processing (NLP), cross-lingual cross-temporal summarization (CLCTS) is a largely unexplored area that has the potential to improve cross-cultural accessibility and understanding. This article comprehensively addresses the CLCTS task, including dataset creation, modeling, and evaluation. We (1) build the first CLCTS corpus with 328 instances for hDe-En (extended version with 455 instances) and 289 for hEn-De (extended version with 501 instances), leveraging historical fiction texts and Wikipedia summaries in English and German; (2) examine the effectiveness of popular transformer end-to-end models with different intermediate fine-tuning tasks; (3) explore the potential of GPT-3.5 as a summarizer; and (4) report evaluations from humans, GPT-4, and several recent automatic evaluation metrics. Our results indicate that end-to-end models with intermediate task fine-tuning generate summaries of poor to moderate quality, while GPT-3.5, as a zero-shot summarizer, provides outputs of moderate to good quality. GPT-3.5 also seems very adept at normalizing historical text. To assess data contamination in GPT-3.5, we design an adversarial attack scheme in which we find that GPT-3.5 performs slightly worse for unseen source documents compared to seen documents. Moreover, it sometimes hallucinates when the source sentences are inverted against its prior knowledge, with a summarization accuracy of 0.67 for plot omission, 0.71 for entity swap, and 0.53 for plot negation. Overall, our regression results of model performances suggest that longer, older, and more complex source texts (all of which are more characteristic of historical language variants) are harder to summarize for all models, indicating the difficulty of the CLCTS task. Regarding evaluation, we observe that both GPT-4 and BERTScore correlate moderately with human evaluations, indicating substantial room for future improvement.
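As an illustration of what "correlate moderately with human evaluations" means in practice, the following minimal Python sketch computes a rank correlation between per-summary metric scores and human ratings. It assumes Spearman's rank correlation, which the abstract does not specify; the function names and the toy scores are hypothetical and are not taken from the article.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up scores for five summaries (illustrative only).
human_ratings = [1, 2, 3, 4, 5]
metric_scores = [0.2, 0.1, 0.5, 0.4, 0.9]
rho = spearman(human_ratings, metric_scores)
```

Ranking the scores first makes the comparison insensitive to the different scales of human ratings and automatic metrics, which is one reason rank correlations are a common choice for this kind of meta-evaluation.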

Journal: Computational Linguistics	Publication Date: Jul 9, 2024
Citations: 1	License type: CC BY-NC-ND 4.0
