On enhancing the robustness of timeline summarization test collections

Richard Mccreadie, Shahzad Rajput, Ian Soboroff, Craig Macdonald, Iadh Ounis

doi:10.1016/j.ipm.2019.02.006

Open Access

https://doi.org/10.1016/j.ipm.2019.02.006

Journal: Information Processing and Management
Publication Date: Mar 25, 2019
Citations: 5
License type: publisher-specific-oa

Affiliation: University of Glasgow

Abstract

Timeline generation systems are a class of algorithms that produce a sequence of time-ordered sentences or text snippets extracted in real time from high-volume streams of digital documents (e.g. news articles), focusing on retaining relevant and informative content for a particular information need (e.g. topic or event). These systems have a range of uses, such as producing concise overviews of events for end-users (human or artificial agents). To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information nugget or cluster-based ground truth representations, respectively. These methodologies rely on human assessors manually mapping timeline items (e.g. sentences) to an explicit representation of what information a ‘good’ summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior work has reported cases where such evaluations fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which timeline summarization test collections fail to generalize to new summarization systems, and then propose, evaluate and analyze new automatic solutions to this issue. In particular, using a depooling methodology over 19 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when the held-out systems are of lower effectiveness, the test collections are robust (the likelihood of systems being mis-ranked is low). However, we show that the risk of systems being mis-ranked increases as the effectiveness of the systems held out from the pool increases.
To reduce the risk of mis-ranking systems, we also propose a range of automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective at increasing the robustness of the TREC-TS test collections, as they are able to generate large numbers of missing matches with high accuracy, reducing the number of mis-rankings by up to 50%.
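
The notion of a mis-ranking can be illustrated with a minimal sketch. It assumes a mis-ranking is a pair of systems whose relative order under the depooled labels differs from their order under the full labels; the abstract does not state the exact measure used, and the function name `count_misrankings`, the system names, and all scores below are hypothetical.

```python
from itertools import combinations


def count_misrankings(reference_scores, depooled_scores):
    """Count system pairs whose relative order flips between two evaluations.

    Both arguments map a system name to an effectiveness score. Only systems
    present in both mappings are compared. Ties in either evaluation are not
    counted as flips (an assumption made for this sketch).
    """
    systems = sorted(set(reference_scores) & set(depooled_scores))
    flipped = 0
    for a, b in combinations(systems, 2):
        ref_diff = reference_scores[a] - reference_scores[b]
        dep_diff = depooled_scores[a] - depooled_scores[b]
        # Opposite signs mean the pair is ordered differently.
        if ref_diff * dep_diff < 0:
            flipped += 1
    return flipped


# Hypothetical scores: sysA loses credit once its contributions
# are removed from the labeled pool.
reference = {"sysA": 0.42, "sysB": 0.35, "sysC": 0.30}
depooled = {"sysA": 0.25, "sysB": 0.35, "sysC": 0.30}

print(count_misrankings(reference, depooled))
```

In this toy case the held-out system drops from first to last place, so two of the three pairs change order. A label expansion technique would aim to restore the missing matches so that fewer pairs flip.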
