Semantic search as extractive paraphrase span detection

Jenna Kanerva, Teemu Vahtola, Hanna Kitti, Li-Hsin Chang, Filip Ginter, Mathias Creutz

doi:10.1007/s10579-023-09715-7

Abstract

In this paper, we approach the problem of semantic search by introducing the task of paraphrase span detection: given a segment of text as a query phrase, the task is to identify its paraphrase in a given document. This is the same modelling setup as typically used in extractive question answering. While current work in paraphrasing has focused almost exclusively on sentence-level approaches, the novel span detection approach makes it possible to retrieve a segment of arbitrary length. On the Turku Paraphrase Corpus of 100,000 manually extracted Finnish paraphrase pairs, including their original document context, we find that our paraphrase span detection approach, with an exact match of 88.73, outperforms widely adopted sentence-level retrieval baselines (lexical similarity as well as BERT and SBERT sentence embeddings) by more than 20 pp in terms of exact match and 11 pp in terms of token-level F-score. This demonstrates a strong advantage of modelling paraphrase retrieval as span extraction rather than as the commonly used sentence similarity, the sentence-level approaches being clearly suboptimal for applications where the retrieval targets are not guaranteed to be full sentences. Even when the evaluation is limited to sentence-level retrieval targets only, the span detection model still outperforms the sentence-level baselines by more than 4 pp in terms of exact match and almost 6 pp in terms of F-score. Additionally, we introduce a method for creating artificial paraphrase data through back-translation, suitable for languages where manually annotated paraphrase resources for training the span detection model are not available.
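The two evaluation measures named in the abstract, exact match and token-level F-score, can be illustrated with a minimal sketch. It assumes the definitions commonly used in extractive question answering and plain whitespace tokenization; the paper's actual tokenization and text normalization are not given in the abstract, and the function names and example strings below are hypothetical.

```python
from collections import Counter


def exact_match(predicted: str, gold: str) -> float:
    """Return 1.0 if the predicted span has exactly the gold tokens, else 0.0."""
    # Assumption: whitespace tokenization, no case folding or punctuation stripping.
    return float(predicted.split() == gold.split())


def token_f1(predicted: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall (bag-of-tokens overlap)."""
    pred_tokens = predicted.split()
    gold_tokens = gold.split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the gold paraphrase is only part of a sentence,
# while a sentence-level retriever can only return the whole sentence.
gold_span = "the weather turned cold"
sentence_level_prediction = "Yesterday the weather turned cold ."
span_level_prediction = "the weather turned cold"

em_sentence = exact_match(sentence_level_prediction, gold_span)
f1_sentence = token_f1(sentence_level_prediction, gold_span)
em_span = exact_match(span_level_prediction, gold_span)
f1_span = token_f1(span_level_prediction, gold_span)
```

In this toy case the sentence-level prediction covers every gold token but adds two extra ones, so its exact match is 0 and its token-level F-score is about 0.8, while the span-level prediction scores 1 on both. This mirrors the abstract's point that sentence-level retrieval is penalized whenever the target is not a full sentence.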

Journal: Language Resources and Evaluation
Publication Date: Feb 1, 2024
License type: CC BY 4.0

Similar Papers

Extractive Question Answering Using Transformer-Based LM
Raj Jha ... V Susheela Devi
01 Jan 2023

OntoSeg: A Novel Approach to Text Segmentation Using Ontological Similarity
Mostafa Bayomi ... Seamus Lawless
01 Nov 2015

Experimental Design of Extractive Question-Answering Systems: Influence of Error Scores and Answer Length
Amer Farea ... Frank Emmert-Streib
Journal of Artificial Intelligence Research | VOL. 80
23 May 2024

Exploiting synonymy to measure semantic similarity of sentences
Youhyun Shin ... Hyuntak Kim
08 Jan 2015
