A Transformer Based Approach towards Identification of Discourse Unit Segments and Connectives

Sahil Bakshi, Dipti Sharma

doi:10.18653/v1/2021.disrpt-1.2

Abstract

Discourse parsing, which involves understanding the structure and information flow of a text and modeling its coherence, is an important task in natural language processing. It underlies several other NLP tasks, such as question answering, text summarization, and sentiment analysis. Discourse unit segmentation, one of the fundamental subtasks of discourse parsing, is the identification of the elementary units of text that combine to form a coherent text. In this paper, we present a transformer-based approach to the automated identification of discourse unit segments and connectives. Early approaches to segmentation relied on rule-based systems that used POS tags and other syntactic information to identify discourse segments; more recently, transformer-based neural systems have shown promising results in this domain. Our system, SegFormers, follows this transformer-based approach to perform multilingual discourse segmentation and connective identification across 16 datasets covering 11 languages and 3 annotation frameworks. We evaluate the system using F1 scores for both tasks; the best system achieves its highest F1 score, 97.02%, on the treebanked English RST-DT dataset.
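Transformer encoders operate on subword tokens, while segmentation and connective labels are attached to words. A common way to bridge the two in token-classification setups is to label only the first subword of each word and have the loss ignore the remaining pieces. The sketch below shows that alignment in plain Python. It is not confirmed to be how SegFormers does it: the toy tokenizer `toy_subword_split`, the label inventory `LABEL_IDS`, and the `IGNORE = -100` convention (PyTorch's default `ignore_index` for cross-entropy) are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Matches the default ignore_index of PyTorch's CrossEntropyLoss (assumed training setup)
IGNORE = -100
LABEL_IDS: Dict[str, int] = {"O": 0, "BeginSeg": 1}  # hypothetical label inventory


def toy_subword_split(word: str, piece_len: int = 4) -> List[str]:
    """Hypothetical stand-in for a real subword tokenizer: fixed-size WordPiece-style pieces."""
    pieces = [word[i:i + piece_len] for i in range(0, max(len(word), 1), piece_len)]
    return [pieces[0]] + ["##" + p for p in pieces[1:]]


def align_labels(
    words: Sequence[str],
    word_labels: Sequence[str],
    split: Callable[[str], List[str]] = toy_subword_split,
) -> Tuple[List[str], List[int], List[int]]:
    """Expand word-level labels to subwords, labelling only the first piece of each word."""
    subwords: List[str] = []
    label_ids: List[int] = []
    word_ids: List[int] = []
    for idx, (word, label) in enumerate(zip(words, word_labels)):
        pieces = split(word)
        subwords.extend(pieces)
        label_ids.append(LABEL_IDS[label])                # first piece carries the word's label
        label_ids.extend([IGNORE] * (len(pieces) - 1))    # continuation pieces are ignored
        word_ids.extend([idx] * len(pieces))              # remember which word each piece came from
    return subwords, label_ids, word_ids


def word_predictions(pred_ids: Sequence[int], word_ids: Sequence[int]) -> List[int]:
    """Map subword-level predictions back to words using each word's first subword."""
    first: Dict[int, int] = {}
    for pred, wid in zip(pred_ids, word_ids):
        first.setdefault(wid, pred)
    return [first[w] for w in sorted(first)]


words = ["Although", "it", "rained", "we", "went", "outside"]
labels = ["BeginSeg", "O", "O", "BeginSeg", "O", "O"]
subwords, label_ids, word_ids = align_labels(words, labels)
print(subwords)
print(label_ids)
```

Only the first-subword convention is shown; pooling the scores of all subwords of a word is an equally common alternative.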

Highlights

In the Penn Discourse TreeBank (PDTB) framework, the segmentation task corresponds to identifying discourse connectives: the labels mark the entire span of each connective that explicitly signals the presence of a discourse relation
Precision is considerably higher than recall on some datasets (Basque: 95% precision, 61% recall; Russian: 84% precision, 60% recall). The boundaries the model predicts are usually correct, but it misses many others, indicating that it mainly detects generic discourse unit boundaries at the beginning of discourse segments
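To make the precision/recall gap concrete, the sketch below scores word-level boundary labels, assuming boundary-level scoring where a token counts as a positive only if it is labelled as a segment start (1) and all other tokens are 0. F1 is the harmonic mean of precision and recall. Plugging in the rounded Basque and Russian figures quoted above gives F1 values of about 0.74 and 0.70; these are only arithmetic on those rounded numbers, not scores reported by the paper. The function names and the 0/1 encoding are hypothetical.

```python
from typing import Sequence, Tuple


def harmonic_f1(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R); defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def boundary_prf(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over tokens labelled 1 (segment-initial)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)  # correctly predicted boundaries
    n_pred = sum(1 for p in pred if p == 1)                       # all predicted boundaries
    n_gold = sum(1 for g in gold if g == 1)                       # all gold boundaries
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, harmonic_f1(precision, recall)


# A conservative predictor: every boundary it emits is right, but it misses one of three
gold = [1, 0, 0, 1, 0, 1, 0, 0]
pred = [1, 0, 0, 0, 0, 1, 0, 0]
print(boundary_prf(gold, pred))

# F1 implied by the rounded precision/recall values quoted above
print(round(harmonic_f1(0.95, 0.61), 3))  # Basque
print(round(harmonic_f1(0.84, 0.60), 3))  # Russian
```

The toy case mirrors the pattern in the highlight: precision stays at 1.0 while recall drops to 2/3, pulling F1 down to 0.8.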

Summary

Datasets

We describe the datasets provided by the organizers of the CODI-DISRPT 2021: Discourse Relation Parsing and Treebanking Shared Task at EMNLP 2021. The data consists of 16 datasets covering 11 languages (German, English, Basque, Persian, French, Dutch, Portuguese, Russian, Spanish, Turkish, and Mandarin Chinese). This is the first time the Persian RST corpus (Shahmohammadi et al., 2021) has been included in the discourse segmentation task. The Chinese PDTB dataset (Zhou and Xue, 2015) is not freely available, so the organizers ran the model on the CDTB dataset during the evaluation phase and provided the resulting scores.

Annotation frameworks

Languages

System Overview

Bidirectional LSTM

SegFormers

Results

Conclusion and Future Work

Publication Date: Jan 1, 2021
Citations: 1
License type: cc-by

Similar Papers

Exploring Joint Neural Model for Sentence Level Discourse Parsing and Sentiment Analysis
Bita Nejat ... Giuseppe Carenini
01 Jan 2017

A Sequential Model for Discourse Segmentation
Hugo Hernault ... Mitsuru Ishizuka
01 Jan 2009

Word Embedding for Bengali Language using Domain-related Corpus
Ashutosh Bandyopadhyay ... Jayashree Nair
26 Apr 2023

Multi-Task Text Classification using Graph Convolutional Networks for Large-Scale Low Resource Language
Mounika Marreddy ... Subba Reddy Oota
18 Jul 2022
