Abstract

Sentence compression is a natural language processing task that produces a short paraphrase of an input sentence by deleting words from it while ensuring grammatical correctness and preserving the core information. This study introduces a graph convolutional network (GCN) into the sentence compression task to encode syntactic information, such as dependency trees. Because we extend the GCN to handle directed edges, a compression model with GCN layers can distinguish between parent and child nodes in a dependency tree when aggregating adjacent nodes. Furthermore, by increasing the number of GCN layers, the model can gradually collect higher-order information of a dependency tree as node information propagates through the layers. We implement sentence compression models for Korean and English. Each model consists of three components: a pre-trained BERT model, GCN layers, and a scoring layer. The scoring layer determines whether a word should remain in the compressed sentence, relying on word vectors that contain the contextual and syntactic information encoded by the BERT and GCN layers. To train and evaluate the proposed model, we used the Google sentence compression dataset for English and a Korean sentence compression corpus containing about 140,000 sentence pairs. The experimental results demonstrate that the proposed model achieves state-of-the-art performance for English. To the best of our knowledge, this is the first Korean sentence compression model based on a deep learning model trained on a large-scale corpus.
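To make the architecture concrete, below is a minimal PyTorch sketch of a direction-aware GCN stacked on contextual encodings, followed by a per-token keep/delete scorer. All class names, dimensions, and the exact aggregation scheme are illustrative assumptions, and a random tensor stands in for the pre-trained BERT states; this is not the authors' implementation.

```python
# Sketch: directed-edge GCN layers over a dependency tree + scoring layer.
# adj[b, i, j] = 1 iff token j is the syntactic head (parent) of token i.
import torch
import torch.nn as nn

class DirectedGCNLayer(nn.Module):
    """One GCN layer with separate weights for parent and child messages."""
    def __init__(self, dim):
        super().__init__()
        self.w_parent = nn.Linear(dim, dim)  # messages from head (parent) nodes
        self.w_child = nn.Linear(dim, dim)   # messages from dependent (child) nodes
        self.w_self = nn.Linear(dim, dim)    # self-loop transformation
        self.relu = nn.ReLU()

    def forward(self, h, adj):
        # h: (batch, seq_len, dim) node states; adj: (batch, seq_len, seq_len)
        parent_msg = torch.bmm(adj, self.w_parent(h))                # aggregate each token's head
        child_msg = torch.bmm(adj.transpose(1, 2), self.w_child(h))  # aggregate each token's dependents
        return self.relu(self.w_self(h) + parent_msg + child_msg)

class CompressionModel(nn.Module):
    """Stacked directed GCN layers + per-token keep/delete scorer."""
    def __init__(self, dim=768, num_gcn_layers=2):
        super().__init__()
        self.gcn_layers = nn.ModuleList(
            DirectedGCNLayer(dim) for _ in range(num_gcn_layers)
        )
        self.scorer = nn.Linear(dim, 2)  # per-token logits: delete vs. keep

    def forward(self, bert_states, adj):
        h = bert_states  # in the paper these come from a pre-trained BERT
        for layer in self.gcn_layers:
            h = layer(h, adj)  # each extra layer widens the receptive field by one hop
        return self.scorer(h)

# Toy usage: one 5-token sentence with random "BERT" states and a chain tree.
states = torch.randn(1, 5, 768)
adj = torch.zeros(1, 5, 5)
for i in range(1, 5):
    adj[0, i, i - 1] = 1.0  # token i's head is token i-1
logits = CompressionModel()(states, adj)
print(logits.shape)  # torch.Size([1, 5, 2])
```

Stacking layers is what yields the higher-order information mentioned above: with two layers, each token's representation reflects its grandparent and grandchild nodes as well as its immediate neighbors.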

Highlights

  • Sentence compression is a natural language processing (NLP) task whose primary objective is to generate a short paraphrase of an input sentence [1]

  • The compression ratio (CR) is the average, over sentences, of the number of characters in a compressed sentence divided by the number in its original sentence; MC is the CR of sentences compressed by a model minus the CR of the training set (see the formulas after this list)

  • This study introduced a graph convolutional network (GCN) into the sentence compression task to represent a word using its dependency tree
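One plausible formalization of the two metrics in the second highlight is given below. The prose definition is ambiguous between an average of per-sentence ratios and a ratio of averages; the per-sentence reading is shown, and the symbols are our own shorthand.

```latex
% c_i: the i-th compressed sentence, s_i: its source sentence,
% len(.): character count, N: number of sentence pairs.
\mathrm{CR} = \frac{1}{N}\sum_{i=1}^{N}\frac{\operatorname{len}(c_i)}{\operatorname{len}(s_i)},
\qquad
\mathrm{MC} = \mathrm{CR}_{\mathrm{model}} - \mathrm{CR}_{\mathrm{train}}
```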


Summary

Introduction

Sentence compression is a natural language processing (NLP) task whose primary objective is to generate a short paraphrase of an input sentence [1]. Deletion-based compression is the most common approach to sentence compression: it casts the task as a sequence-labeling problem that determines whether each word of a source sentence should be retained or deleted [2], and only the retained words are used to compose the compressed sentence. Syntactic information, such as a syntactic tree, has been adopted as a feature in many sentence compression systems to maintain the grammaticality of compressed sentences. Graph neural networks (GNNs) have recently gained popularity in dependency parsing because they can appropriately represent a node of a dependency tree by aggregating its parent and child nodes, and can gradually incorporate higher-order information of the tree by collecting neighboring nodes.
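The deletion-based formulation can be illustrated in a few lines; the sentence and labels below are invented for exposition, not taken from any dataset.

```python
# Deletion-based compression as sequence labeling: each token gets a binary
# keep/delete label, and the kept tokens form the compressed sentence.
tokens = ["The", "company", ",", "founded", "in", "1998", ",",
          "announced", "record", "profits"]
labels = [1, 1, 0, 0, 0, 0, 0, 1, 1, 1]  # 1 = keep, 0 = delete

compressed = " ".join(t for t, keep in zip(tokens, labels) if keep)
print(compressed)  # "The company announced record profits"
```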

