Abstract

Syntax has been shown to be useful for various NLP tasks, yet existing work mostly encodes a single syntactic tree with one hierarchical neural network. In this paper, we investigate a simple and effective method, Knowledge Distillation, to integrate heterogeneous structure knowledge into a unified sequential LSTM encoder. Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structural syntax while reducing error propagation, and also outperforms ensemble methods in terms of both efficiency and accuracy.
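To make the output-level side of this idea concrete, the following is a minimal PyTorch-style sketch, not the paper's exact formulation: a sequential student is trained on the gold labels while also matching soft targets averaged over several heterogeneous tree-encoder teachers. The names (student_logits, teacher_logits_list), the temperature T, and the weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

T = 2.0       # softmax temperature for soft targets (assumed value)
alpha = 0.5   # trade-off between gold-label loss and distillation loss (assumed)

def output_level_kd_loss(student_logits, teacher_logits_list, gold_labels):
    """Output-level distillation: the sequential student mimics the averaged
    soft predictions of several heterogeneous tree-encoder teachers."""
    # Average the teachers' temperature-scaled probability distributions.
    teacher_probs = torch.stack(
        [F.softmax(logits / T, dim=-1) for logits in teacher_logits_list]
    ).mean(dim=0)
    # KL divergence between the student's soft predictions and the teacher average.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  teacher_probs, reduction="batchmean") * (T * T)
    # Standard cross-entropy against the gold task labels.
    ce = F.cross_entropy(student_logits, gold_labels)
    return alpha * ce + (1.0 - alpha) * kd
```

In this sketch the tree-encoder teachers are pre-trained and frozen; only the sequential LSTM student receives gradients, so no tree encoder (or parser) is needed at test time.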

Highlights

  • Integrating syntactic information into neural networks has received increasing attention in natural language processing (NLP), and has been used for a wide range of end tasks, such as sentiment analysis (SA) (Nguyen and Shirai, 2015; Teng and Zhang, 2017; Looks et al., 2017; Zhang and Zhang, 2019), neural machine translation (NMT) (Cho et al., 2014; Garmash and Monz, 2015; Gu et al., 2018), language modeling (Yazdani and Henderson, 2015; Zhang et al., 2016; Zhou et al., 2017), semantic role labeling (SRL) (Marcheggiani and Titov, 2017; Strubell et al., 2018; Fei et al., 2020c), natural language inference (NLI) (Tai et al., 2015a; Liu et al., 2018) and text classification (Chen et al., 2015; Zhang et al., 2018b).

  • We investigate the Knowledge Distillation (KD) method, which has been shown to be an effective way of transferring knowledge from teacher models to a student model, to integrate heterogeneous structure knowledge into a unified sequential LSTM encoder.

  • We investigated knowledge distillation for integrating heterogeneous tree structures to facilitate NLP tasks, distilling syntactic knowledge into a sequential input encoder through both output-level and feature-level distillation (a minimal sketch of the feature-level objective follows this list).
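A similarly minimal sketch of the feature-level objective mentioned above, assuming the student's LSTM states and a teacher tree encoder's states are already aligned at the token level; the dimensions, module name, and linear projection are assumptions for illustration.

```python
import torch
import torch.nn as nn

class FeatureLevelDistiller(nn.Module):
    """Feature-level distillation sketch: pull the sequential student's hidden
    states towards a tree encoder's token-level representations."""

    def __init__(self, student_dim=400, teacher_dim=300):
        super().__init__()
        # Project student states into the teacher's feature space (assumed design).
        self.proj = nn.Linear(student_dim, teacher_dim)

    def forward(self, student_states, teacher_states):
        # student_states: (batch, seq_len, student_dim) from the sequential LSTM.
        # teacher_states: (batch, seq_len, teacher_dim) from a tree encoder,
        # detached so that only the student is updated by this loss.
        return nn.functional.mse_loss(self.proj(student_states),
                                      teacher_states.detach())
```

With multiple heterogeneous teachers (e.g. a constituency and a dependency tree encoder), one such loss term could be added per teacher and summed with the output-level loss sketched after the Abstract.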

Introduction

Integrating syntactic information into neural networks has received increasing attention in natural language processing (NLP), and has been used for a wide range of end tasks, such as sentiment analysis (SA) (Nguyen and Shirai, 2015; Teng and Zhang, 2017; Looks et al., 2017; Zhang and Zhang, 2019), neural machine translation (NMT) (Cho et al., 2014; Garmash and Monz, 2015; Gu et al., 2018), language modeling (Yazdani and Henderson, 2015; Zhang et al., 2016; Zhou et al., 2017), semantic role labeling (SRL) (Marcheggiani and Titov, 2017; Strubell et al., 2018; Fei et al., 2020c), natural language inference (NLI) (Tai et al., 2015a; Liu et al., 2018) and text classification (Chen et al., 2015; Zhang et al., 2018b). Despite the usefulness of structure knowledge, most existing models use only a single syntactic tree, such as a constituency or a dependency tree. Constituency and dependency representations of syntactic structure share underlying linguistic and computational characteristics, yet differ in various aspects: the former focuses on the hierarchical phrase structure of a sentence, while the latter describes head-dependent relations between individual words.
