Abstract
Syntax has been shown to be useful for various NLP tasks, yet most existing work encodes a single syntactic tree using one hierarchical neural network. In this paper, we investigate a simple and effective method, Knowledge Distillation, for integrating heterogeneous structural knowledge into a unified sequential LSTM encoder. Experimental results on four typical syntax-dependent tasks show that our method outperforms tree encoders by effectively integrating rich heterogeneous structural syntax while reducing error propagation, and also outperforms ensemble methods in terms of both efficiency and accuracy.
Highlights
Integrating syntactic information into neural networks has received increasing attention in natural language processing (NLP), and it has been used for a wide range of end tasks, such as sentiment analysis (SA) (Nguyen and Shirai, 2015; Teng and Zhang, 2017; Looks et al., 2017; Zhang and Zhang, 2019), neural machine translation (NMT) (Cho et al., 2014; Garmash and Monz, 2015; Gu et al., 2018), language modeling (Yazdani and Henderson, 2015; Zhang et al., 2016; Zhou et al., 2017), semantic role labeling (SRL) (Marcheggiani and Titov, 2017; Strubell et al., 2018; Fei et al., 2020c), natural language inference (NLI) (Tai et al., 2015a; Liu et al., 2018) and text classification (Chen et al., 2015; Zhang et al., 2018b).
We investigate the Knowledge Distillation (KD) method, which has been shown to be a simple and effective technique for transferring knowledge between models.
We investigated knowledge distillation for integrating heterogeneous tree structures to facilitate NLP tasks, distilling syntactic knowledge into a sequential input encoder via both output-level and feature-level distillation.
Summary
Integrating syntactic information into neural networks has received increasing attention in natural language processing (NLP), and it has been used for a wide range of end tasks, such as sentiment analysis (SA) (Nguyen and Shirai, 2015; Teng and Zhang, 2017; Looks et al., 2017; Zhang and Zhang, 2019), neural machine translation (NMT) (Cho et al., 2014; Garmash and Monz, 2015; Gu et al., 2018), language modeling (Yazdani and Henderson, 2015; Zhang et al., 2016; Zhou et al., 2017), semantic role labeling (SRL) (Marcheggiani and Titov, 2017; Strubell et al., 2018; Fei et al., 2020c), natural language inference (NLI) (Tai et al., 2015a; Liu et al., 2018) and text classification (Chen et al., 2015; Zhang et al., 2018b). Despite the usefulness of structure knowledge, most existing models use only a single syntactic tree, such as a constituency or a dependency tree. Constituent and dependency representations of syntactic structure share underlying linguistic and computational characteristics while differing in various aspects. [Figure: example constituency tree with nodes S, NP, VP, NNP, VBD and the SRL role label A0.]
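The output-level and feature-level distillation mentioned above can be sketched as a combined training loss: a softened KL-divergence term aligning the student's output distribution with the teacher's, plus a mean-squared-error term aligning intermediate representations. The sketch below is illustrative only; the function name, temperature `T`, and weighting `alpha` are assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kd_loss(student_logits, teacher_logits,
            student_feats, teacher_feats,
            T=2.0, alpha=0.5):
    """Hypothetical combined distillation loss (not the paper's exact form).

    Output-level term: KL(teacher || student) on temperature-softened
    distributions, scaled by T^2 as is conventional in KD.
    Feature-level term: MSE between intermediate encoder representations.
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    output_loss = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))) \
        .sum(axis=-1).mean() * (T ** 2)
    feature_loss = np.mean((student_feats - teacher_feats) ** 2)
    return alpha * output_loss + (1 - alpha) * feature_loss
```

With heterogeneous tree encoders as teachers, each teacher would contribute such a loss term, letting the sequential student encoder absorb multiple structural views without running any tree encoder at inference time.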