Towards syntax-aware token embeddings

Diana Nicoleta Popa,James Henderson,Julien Perez,Eric Gaussier

doi:10.1017/s1351324920000297

Abstract

AbstractDistributional semantic word representations are at the basis of most modern NLP systems. Their usefulness has been proven across various tasks, particularly as inputs to deep learning models. Beyond that, much work investigated fine-tuning the generic word embeddings to leverage linguistic knowledge from large lexical resources. Some work investigated context-dependent word token embeddings motivated by word sense disambiguation, using sequential context and large lexical resources. More recently, acknowledging the need for an in-context representation of words, some work leveraged information derived from language modelling and large amounts of data to induce contextualised representations. In this paper, we investigate Syntax-Aware word Token Embeddings (SATokE) as a way to explicitly encode specific information derived from the linguistic analysis of a sentence in vectors which are input to a deep learning model. We propose an efficient unsupervised learning algorithm based on tensor factorisation for computing these token embeddings given an arbitrary graph of linguistic structure. Applying this method to syntactic dependency structures, we investigate the usefulness of such token representations as part of deep learning models of text understanding. We encode a sentence either by learning embeddings for its tokens and the relations between them from scratch or by leveraging pre-trained relation embeddings to infer token representations. Given sufficient data, the former is slightly more accurate than the latter, yet both provide more informative token embeddings than standard word representations, even when the word representations have been learned on the same type of context from larger corpora (namely pre-trained dependency-based word embeddings). We use a large set of supervised tasks and two major deep learning families of models for sentence understanding to evaluate our proposal. We empirically demonstrate the superiority of the token representations compared to popular distributional representations of words for various sentence and sentence pair classification tasks.

Full Text