Abstract

This paper explores an unsupervised approach to learning a compositional representation function for multi-word expressions (MWEs), and evaluates it on the Tratz dataset, which associates two-word expressions with the semantic relation between the compound constituents (e.g., the label "employer" is associated with the noun compound "government agency") (Tratz, 2011). The composition function is based on recurrent neural networks, and is trained using the Skip-Gram objective to predict the words in the context of MWEs. Thus, our approach can naturally leverage large unlabeled text sources. Further, our method can make use of MWE boundaries when they are provided, but can also function as a completely unsupervised algorithm, using MWE boundaries predicted by a single, domain-agnostic part-of-speech pattern. With pre-defined MWE boundaries, our method outperforms the previous state of the art on the coarse-grained evaluation of the Tratz dataset, with an F1 score of 50.4%. The unsupervised version of our method approaches the performance of the supervised one, and even outperforms it in some configurations.
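As a concrete illustration of the unsupervised setting described above, the sketch below predicts MWE boundaries from part-of-speech tags. The abstract only states that a single, domain-agnostic POS pattern is used; the particular pattern here (a run of adjectives and nouns ending in a head noun), the function name extract_mwe_spans, and the coarse tag set are illustrative assumptions rather than the authors' exact rule.

```python
import re

def extract_mwe_spans(tagged_tokens):
    """tagged_tokens: list of (word, coarse_POS) pairs for one sentence.
    Returns (start, end) token-index pairs of adjective/noun runs that
    end in a noun and span at least two tokens."""
    # one character per token: N = noun, A = adjective, _ = anything else
    tag_string = "".join(
        "N" if pos == "NOUN" else "A" if pos == "ADJ" else "_"
        for _, pos in tagged_tokens
    )
    # the regex requires at least one modifier before the head noun,
    # so every match covers two or more tokens
    return [(m.start(), m.end()) for m in re.finditer(r"[AN]+N", tag_string)]

sentence = [("the", "DET"), ("government", "NOUN"), ("agency", "NOUN"),
            ("hired", "VERB"), ("a", "DET"), ("new", "ADJ"),
            ("employee", "NOUN")]
for start, end in extract_mwe_spans(sentence):
    print(" ".join(word for word, _ in sentence[start:end]))
# prints: "government agency" and "new employee"
```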

Highlights

  • Multi-word expressions (MWEs) are fundamental to language and, as such, having a robust semantic representation for MWEs is important for any natural language processing task that involves text understanding, such as information extraction or question answering

  • We propose an unsupervised method for learning a composition function capable of producing a representation of an MWE from the embeddings of its components, using bidirectional recurrent neural networks (RNNs)

  • We experimented with a variant that uses only the multi-word itself as input to an RNN, but found empirically that running the mapping function over the whole sentence to produce a context-aware embedding of the MWE performs better overall (see the sketch after this list)
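The following sketch illustrates the kind of composition function and training signal the highlights describe: a bidirectional RNN is run over the sentence, the hidden states inside the MWE span are pooled into a single vector, and that vector is trained with a Skip-Gram-style objective to predict the MWE's context words. The class and function names, dimensions, mean-pooling, and the use of negative sampling are illustrative assumptions; the paper's actual architecture and hyperparameters may differ.

```python
import torch
import torch.nn as nn

class MWEComposer(nn.Module):
    """Runs a BiLSTM over the whole sentence and pools the hidden states
    inside the MWE span into a single, context-aware MWE vector."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.birnn = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                             bidirectional=True)
        # project the 2*hidden BiLSTM state back to embedding size so the
        # MWE vector lives in the same space as the word vectors
        self.proj = nn.Linear(2 * hidden_dim, emb_dim)

    def forward(self, token_ids, span):
        # token_ids: (1, sent_len); span = (start, end) indices of the MWE
        states, _ = self.birnn(self.embed(token_ids))    # (1, sent_len, 2H)
        start, end = span
        mwe_state = states[:, start:end, :].mean(dim=1)  # average over span
        return self.proj(mwe_state)                      # (1, emb_dim)

def skipgram_loss(mwe_vec, ctx_ids, neg_ids, out_embed):
    """Skip-Gram with negative sampling: the MWE vector should score high
    against its observed context words and low against random negatives."""
    pos = out_embed(ctx_ids)                             # (num_ctx, emb_dim)
    neg = out_embed(neg_ids)                             # (num_neg, emb_dim)
    pos_score = torch.log(torch.sigmoid(pos @ mwe_vec.squeeze(0)))
    neg_score = torch.log(torch.sigmoid(-neg @ mwe_vec.squeeze(0)))
    return -(pos_score.sum() + neg_score.sum())

# toy usage with a made-up vocabulary and a fake 8-token sentence
vocab_size = 1000
composer = MWEComposer(vocab_size)
out_embed = nn.Embedding(vocab_size, 100)        # output ("context") embeddings
sentence = torch.randint(0, vocab_size, (1, 8))
mwe_vec = composer(sentence, span=(2, 4))        # tokens 2-3 form the MWE
ctx_ids = sentence[0, [0, 1, 4, 5]]              # ids of the surrounding words
neg_ids = torch.randint(0, vocab_size, (5,))     # randomly drawn negatives
loss = skipgram_loss(mwe_vec, ctx_ids, neg_ids, out_embed)
loss.backward()
```

Feeding the whole sentence to the BiLSTM, rather than only the MWE tokens, is what makes the resulting MWE embedding context-aware, which is the variant the highlights report as performing better overall.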

Introduction

Multi-word expressions (MWEs) are fundamental to language and, as such, having a robust semantic representation for MWEs is important for any natural language processing task that involves text understanding, such as information extraction or question answering (e.g., da Silva and Souza, 2012; Thurmair, 2018; Subramanian et al., 2018). Training a set of dedicated distributional embeddings for multi-word terms (Shwartz and Dagan, 2018; Dima, 2016) suffers from data sparsity: the MWE "red flower" is two orders of magnitude less frequent in Google search results than the noun "flower," which is likely to affect the quality of its learned representation. Moreover, these methods have no straightforward way of handling MWEs that are out of vocabulary. Other approaches require supervision for MWE boundaries (Yu and Dredze, 2015), which hinders scalability and portability to different languages. Their reliance on having the entities of interest determined ahead of time also threatens to dramatically reduce the real-world utility of these approaches.
