Idiomatic Expression Identification using Semantic Compatibility

Ziheng Zeng, Suma Bhat

doi:10.1162/tacl_a_00442

Abstract

Idiomatic expressions are an integral part of natural language and are constantly being added to a language. Owing to their non-compositionality and their ability to take on a figurative or literal meaning depending on the sentential context, they have been a classical challenge for NLP systems. To address this challenge, we study the task of detecting whether a sentence has an idiomatic expression and localizing it when it occurs in a figurative sense. Prior research on this task has studied specific classes of idiomatic expressions, offering limited views of their generalizability to new idioms. We propose a multi-stage neural architecture with attention flow as a solution. The network effectively fuses contextual and lexical information at different levels using word and sub-word representations. Empirical evaluations on three of the largest benchmark datasets with idiomatic expressions of varied syntactic patterns and degrees of non-compositionality show that our proposed model achieves new state-of-the-art results. A salient feature of the model is its ability to identify idioms unseen during training, with gains from 1.4% to 30.8% over competitive baselines on the largest dataset.

Highlights

Idiomatic expressions (IEs) are a special class of multi-word expressions (MWEs) that typically occur as collocations and exhibit semantic non-compositionality (a.k.a. semantic idiomaticity), where the meaning of the expression is not derivable from its parts (Baldwin and Kim, 2010).
DISC performs on par with RNN-MHCA and BERT-Bidirectional LSTM (BiLSTM)-conditional random field (CRF) in terms of F1 and sequence accuracy (SA) on MAGPIE, while outperforming all baselines on the other datasets.
This is especially salient in the MAGPIE type-aware split, where all the models achieve similar F1 scores, whereas DISC outperforms the others in terms of SA by margins ranging from 7 to 30.8 absolute percentage points.
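The gap between similar F1 scores and very different SA values is easier to see with a small worked example. The sketch below is illustrative only and rests on assumptions not stated in this summary: that predictions are per-token binary tags (1 = part of the idiom), that F1 is computed over tokens, and that SA counts a sentence as correct only when every tag matches. The function names `token_f1` and `sequence_accuracy` are hypothetical, and the paper's exact metric definitions may differ.

```python
from typing import List

Tags = List[int]  # assumed encoding: 1 = token belongs to the idiom, 0 = it does not


def token_f1(gold: List[Tags], pred: List[Tags]) -> float:
    """Token-level F1 for the positive (idiom) tag, pooled over all sentences."""
    tp = fp = fn = 0
    for gold_seq, pred_seq in zip(gold, pred):
        for g, p in zip(gold_seq, pred_seq):
            if g == 1 and p == 1:
                tp += 1
            elif g == 0 and p == 1:
                fp += 1
            elif g == 1 and p == 0:
                fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sequence_accuracy(gold: List[Tags], pred: List[Tags]) -> float:
    """Fraction of sentences whose predicted tags match the gold tags exactly."""
    if not gold:
        return 0.0
    exact = sum(1 for g, p in zip(gold, pred) if g == p)
    return exact / len(gold)


# Two toy sentences; the first prediction misses one idiom token.
gold = [[0, 1, 1, 1, 0], [0, 0, 1, 1, 0]]
pred = [[0, 1, 1, 0, 0], [0, 0, 1, 1, 0]]

f1 = token_f1(gold, pred)
sa = sequence_accuracy(gold, pred)
```

Under these assumptions a single missed token costs little F1 but makes the whole sentence count as wrong for SA, which is one way models with similar F1 can be far apart in SA.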

Summary

Introduction

Idiomatic expressions (IEs) are a special class of multi-word expressions (MWEs) that typically occur as collocations and exhibit semantic non-compositionality (a.k.a. semantic idiomaticity), where the meaning of the expression is not derivable from its parts (Baldwin and Kim, 2010). The task studied here is to detect whether a given sentence contains an IE and, if so, to localize its span: the phrase is returned if it is used figuratively (i.e., used as an IE); otherwise an empty string is returned, indicating that the phrase is used literally (see Table 1). This is the idiomatic expression identification problem, which is the MWE identification problem defined by Baldwin and Kim (2010) limited to MWEs with semantic idiomaticity. A network that solves this problem can serve as a preprocessing step for broad-coverage downstream NLP applications, because we consider the ability to detect IEs to be a first step towards their accurate processing.
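The input/output contract of the task described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that some upstream model has already produced one binary tag per token (1 = part of a figuratively used idiom), and the helper name `extract_idiom` is hypothetical.

```python
from typing import List


def extract_idiom(tokens: List[str], tags: List[int]) -> str:
    """Return the first contiguous run of tokens tagged 1, or "" if there is none.

    An empty string signals that no phrase in the sentence is used figuratively.
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    span: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == 1:
            span.append(token)
        elif span:
            break  # the tagged run has ended
    return " ".join(span)


# Figurative use: the tagged span is returned as the idiom.
figurative = extract_idiom(
    "He finally kicked the bucket last year".split(),
    [0, 0, 1, 1, 1, 0, 0],
)

# Literal use: nothing is tagged, so the result is the empty string.
literal = extract_idiom(
    "He kicked the bucket across the yard".split(),
    [0, 0, 0, 0, 0, 0, 0],
)
```

The example sentences and tags are made up for illustration; in practice the tags would come from the identification model rather than being written by hand.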

Objectives

Methods

Results

Conclusion