Abstract
Neural word embeddings have become a fundamental resource for tackling many applications in artificial intelligence (AI) research, and they have been shown to capture high-quality syntactic and semantic relationships in a vector space. Despite this impact, well-trained word embeddings have several drawbacks. In this paper, we focus on two of them: (i) their massive memory requirement and (ii) their inapplicability to out-of-vocabulary (OOV) words. To overcome these two issues, we propose a method that reconstructs pre-trained word embeddings from subword information, representing a large number of subword embeddings in a considerably small fixed space while preventing quality degradation relative to the original word embeddings. The key techniques of our method are twofold: memory-shared embeddings and a variant of the key-value-query self-attention mechanism. Our experiments show that the reconstructed subword-based word embeddings successfully imitate well-trained word embeddings in a small fixed space without quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings for OOV words. We also demonstrate the effectiveness of our reconstruction method when applied to downstream tasks such as named entity recognition and natural language inference.
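To make the description above more concrete, the sketch below shows one possible reading of the idea: each word is split into subwords whose vectors live in a small, fixed pool of shared parameters (memory-shared embeddings), and a key-value-query style attention combines them into a word vector. All names, sizes, the hashing scheme, and the n-gram segmentation (e.g., SHARED_SLOTS, reconstruct) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code) of reconstructing a word vector
# from subword embeddings stored in a small shared memory and combined by a
# key-value-query style self-attention. Sizes and names are illustrative.
import numpy as np

rng = np.random.default_rng(0)

DIM = 300             # dimensionality of the pre-trained word vectors
SHARED_SLOTS = 10000  # fixed number of shared subword vectors (the "small fixed space")

# Memory-shared embeddings: every subword is hashed into one of SHARED_SLOTS
# rows, so the parameter count stays fixed no matter how many subwords occur.
shared_memory = rng.normal(scale=0.1, size=(SHARED_SLOTS, DIM))
W_query = rng.normal(scale=0.1, size=(DIM, DIM))
W_key = rng.normal(scale=0.1, size=(DIM, DIM))

def subwords(word, n_min=3, n_max=5):
    """Character n-grams with boundary markers (fastText-style segmentation)."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def reconstruct(word):
    """Approximate a word vector by self-attending over its shared subword vectors."""
    rows = [hash(s) % SHARED_SLOTS for s in subwords(word)]
    values = shared_memory[rows]                  # (num_subwords, DIM)
    queries = values @ W_query
    keys = values @ W_key
    scores = queries @ keys.T / np.sqrt(DIM)      # attention scores between subwords
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    contextualized = attn @ values                # (num_subwords, DIM)
    return contextualized.mean(axis=0)            # pooled word vector

# Training (not shown) would fit shared_memory, W_query, and W_key so that
# reconstruct(w) matches the pre-trained vector of w, e.g. via a squared loss.
vec = reconstruct("embedding")   # works for any string, including OOV words
print(vec.shape)                 # (300,)
```

Because the pool of shared vectors has a fixed size and any string can be segmented into subwords, this kind of setup addresses both issues named in the abstract: memory no longer grows with the vocabulary, and OOV words still receive an embedding.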
Highlights
Machine-readable representation of word meanings is one of the essential tools for tackling natural language understanding by computers. A recent trend is to embed word meanings into a vector space by using rapidly developing neural word embedding methods, such as Skip-gram [1], GloVe [2], and fastText [3].
The subword-based approach can greatly mitigate the OOV word issue. We extend this approach to simultaneously reduce the total number of embedding vectors through the reconstruction of word embeddings by using subwords.
We experimentally show that our reconstructed subword-based embeddings can successfully imitate well-trained word embeddings, such as fastText.600B and GloVe.840B, in a small fixed space while preventing quality degradation across several linguistic benchmark datasets from word similarity and analogy tasks.
Summary
A recent trend is to embed word meanings into a vector space by using the rapidly developing neural word embedding methods, such as Skip-gram [1], GloVe [2], and fastText [3]. The basic idea used to construct a vector space model is derived from the intuition that similar words tend to appear in similar contexts [4]. These methods have been shown to capture high-quality syntactic and semantic relationships in a vector space. Studies in compositional semantics have revealed that operations on embedding vectors, such as addition and the inner product, can be considered satisfactory approximations of composed word meaning and of the similarity between words, respectively [1]. Pre-trained word embeddings, especially those trained on a vast amount of text data, such as the Common Crawl corpus, have become a fundamental resource for a wide range of AI applications.
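The snippet below (not from the paper) illustrates the two operations mentioned above with a publicly available pre-trained GloVe model loaded through gensim; the specific package and model name are convenient assumptions for illustration.

```python
# Illustration of similarity via the inner product (cosine) and additive
# composition (analogy) on pre-trained embeddings. Requires the gensim
# package and an internet connection to fetch a small GloVe model.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # pre-trained 100-dim GloVe vectors

# Cosine similarity as a proxy for relatedness: "car"/"truck" scores
# noticeably higher than "car"/"banana".
print(vectors.similarity("car", "truck"))
print(vectors.similarity("car", "banana"))

# Additive composition / analogy: king - man + woman is closest to "queen".
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=1))
```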