Abstract

GloVe representations of words as vector embeddings in continuous spaces are learned by factorizing the word co-occurrence matrix constructed from large corpora. Due to their high quality as textual features, GloVe embeddings have been extensively utilized for many text mining and natural language processing tasks with considerable success. These word representations can be further improved by also taking into account the valuable information on the semantic properties of words and the complex relationships between them provided by semantic lexicons. In this paper, we adopt techniques from constrained optimization in machine learning in order to leverage the relational knowledge between words, and we propose an efficient algorithm that produces word embeddings enhanced by this semantic information. The proposed algorithm outperforms other related approaches that utilize semantic information either during training or as a post-processing step. Our claims are validated by experiments on popular text mining and natural language processing tasks, including word similarity, word analogy, and sentiment analysis, which demonstrate that our proposed model can significantly improve the quality of word vector representations.
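The paper's actual algorithm is not reproduced here, but the general idea it builds on can be sketched in a few lines: a GloVe-style weighted least-squares factorization of the co-occurrence matrix, augmented with a penalty that pulls lexicon-related word pairs closer together. The penalty form, the `related` pair list, and all hyperparameters below are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Minimal sketch: GloVe-style objective sum_ij f(X_ij) * (w_i . c_j + b_i
# + c_j_bias - log X_ij)^2, plus a hypothetical semantic penalty
# lam * sum_(i,j in S) ||w_i - w_j||^2 for lexicon-derived pairs S.
rng = np.random.default_rng(0)
V, d = 6, 4                                     # toy vocabulary size, embedding dim
X = rng.integers(1, 50, (V, V)).astype(float)   # toy co-occurrence counts
related = [(0, 1), (2, 3)]                      # assumed synonym pairs from a lexicon
lam, lr = 0.1, 0.01                             # penalty weight, learning rate

W = 0.1 * rng.standard_normal((V, d))           # target word vectors
C = 0.1 * rng.standard_normal((V, d))           # context word vectors
b = np.zeros(V)                                 # word biases
cb = np.zeros(V)                                # context biases

def f(x, x_max=100.0, alpha=0.75):
    # GloVe weighting function: caps the influence of very frequent pairs
    return np.minimum((x / x_max) ** alpha, 1.0)

def loss():
    resid = W @ C.T + b[:, None] + cb[None, :] - np.log(X)
    glove = np.sum(f(X) * resid**2)
    sem = sum(np.sum((W[i] - W[j]) ** 2) for i, j in related)
    return glove + lam * sem

losses = [loss()]
for _ in range(200):
    err = f(X) * (W @ C.T + b[:, None] + cb[None, :] - np.log(X))
    gW, gC = 2 * err @ C, 2 * err.T @ W
    gb, gcb = 2 * err.sum(1), 2 * err.sum(0)
    for i, j in related:                         # gradient of the semantic penalty
        gW[i] += 2 * lam * (W[i] - W[j])
        gW[j] -= 2 * lam * (W[i] - W[j])
    W -= lr * gW; C -= lr * gC; b -= lr * gb; cb -= lr * gcb
    losses.append(loss())
```

After a few hundred full-gradient steps the combined objective decreases and the related pairs are drawn together; the paper's contribution is an efficient constrained-optimization scheme for this kind of coupling, rather than the plain penalized gradient descent shown here.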
