Abstract
Learning by contrasting positive and negative samples is a general strategy adopted by many methods. Noise contrastive estimation (NCE) for word embeddings and translating embeddings for knowledge graphs are examples in NLP employing this approach. In this work, we view contrastive learning as an abstraction of all such methods and augment the negative sampler into a mixture distribution containing an adversarially learned sampler. The resulting adaptive sampler finds harder negative examples, which forces the main model to learn a better representation of the data. We evaluate our proposal on learning word embeddings, order embeddings and knowledge graph embeddings and observe both faster convergence and improved results on multiple metrics.
Highlights
Many models learn by contrasting losses on observed positive examples with those on some fictitious negative examples, trying to decrease some score on positive ones while increasing it on negative ones
To remedy the above-mentioned problem of a fixed unconditional negative sampler, we propose to augment it into a mixture one, λ·p_nce(y) + (1 − λ)·g_θ(y|x), where g_θ is a conditional distribution with a learnable parameter θ and λ is a hyperparameter (see the sketch after these highlights)
We evaluate models trained from scratch as well as fine-tuned GloVe models (Pennington et al., 2014) on word similarity tasks that consist of computing the similarity between pairs of words
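To make the mixture sampler above concrete, here is a minimal Python sketch; the identifiers (p_nce, g_theta, sample_negative) are illustrative and not taken from the paper's code. With probability λ a negative example is drawn from the fixed NCE noise distribution, otherwise from a learned conditional sampler g_θ(y|x).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 10

# Fixed NCE noise distribution p_nce over the vocabulary
# (e.g., unigram frequencies); uniform here for simplicity.
p_nce = np.full(vocab_size, 1.0 / vocab_size)

def g_theta(x):
    """Placeholder for the learned conditional sampler g_theta(y | x).
    In the actual method this would be a neural network conditioned on x
    and trained adversarially; here it just returns a valid probability vector."""
    logits = rng.normal(size=vocab_size)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

def sample_negative(x, lam=0.5):
    """Draw one negative y from the mixture lam * p_nce(y) + (1 - lam) * g_theta(y | x)."""
    if rng.random() < lam:
        return int(rng.choice(vocab_size, p=p_nce))    # fixed noise sampler
    return int(rng.choice(vocab_size, p=g_theta(x)))   # adversarially learned sampler

print(sample_negative(x=3))
```

In the full method, g_θ is trained against the main embedding model so that it proposes progressively harder negatives; the placeholder above only illustrates how the two samplers are mixed.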
Summary
Many models learn by contrasting losses on observed positive examples with those on some fictitious negative examples, trying to decrease some score on positive ones while increasing it on negative ones. In noise contrastive estimation for word embeddings, a negative example is formed by replacing a component of a positive pair with a word sampled at random from the vocabulary, resulting in a fictitious word-context pair that would be unlikely to exist in the dataset. This negative sampling by corruption approach is used in learning knowledge graph embeddings (Bordes et al., 2013; Lin et al., 2015; Ji et al., 2015; Wang et al., 2014; Trouillon et al., 2016; Yang et al., 2014; Dettmers et al., 2017), order embeddings (Vendrov et al., 2016), caption generation (Dai and Lin, 2017), etc. We demonstrate the efficacy and generality of the proposed method on three different learning tasks: word embeddings (Mikolov et al., 2013), order embeddings (Vendrov et al., 2016) and knowledge graph embeddings (Ji et al., 2015).
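As an illustration of the corruption-based negative sampling described above, the following sketch (with hypothetical identifiers, not the authors' code) turns an observed word-context pair into a fictitious negative pair by replacing the context with a word drawn from the vocabulary.

```python
import random

rng = random.Random(0)

vocabulary = ["cat", "dog", "sat", "mat", "ran", "park"]
positive_pair = ("cat", "sat")   # an observed (word, context) pair

def corrupt(pair, vocab):
    """Form a fictitious negative pair by replacing the context word
    with a word sampled uniformly from the rest of the vocabulary."""
    word, context = pair
    negative_context = rng.choice([w for w in vocab if w != context])
    return (word, negative_context)

print(corrupt(positive_pair, vocabulary))
```

The same corruption idea applies to knowledge graph triples, where the head or tail entity of a positive triple is replaced by a randomly sampled entity.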