Abstract

Using a random walk model of text generation, Arora et al. (2017) proposed a strong baseline for computing sentence embeddings: take a weighted average of word embeddings and modify with SVD. This simple method even outperforms far more complex approaches such as LSTMs on textual similarity tasks. In this paper, we first show that word vector length has a confounding effect on the probability of a sentence being generated in Arora et al.’s model. We propose a random walk model that is robust to this confound, where the probability of word generation is inversely related to the angular distance between the word and sentence embeddings. Our approach beats Arora et al.’s by up to 44.4% on textual similarity tasks and is competitive with state-of-the-art methods. Unlike Arora et al.’s method, ours requires no hyperparameter tuning, which means it can be used when there is no labelled data.
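
For concreteness, the angular distance between a word embedding v_w and the sentence embedding c_s is the standard normalized angle (our notation, not a verbatim formula from the paper):

$$ d_{\angle}(v_w, c_s) = \frac{1}{\pi} \arccos\!\left( \frac{\langle v_w, c_s \rangle}{\lVert v_w \rVert \, \lVert c_s \rVert} \right) $$

Because the inner product is normalized by both vector lengths, this distance depends only on direction, which is what makes a model based on it robust to the length confound.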

Highlights

  • Distributed representations of words, better known as word embeddings, have become fixtures of current methods in natural language processing

  • We first show that word vector length has a confounding effect on the log-linear random walk model of text generation (Arora et al., 2017), the basis of a strong baseline method for sentence embeddings

  • We propose an angular distance–based random walk model where the probability of a sentence being generated is robust to distortion from word vector length (see the sketch after this list)
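
To make the confound concrete, the following sketch (our illustration, not code from the paper) rescales a word vector and compares the dot product used in the log-linear model, which grows with vector length, against angular distance, which depends only on direction. All variable names are illustrative.

    import numpy as np

    def angular_distance(u, v):
        # Normalized angle in [0, 1]; invariant to the lengths of u and v.
        cos_sim = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        return np.arccos(np.clip(cos_sim, -1.0, 1.0)) / np.pi

    rng = np.random.default_rng(0)
    c_s = rng.normal(size=300)   # sentence (discourse) vector
    v_w = rng.normal(size=300)   # word vector

    for scale in (0.5, 1.0, 2.0):  # rescale the word vector only
        v = scale * v_w
        # In the log-linear model, the generation probability rises with <c_s, v>,
        # so merely lengthening the word vector inflates it; the angle does not move.
        print(f"scale={scale}: dot={np.dot(c_s, v):.2f}, "
              f"angular={angular_distance(c_s, v):.3f}")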

Summary

Introduction

Distributed representations of words, better known as word embeddings, have become fixtures of current methods in natural language processing. A simple way to represent a sentence is to average its word embeddings; Arora et al. (2017) provided a more powerful approach: compute the sentence embedding as a weighted average of the word embeddings, then subtract from each sentence embedding its vector projection on the first principal component. In their random walk model, a word unrelated to the sentence's discourse vector c_s can still be generated, either by chance or because it is part of frequent discourse, such as stopwords. This approach even outperforms more complex models such as LSTMs on textual similarity tasks. Arora et al. argued that the simplicity and effectiveness of their method make it a tough-to-beat baseline for sentence embeddings. Though they call their approach unsupervised, others have noted that it is ‘weakly supervised’, since it requires hyperparameter tuning (Cer et al., 2017). Given the simplicity, effectiveness, and unsupervised nature of our method, we suggest it be used as a baseline for computing sentence embeddings.
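
As a point of reference, here is a minimal sketch of Arora et al.'s baseline under stated assumptions: pretrained word vectors and unigram probabilities p(w) are given as dictionaries, the weight a / (a + p(w)) and the removal of the projection on the first principal component follow Arora et al. (2017), and the function name and the default a = 1e-3 are illustrative.

    import numpy as np

    def sif_embeddings(sentences, word_vecs, word_prob, a=1e-3):
        # sentences: list of tokenized sentences (lists of words)
        # word_vecs: dict mapping word -> np.ndarray embedding
        # word_prob: dict mapping word -> unigram probability p(w)
        embs = []
        for sent in sentences:
            words = [w for w in sent if w in word_vecs]
            # Weighted average: frequent words are down-weighted by a / (a + p(w)).
            weights = np.array([a / (a + word_prob.get(w, 0.0)) for w in words])
            vecs = np.array([word_vecs[w] for w in words])
            embs.append(weights @ vecs / len(words))
        embs = np.array(embs)
        # Remove the common component: subtract each embedding's projection
        # on the first right singular vector (first principal component).
        u = np.linalg.svd(embs, full_matrices=False)[2][0]
        return embs - np.outer(embs @ u, u)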

Related Work
The Log-Linear Random Walk Model
The Confounding Effect of Vector Length
An Angular Distance–Based Random Walk Model
Textual Similarity Tasks
Experimental Settings
Results
Supervised Tasks
Future Work
Conclusion
