Abstract

Word representations obtained from text using the distributional hypothesis have proved useful for various natural language processing tasks. To prepare vector representations from text, some researchers use a predictive model (Word2vec) or a dense count-based model (GloVe), whereas others explore the network structure obtained from text, namely the distributional thesaurus network, where the neighborhood of a word is the set of words having adequate context-feature overlap. Inspired by the successful application of network embedding techniques (DeepWalk, LINE, node2vec, etc.) in various tasks, we apply network embedding to turn a distributional thesaurus network into dense word vectors and investigate the usefulness of distributional thesaurus embedding in improving the overall word vector representation. This is the first work to show that combining the proposed word representation, obtained by distributional thesaurus embedding, with state-of-the-art word representations improves performance by a significant margin on several NLP tasks. These include intrinsic tasks such as word similarity and relatedness, subspace alignment, synonym detection, and analogy detection; extrinsic tasks such as noun compound interpretation and sentence pair similarity; and subconscious intrinsic evaluation using neural activation patterns in the brain.
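To make the pipeline concrete, the sketch below is a minimal DeepWalk-style illustration, not the paper's exact setup: a toy distributional thesaurus network (hypothetical words and overlap scores) is embedded via truncated random walks plus skip-gram, and the resulting vectors are concatenated with stand-in pre-trained vectors playing the role of GloVe. The graph, the walk parameters, and the placeholder `glove` dictionary are all assumptions made for illustration.

import random
import numpy as np
import networkx as nx
from gensim.models import Word2Vec

# Toy DT network: nodes are words, weighted edges connect words with
# high context-feature overlap (weights are made-up overlap scores).
G = nx.Graph()
G.add_weighted_edges_from([
    ("car", "automobile", 0.9), ("car", "truck", 0.7),
    ("truck", "lorry", 0.8), ("automobile", "vehicle", 0.6),
])

def random_walk(graph, start, length=10):
    # One truncated random walk; the next node is sampled with
    # probability proportional to edge weight.
    walk = [start]
    for _ in range(length - 1):
        nbrs = list(graph.neighbors(walk[-1]))
        if not nbrs:
            break
        weights = [graph[walk[-1]][n]["weight"] for n in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Treat the walks as a corpus and train skip-gram on it (DeepWalk).
walks = [random_walk(G, node) for node in G.nodes() for _ in range(20)]
dt_model = Word2Vec(walks, vector_size=32, window=3, min_count=1, sg=1, epochs=10)

# Combine the DT embedding with a pre-trained vector by concatenation.
# `glove` is a placeholder dict {word: np.ndarray}; real loading is omitted.
glove = {w: np.random.rand(50) for w in G.nodes()}
combined = {w: np.concatenate([dt_model.wv[w], glove[w]]) for w in G.nodes()}
print(combined["car"].shape)  # (82,) = 32 DT dims + 50 placeholder GloVe dims

Concatenation is only one plausible combination strategy; the relative dimensionality of the two spaces is an arbitrary choice here.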
