Abstract

We introduce a word embedding method that generates a set of real-valued word vectors from a distributional semantic space. The semantic space is built with a set of context units (words) selected by an entropy-based feature selection approach, according to the certainty involved in their contextual environments. We show that the most predictive context of a target word is its immediately preceding word. We also introduce an adaptive transformation function that reshapes the data distribution to make it suitable for dimensionality reduction techniques. The final low-dimensional word vectors are formed by the singular vectors of a matrix of transformed data. We show that the resulting word vectors perform as well as those generated by popular word embedding methods.
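The abstract only sketches the pipeline, so the following is a minimal illustrative sketch of the overall flow on a toy corpus, not the paper's actual method: the precise entropy-based selection rule and the adaptive transformation function are not given here, so a lowest-entropy ranking, a `log1p` transform, and the cutoff `k` stand in for them as labeled assumptions.

```python
# Illustrative sketch only. ASSUMPTIONS: the entropy criterion,
# the log1p transform, and k are placeholders for the paper's
# unspecified selection rule and adaptive transformation function.
import numpy as np

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Vocabulary and index map.
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Co-occurrence counts: rows = target words, columns = context words,
# where the context of a target is its immediately preceding word
# (the abstract reports this as the most predictive context).
C = np.zeros((V, V))
for prev, target in zip(corpus, corpus[1:]):
    C[idx[target], idx[prev]] += 1

# Entropy of each context column over the targets it precedes;
# low entropy ~ a more "certain" contextual environment
# (assumption: the paper's actual selection rule may differ).
p = C / np.maximum(C.sum(axis=0, keepdims=True), 1e-12)
logp = np.zeros_like(p)
np.log2(p, out=logp, where=p > 0)
entropy = -(p * logp).sum(axis=0)

k = 4                                    # hypothetical number of context units
keep = np.argsort(entropy)[:k]           # most-certain context columns
X = np.log1p(C[:, keep])                 # stand-in for the adaptive transform

# Low-dimensional word vectors from the singular vectors of X.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
d = 2
word_vectors = U[:, :d] * S[:d]
print({w: word_vectors[idx[w]].round(3) for w in vocab})
```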
