Abstract

Word vectors and topic models can help retrieve information semantically. However, several problems remain: 1) antonyms share high similarity when clustered through word vectors; 2) vectors for named entities cannot be fully trained, as named entities may appear only a limited number of times in a specific corpus; and 3) words, sentences, and paragraphs that share the same meaning but have no overlapping words are difficult to recognize as similar. To overcome these problems, this paper proposes a new vector computation model for text, named s2v. In the model, words, sentences, and paragraphs are represented in a unified way, and sentence and paragraph vectors are trained along with word vectors. Based on this unified representation, retrieval of words and of sentences of different lengths is studied experimentally. The results show that information with similar meaning can be retrieved even when it is expressed with different words.
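To make the idea of a unified embedding space concrete, the following is a minimal Python sketch of meaning-based retrieval. Since the abstract does not specify the internals of s2v, the sketch uses gensim's Doc2Vec (paragraph vectors trained jointly with word vectors, which is analogous to the joint training described above) as a stand-in; the toy corpus and all parameter values are illustrative assumptions, not details from the paper.

# Sketch of unified-vector retrieval. Doc2Vec is used here as a
# stand-in for s2v, whose internals the abstract does not describe.
# The corpus and parameters below are illustrative only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the cat sat on the mat",
    "a kitten rested on the rug",
    "stock prices rose sharply today",
]
docs = [TaggedDocument(words=s.split(), tags=[i]) for i, s in enumerate(corpus)]

# PV-DM (dm=1) trains document vectors and word vectors together,
# so words and sentences end up in one shared embedding space.
model = Doc2Vec(docs, dm=1, vector_size=50, window=3, min_count=1, epochs=100)

# Retrieve sentences by meaning: the query shares almost no content
# words with corpus sentence 1, yet a semantically similar sentence
# should still rank highly by vector similarity.
query = "a cat lay on a rug".split()
qvec = model.infer_vector(query)
for tag, score in model.dv.most_similar([qvec], topn=2):
    print(corpus[tag], score)

On a real corpus, the same most_similar lookup works for word vectors (model.wv) and document vectors (model.dv) alike, which is the practical payoff of representing words, sentences, and paragraphs in one space.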
