Abstract

Word vectors and topic models can help retrieve information semantically. However, several problems remain: 1) antonyms share high similarity when clustered through word vectors; 2) vectors for named entities cannot be fully trained, as named entities may appear only a limited number of times in a specific corpus; and 3) words, sentences, and paragraphs that share the same meaning but have no overlapping words are difficult to recognize as similar. To overcome these problems, this paper proposes a new vector computation model for text, named s2v. In the model, words, sentences, and paragraphs are represented in a unified way, and sentence and paragraph vectors are trained along with word vectors. Based on this unified representation, retrieval of words and of sentences of different lengths is studied experimentally. The results show that information with similar meaning can be retrieved even when it is expressed with different words.
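To make the idea of a unified embedding space concrete, the following is a minimal Python sketch of meaning-based retrieval. Since the abstract does not specify the internals of s2v, the sketch uses gensim's Doc2Vec (paragraph vectors trained jointly with word vectors, which is analogous to the joint training described above) as a stand-in; the toy corpus and all parameter values are illustrative assumptions, not details from the paper.

# Sketch of unified-vector retrieval. Doc2Vec is used here as a
# stand-in for s2v, whose internals the abstract does not describe.
# The corpus and parameters below are illustrative only.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    "the cat sat on the mat",
    "a kitten rested on the rug",
    "stock prices rose sharply today",
]
docs = [TaggedDocument(words=s.split(), tags=[i]) for i, s in enumerate(corpus)]

# PV-DM (dm=1) trains document vectors and word vectors together,
# so words and sentences end up in one shared embedding space.
model = Doc2Vec(docs, dm=1, vector_size=50, window=3, min_count=1, epochs=100)

# Retrieve sentences by meaning: the query shares almost no content
# words with corpus sentence 1, yet a semantically similar sentence
# should still rank highly by vector similarity.
query = "a cat lay on a rug".split()
qvec = model.infer_vector(query)
for tag, score in model.dv.most_similar([qvec], topn=2):
    print(corpus[tag], score)

On a real corpus, the same most_similar lookup works for word vectors (model.wv) and document vectors (model.dv) alike, which is the practical payoff of representing words, sentences, and paragraphs in one space.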
