Abstract

Identifying word evolution is important for understanding cultural and societal change. The key to accurately identifying word evolution is distinguishing word semantics over time. Recently, methods based on low-dimensional word embeddings have been proposed, but they require aligning the embeddings across different time periods. This alignment step is computationally expensive, prohibitively time-consuming, and susceptible to contextual variability. In this paper, we propose a method that learns low-dimensional, time-aware embeddings using both statistical and part-of-speech (POS) tagging information. Moreover, the proposed method bypasses the computationally expensive alignment step by tagging each word with a time prefix and encoding all words into a common vector space. The learned temporal embeddings better reveal semantic change over time. We conduct a comprehensive experiment on the Google Books N-gram corpus (spanning 100 years). Compared with three other top-performing temporal embedding methods (PPMI, SVD, and SGNS), our method achieves state-of-the-art results in time complexity, precision, recall, F1-score, and the number of words identified as having changed in meaning. Additionally, we provide an intuitive illustration of the semantic evolution of interesting words identified by our method.
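The time-prefix idea in the abstract can be illustrated with a minimal sketch. The function name, the decade-sized period, and the `prefix_token` format below are illustrative assumptions, not the paper's actual implementation: the point is only that tokens from different periods become distinct vocabulary items, so a single embedding model trains them in one shared space and no cross-period alignment is needed.

```python
def tag_with_period(tokens, year, span=10):
    """Prefix each token with its time period (here, its decade),
    e.g. 'car' appearing in 1905 becomes '1900_car'."""
    period = (year // span) * span
    return [f"{period}_{tok}" for tok in tokens]

# Sentences from different periods, tagged and pooled into one corpus;
# '1900_car' and '2000_car' are now separate tokens whose learned vectors
# live in the same space and can be compared directly.
corpus = (
    tag_with_period(["the", "car", "moved"], 1905)
    + tag_with_period(["the", "car", "moved"], 2003)
)
# corpus == ['1900_the', '1900_car', '1900_moved',
#            '2000_the', '2000_car', '2000_moved']
```

Training any standard embedding model (e.g. skip-gram) on such a pooled corpus yields one vector per (period, word) pair in a common space, which is what removes the alignment step.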
