Abstract

Def2Vec introduces a new perspective on building word embeddings by using dictionary definitions. By leveraging term-document matrices derived from dictionary definitions and employing Latent Semantic Analysis (LSA), our method, Def2Vec, yields embeddings characterized by robust performance and adaptability. Through comprehensive evaluations encompassing token classification, sequence classification, and semantic similarity, we show empirically that Def2Vec is consistently competitive with established models like Word2Vec, GloVe, and FastText. Notably, by retaining all the matrices resulting from the LSA factorization, our model can efficiently predict embeddings for out-of-vocabulary words, given their definitions. By effectively integrating the benefits of dictionary definitions with LSA-based embeddings, Def2Vec builds informative semantic representations while minimizing data requirements. In this paper, we run several experiments to assess the quality of our embedding model at the word level and at the sequence level. Our findings contribute to the ongoing evolution of word embedding methodologies by incorporating structured lexical information and enabling efficient embedding prediction.
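To make the pipeline concrete, the following is a minimal sketch of the general approach the abstract describes: build a term-document matrix where each dictionary definition is one document, factorize it with truncated SVD (LSA), and fold a new definition into the latent space to predict an out-of-vocabulary embedding. This is not the authors' code; the toy definitions, the latent dimension k, and the helper names are illustrative assumptions.

```python
# Sketch of an LSA pipeline over dictionary definitions (illustrative, not the paper's implementation).
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy dictionary: each definition acts as one document for the defined word.
definitions = {
    "cat":  "small domesticated carnivorous mammal with soft fur",
    "dog":  "domesticated carnivorous mammal with a long snout",
    "bank": "financial institution that accepts deposits and makes loans",
}

# Term-document matrix X: rows = terms, columns = definitions.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(definitions.values()).T.toarray().astype(float)

# Truncated SVD (LSA): X ~ U @ diag(s) @ Vt, keeping k latent dimensions.
k = 2  # assumed latent dimensionality for this toy example
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

# Word embeddings: each column of Vt is the latent vector of one definition,
# i.e. of the word that definition defines.
embeddings = {word: Vt[:, i] for i, word in enumerate(definitions)}

# Out-of-vocabulary prediction via the standard LSA fold-in:
# project a new definition d with d_hat = Sigma^{-1} U^T d.
def embed_oov(definition: str) -> np.ndarray:
    d = vectorizer.transform([definition]).toarray().ravel()
    return (U.T @ d) / s

print(embed_oov("large carnivorous mammal with thick fur"))
```

Because all three SVD factors are kept, the fold-in step is a cheap matrix-vector product, which is one plausible reading of how retaining the full factorization enables efficient out-of-vocabulary prediction.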