Abstract

The development of a model to quantify semantic similarity and relatedness between words has been the major focus of many studies in various fields, e.g. psychology, linguistics, and natural language processing. Unlike the measures proposed by most previous research, this article aims to automatically estimate the strength of association between words that may or may not be semantically related. We demonstrate that the performance of the model depends not only on the combination of independently constructed word embeddings (namely, corpus- and network-based embeddings) but also on the way these word vectors interact. The research concludes that the weighted average of the cosine-similarity coefficients derived from independent word embeddings in a double vector space tends to yield high correlations with human judgements. Moreover, we demonstrate that evaluating word associations through a measure that relies not only on the rank ordering of word pairs but also on the strength of associations can reveal findings that go unnoticed by traditional measures such as Spearman's and Pearson's correlation coefficients.
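The double-vector-space combination described above can be sketched as follows. This is a minimal illustration, assuming pre-trained corpus-based and network-based embeddings stored as dictionaries; the function name `wale_score` and the weight `alpha` are illustrative placeholders, not the article's actual implementation:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def wale_score(w1, w2, corpus_emb, network_emb, alpha=0.5):
    # Weighted average of the cosine similarities computed in two
    # independently constructed vector spaces (corpus- and network-based).
    sim_corpus = cosine(corpus_emb[w1], corpus_emb[w2])
    sim_network = cosine(network_emb[w1], network_emb[w2])
    return alpha * sim_corpus + (1 - alpha) * sim_network

# Toy two-dimensional embeddings for demonstration only.
corpus_emb = {"dog": np.array([1.0, 0.0]), "cat": np.array([1.0, 1.0])}
network_emb = {"dog": np.array([0.0, 1.0]), "cat": np.array([0.0, 1.0])}

print(wale_score("dog", "cat", corpus_emb, network_emb, alpha=0.5))
```

Setting `alpha` closer to 1 lets the corpus-based space dominate, while values closer to 0 favour the semantic network, which is one simple way the two spaces can "interact".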

Highlights

  • Word associations have been a topic of intensive study in a variety of research fields, such as psychology, linguistics, and natural language processing (NLP)

  • By tuning RankDCG to assess word associations either on rank ordering alone or with associative strength taken into account, we were able to analyse the vector-space models generated by several word-embedding techniques through a different exploratory lens, going beyond the results provided by traditional measures

  • Our experiments showed that Word2Vec and GloVe expose the dominant influence of the semantic network through WALE-1 and that of the corpus through WALE-2, whereas the corpus dominates in both WALE models with FastText
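The contrast drawn above between rank ordering and associative strength can be illustrated: Spearman's coefficient depends only on ranks, so two models with very different strength profiles can score identically. A minimal sketch with toy scores and a hand-rolled Spearman (no tie handling; the variable names are illustrative):

```python
def rank(scores):
    # Position of each item when sorted ascending (assumes no ties).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for position, index in enumerate(order):
        ranks[index] = position
    return ranks

def spearman(a, b):
    # Spearman's rho via the classic 1 - 6*sum(d^2)/(n*(n^2-1)) formula.
    ra, rb = rank(a), rank(b)
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

human = [9.0, 7.0, 3.0]       # human association strengths for three word pairs
model_a = [0.90, 0.70, 0.30]  # mirrors the strength differences
model_b = [0.51, 0.50, 0.49]  # barely separates the pairs

# Both models rank the pairs identically, so Spearman cannot tell them apart.
print(spearman(model_a, human), spearman(model_b, human))  # 1.0 1.0
```

A strength-sensitive measure such as the tuned RankDCG described above is intended to expose exactly this kind of difference that a rank-only coefficient masks.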


Introduction

Word associations have been a topic of intensive study in a variety of research fields, such as psychology, linguistics, and natural language processing (NLP). Most studies of word priming have looked at pairs of words that are both associatively and semantically related. However, participants can produce words as associates of other words that are not related in meaning; for example, waiting can be generated in response to hospital. There can also be semantically related words that are not produced as associates; for example, dance and skate are related in meaning, but skate is rarely produced as an associate of dance. Words can therefore be associatively related, semantically related, or both.

