Abstract

What is the first word that comes to your mind when you hear giraffe, or damsel, or freedom? Such free associations contain a huge amount of information on the mental representations of the corresponding concepts, and are thus an extremely valuable testbed for the evaluation of semantic representations extracted from corpora. In this paper, we present FAST (Free ASsociation Tasks), a free association dataset for English rigorously sampled from two standard free association norms collections (the Edinburgh Associative Thesaurus and the University of South Florida Free Association Norms), discuss two evaluation tasks, and provide baseline results. In parallel, we discuss methodological considerations concerning the desiderata for a proper evaluation of semantic representations.

Highlights

  • We present Free ASsociation Tasks (FAST), a new evaluation dataset for English lexical semantics designed to satisfy desiderata including (a) compatibility with humanlike generalization and (b) the quality of distractors

  • Assessing the performance of a distributional semantic model (DSM), be it a count model or one of the numerous and popular neural embeddings, has never been a straightforward endeavour

  • Another highly speculative explanation for the better performance of FastText is its use of subword embeddings, which may be sensitive to some rhyming effects that are known to be present in free associations (Nelson et al., 2004)

  • The results on the multiple-choice task showed that first-order co-occurrence models outperform count-based DSMs, due to the known prevalence of syntagmatic relations in free associations
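The first-order co-occurrence models mentioned above score association strength directly from joint corpus counts rather than from second-order vector similarity. Below is a minimal sketch of such a model using pointwise mutual information (PMI); the tiny corpus and the `pmi_scores` helper are invented for illustration and are not from the paper:

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """Pointwise mutual information for word pairs co-occurring in a sentence.

    A minimal first-order co-occurrence model: association strength comes
    directly from joint occurrence, not from vector similarity.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        tokens = set(sent.lower().split())
        word_counts.update(tokens)
        pair_counts.update(frozenset(p) for p in combinations(sorted(tokens), 2))
    n = sum(word_counts.values())
    scores = {}
    for pair, c in pair_counts.items():
        w1, w2 = tuple(pair)
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), estimated from counts.
        scores[pair] = math.log2(c * n / (word_counts[w1] * word_counts[w2]))
    return scores

corpus = [
    "the giraffe stretched its long neck",
    "a giraffe bent its neck to drink",
    "freedom of speech matters",
]
scores = pmi_scores(corpus)
# "giraffe" and "neck" co-occur in both of their sentences, so their
# PMI is positive, i.e. a strong syntagmatic association.
print(scores[frozenset({"giraffe", "neck"})])
```

Syntagmatic pairs that habitually appear together (giraffe–neck) get high scores under such a model, which is consistent with the prevalence of syntagmatic relations in free-association data.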

Summary

Introduction

We present Free ASsociation Tasks (FAST), a new evaluation dataset for English lexical semantics designed to satisfy desiderata including (a) compatibility with humanlike generalization and (b) the quality of distractors. Assessing the performance of a distributional semantic model (DSM), be it a count model or one of the numerous and popular neural embeddings, has never been a straightforward endeavour. It is becoming an increasingly pressing issue due to the ‘black box’ nature of word embeddings and their aleatoric, often irreproducible training. This paper makes the following contributions: we release FAST and propose two new evaluation tasks based on it; we report preliminary modelling results; and we manually classify the associates, identifying categories such as consecutive xy collocations (significant–other) and defining synonyms.
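One way to picture the multiple-choice task: given a cue word, a model is credited when it ranks the attested human associate above the distractors. A minimal sketch using cosine similarity over toy vectors (the vectors and the `choose_associate` helper are invented for illustration; a real evaluation would load trained DSM embeddings):

```python
import numpy as np

# Hypothetical toy embeddings standing in for a trained DSM.
VECTORS = {
    "giraffe": np.array([0.9, 0.1, 0.0]),
    "neck":    np.array([0.8, 0.2, 0.1]),
    "freedom": np.array([0.0, 0.9, 0.3]),
    "damsel":  np.array([0.1, 0.0, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def choose_associate(cue, candidates):
    """Return the candidate whose vector is closest to the cue's vector."""
    return max(candidates, key=lambda w: cosine(VECTORS[cue], VECTORS[w]))

if __name__ == "__main__":
    # The model scores a hit if it picks the attested associate ("neck")
    # over the two distractors.
    print(choose_associate("giraffe", ["neck", "freedom", "damsel"]))  # → neck
```

Accuracy on the task is then simply the fraction of cues for which the model's top-ranked candidate is the attested associate.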

