Abstract

Linguistic experience varies across individuals and is shaped by both demography and personal preferences, leading to differences in word meanings across languages (Thompson et al., 2020) and people (Johns, 2022). An active area of study in the cognitive sciences that examines the impact of varied knowledge across individuals is the wisdom of the crowd effect, in which the aggregate judgement of a group of individuals is often better than the judgement of the best individual in the group (Surowiecki, 2004). The goal of this article was to determine whether there is a wisdom of the crowd effect in lexical semantic memory, such that the aggregated word similarity values from many individual language users exceed the fit of the best fitting individual. This was accomplished by training 500 different distributional models from the comments of 500 high-level commenters on the internet forum Reddit. Aggregating word similarity values across these individuals revealed a strong wisdom of the crowd effect: the aggregated similarity values far exceeded the performance of the best fitting individual for each dataset tested. Additionally, aggregating even a small number of users provided a large increase in fit relative to the individual corpora, although the best fitting measure included word similarity values from all possible users. These results provide an avenue for future distributional model development by demonstrating that the best pathway towards better distributional models may lie in the aggregation of multiple representations attained from individual users of a language.
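The comparison described above, aggregated similarity values against the best fitting individual, can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the article's procedure: the abstract does not specify the aggregation rule or the fit measure, so simple averaging and Pearson correlation are assumed here, and the data are synthetic (each hypothetical user's similarity value is a shared signal plus independent Gaussian noise). All function names and parameters are illustrative.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_users(n_users, n_pairs, noise_sd, seed=0):
    """Synthetic data (assumption): user value = shared signal + private noise."""
    rng = random.Random(seed)
    # Stand-in for the behavioural similarity ratings a model is fit against.
    target = [rng.gauss(0.0, 1.0) for _ in range(n_pairs)]
    users = [
        [t + rng.gauss(0.0, noise_sd) for t in target]
        for _ in range(n_users)
    ]
    return target, users


def aggregate(users):
    """Average similarity values across users, word pair by word pair."""
    return [sum(column) / len(column) for column in zip(*users)]


def crowd_vs_best(target, users):
    """Return (fit of the aggregate, fit of the best single user)."""
    individual_fits = [pearson(user, target) for user in users]
    crowd_fit = pearson(aggregate(users), target)
    return crowd_fit, max(individual_fits)


if __name__ == "__main__":
    target, users = simulate_users(n_users=50, n_pairs=200, noise_sd=1.0)
    for k in (5, 50):
        # Aggregate only the first k users to see how fit grows with crowd size.
        crowd_fit, best_fit = crowd_vs_best(target, users[:k])
        print(f"{k} users: crowd r = {crowd_fit:.3f}, best individual r = {best_fit:.3f}")
```

In this toy setting the effect arises because averaging cancels independent noise. Real individual corpora differ in more structured ways, so the simulation shows only the shape of the comparison, not the size of the effect reported in the article.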
