Abstract

We introduce novel applications of the word embedding association test (WEAT) – a method for assessing differential biases and attitudes in word embeddings – for identifying correlations of human attitudes and behaviors with word embedding associations, and for automatically detecting words associated with a concept. We assess our methods by measuring the evolution of associations related to COVID-19, using survey data from the COVID States project as validation, along with a set of COVID-19 validation words developed based on surveys and sample responses created by expert psychologists studying COVID-19 behavior. We first show that word associations measured using the WEAT correlate with the behaviors and attitudes of the population that produced an embedding's training corpus. We compute Pearson's $\rho$ between word embedding associations from a diachronic set of English-language word embeddings and COVID States survey data related to COVID-19 attitudes and behaviors. We find statistically significant correlations between WEAT associations and survey results for 19 of 23 survey questions, with Pearson's $\rho$ as high as .96. Survey responses for 10 questions correlate with WEAT associations in embeddings trained on Twitter data from several weeks prior to the survey. We also introduce the unipolar word embedding association test (U-WEAT), which measures strength of association with a single attribute word group, rather than between two opposing polar attribute groups.
In an embedding trained on Twitter data from Oregon, the U-WEAT returns a positive effect size for 88% of validation words based on their association with a COVID-19 concept group, whereas less than 20% of the embedding vocabulary has a positive effect size, despite the prevalence of language related to COVID-19 during the period covered by the training corpus. A qualitative analysis of other words identified by the U-WEAT reveals a wide array of people, places, behaviors, and attitudes related to COVID-19.
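
For readers unfamiliar with the test, the following is a minimal sketch of the standard bipolar WEAT effect size as commonly defined in the literature (Caliskan et al., 2017): the difference between the mean associations of two target word sets X and Y with attribute sets A and B, divided by the standard deviation of those associations over X and Y combined. It is not the authors' code. The function names, the toy 2-D vectors, and the use of the sample standard deviation are assumptions made for illustration. The U-WEAT is not sketched, since its formula is not given in this abstract.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def association(w, A, B):
    """s(w, A, B): mean cosine similarity of w to A minus its mean similarity to B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Bipolar WEAT effect size for target sets X, Y and attribute sets A, B.

    Assumption: the denominator uses the sample standard deviation;
    implementations differ on sample versus population.
    """
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (mean(s_x) - mean(s_y)) / stdev(s_x + s_y)


# Toy 2-D "embeddings" (hypothetical values, not taken from the paper).
X = [(1.0, 0.1), (0.9, 0.2)]   # target words expected to lean toward A
Y = [(0.1, 1.0), (0.2, 0.9)]   # target words expected to lean toward B
A = [(1.0, 0.0)]               # attribute group A
B = [(0.0, 1.0)]               # attribute group B

print(weat_effect_size(X, Y, A, B))
```

A positive value indicates that X is more strongly associated with A (and Y with B) than the reverse. Swapping X and Y flips the sign.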
