VAST: The Valence-Assessing Semantics Test for Contextualizing Language Models

Robert Wolfe, Aylin Caliskan

doi:10.1609/aaai.v36i10.21400

Abstract

We introduce VAST, the Valence-Assessing Semantics Test, a novel intrinsic evaluation task for contextualized word embeddings (CWEs). Despite the widespread use of contextualizing language models (LMs), researchers have no intrinsic evaluation task for understanding the semantic quality of CWEs and their unique properties as related to contextualization, the change in the vector representation of a word based on surrounding words; tokenization, the breaking of uncommon words into subcomponents; and LM-specific geometry learned during training. VAST uses valence, the association of a word with pleasantness, to measure the correspondence of word-level LM semantics with widely used human judgments, and examines the effects of contextualization, tokenization, and LM-specific geometry. Because prior research has found that CWEs from OpenAI's 2019 English-language causal LM GPT-2 perform poorly on other intrinsic evaluations, we select GPT-2 as our primary subject, and include results showing that VAST is useful for 7 other LMs, and can be used in 7 languages. GPT-2 results show that the semantics of a word are more similar to the semantics of context in layers closer to model output, such that VAST scores diverge between our contextual settings, ranging from Pearson’s rho of .55 to .77 in layer 11. We also show that multiply tokenized words are not semantically encoded until layer 8, where they achieve Pearson’s rho of .46, indicating the presence of an encoding process for multiply tokenized words which differs from that of singly tokenized words, for which rho is highest in layer 0. We find that a few neurons with values having greater magnitude than the rest mask word-level semantics in GPT-2’s top layer, but that word-level semantics can be recovered by nullifying non-semantic principal components: Pearson’s rho in the top layer improves from .32 to .76. 
Downstream POS tagging and sentence classification experiments indicate that GPT-2 uses these principal components for non-semantic purposes, such as to represent sentence-level syntax relevant to next-word prediction. After isolating semantics, we show the utility of VAST for understanding LM semantics via improvements over related work on four word similarity tasks, with a score of .50 on SimLex-999, better than the previous best of .45 for GPT-2. Finally, we show that 8 of 10 WEAT bias tests, which compare differences in word embedding associations between groups of words, exhibit more stereotype-congruent biases after isolating semantics, indicating that non-semantic structures in LMs also mask social biases.
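The abstract does not give the scoring formula, so the following is only a minimal sketch of the two ideas it describes. It assumes that a word's valence association is a WEAT-style effect size against small sets of pleasant and unpleasant words, that the VAST score is the Pearson correlation of those associations with human valence ratings, and that "nullifying" principal components means mean-centering the embeddings and projecting out the top-k components. All function names and the three-dimensional toy data are hypothetical and are not taken from the paper.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def valence_association(w, pleasant, unpleasant):
    """WEAT-style effect size for one word (assumed form, not from the abstract):
    difference of mean cosines to the two polar sets, over the pooled std."""
    a = np.array([cosine(w, p) for p in pleasant])
    b = np.array([cosine(w, u) for u in unpleasant])
    return float((a.mean() - b.mean()) / np.concatenate([a, b]).std(ddof=1))

def vast_score(words, human_valence, pleasant, unpleasant):
    """Pearson correlation between embedding-derived and human valence."""
    assoc = [valence_association(w, pleasant, unpleasant) for w in words]
    return float(np.corrcoef(assoc, human_valence)[0, 1])

def nullify_top_components(X, k):
    """Mean-center the rows of X and remove their projection onto the
    top-k principal components (assumed reading of 'nullifying')."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    top = vt[:k]                      # shape (k, d), rows are unit vectors
    return Xc - Xc @ top.T @ top

# Synthetic toy data; columns are [valence, neutral, high-magnitude dimension].
M = 30.0
ratings = np.linspace(-1.0, 1.0, 21)                    # stand-in human ratings
signs = np.where(np.arange(21) % 2 == 0, 1.0, -1.0)     # unrelated to ratings
words = np.column_stack([ratings, np.ones(21), M * signs])
pleasant = np.array([[1.0, 0.5, M], [1.0, -0.5, M]])
unpleasant = np.array([[-1.0, 0.5, -M], [-1.0, -0.5, -M]])

# Score with the high-magnitude dimension left in place.
masked = vast_score(words, ratings, pleasant, unpleasant)

# Score after removing the top principal component of all embeddings.
cleaned = nullify_top_components(np.vstack([words, pleasant, unpleasant]), k=1)
w_c, p_c, u_c = cleaned[:21], cleaned[21:23], cleaned[23:]
recovered = vast_score(w_c, ratings, p_c, u_c)

print(f"masked: {masked:.2f}, recovered: {recovered:.2f}")
```

In this toy setup the third column carries no information about the ratings but dominates every cosine, so the association tracks that column instead of valence. Projecting out the top principal component lets the valence column drive the score again. Real CWEs would come from an LM layer rather than hand-built vectors.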

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 5

Similar Papers

Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases
Wei Guo ... Aylin Caliskan
21 Jul 2021

Comparing general and specialized word embeddings for biomedical named entity recognition.
Rigo E Ramos-Vargas ... Sulema Torres-Ramos
PeerJ Computer Science | VOL. 7
18 Feb 2021

Synergizing Unsupervised and Supervised Learning: A Hybrid Approach for Accurate Natural Language Task Modeling
Wrick Talukdar ... Anjanava Biswas
International Journal of Innovative Science and Research Technology (IJISRT) | VOL. -
03 Jun 2024

MenuNER: Domain-Adapted BERT Based NER Approach for a Domain with Limited Dataset and Its Application to Food Menu Domain
Muzamil Hussain Syed ... Sun-Tae Chung
Applied Sciences | VOL. 11
28 Jun 2021
