Abstract

Vector space models (VSMs) represent word meanings as points in a high dimensional space. VSMs are typically created using a large text corpora, and so represent word semantics as observed in text. We present a new algorithm (JNNSE) that can incorporate a measure of semantics not previously used to create VSMs: brain activation data recorded while people read words. The resulting model takes advantage of the complementary strengths and weaknesses of corpus and brain activation data to give a more complete representation of semantics. Evaluations show that the model 1) matches a behavioral measure of semantics more closely, 2) can be used to predict corpus data for unseen words and 3) has predictive power that generalizes across brain imaging technologies and across subjects. We believe that the model is thus a more faithful representation of mental vocabularies.

Highlights

  • Vector Space Models (VSMs) represent lexical meaning by assigning each word a point in high dimensional space

  • In this work we focus on the scientific question: Can the inclusion of brain data improve semantic representations learned from corpus data? What can we learn from such a model? From an engineering perspective, brain activation data will likely never replace text data

  • We extend NNSEs to incorporate an additional source of data for a subset of the words in X, and call the approach Joint Non-Negative Sparse Embeddings (JNNSEs)

Read more

Summary

Introduction

Vector Space Models (VSMs) represent lexical meaning by assigning each word a point in high dimensional space. The inclusion of brain data will only improve a text-based model if brain data contains semantic information not readily available in the corpus. 4. directly maps semantic concepts onto the brain by jointly learning neural representations Together, these results suggest that corpus and brain activation data measure semantics in compatible and complimentary ways. Our findings indicate that there is additional semantic information available in brain activation data that is not present in corpus data, and that there are elements of semantics currently lacking in text-based VSMs. We have made available the top performing VSMs created with brain and text data (http://www.cs.cmu.edu/ ̃afyshe/papers/acl2014/). We will describe the data used and the experiments to support our position that brain data is a valuable source of semantic information that compliments text data

Non-Negative Sparse Embedding
Joint Non-Negative Sparse Embedding
Related Work
Experimental Results
Word Prediction from Brain Activation
Prediction from a Brain-only Model
Effect on Rows Without Brain Data
Predicting Corpus Data
Future Work and Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call