Abstract

Until recently, human behavioral data from reading has mainly been of interest to researchers seeking to understand human cognition. However, these human language processing signals can also be beneficial in machine learning-based natural language processing tasks. Using EEG brain activity for this purpose remains largely unexplored. In this paper, we present the first large-scale study that systematically analyzes the potential of EEG brain activity data for improving natural language processing tasks, with a special focus on which features of the signal are most beneficial. We present a multi-modal machine learning architecture that learns jointly from textual input and from EEG features. We find that filtering the EEG signals into frequency bands is more beneficial than using the broadband signal. Moreover, for a range of word embedding types, EEG data improves binary and ternary sentiment classification and outperforms multiple baselines. For more complex tasks such as relation detection, only the contextualized BERT embeddings outperform the baselines in our experiments, which calls for further research. Finally, EEG data appears particularly promising when limited training data is available.

Highlights

  • Recordings of brain activity play an important role in furthering our understanding of how human language works (Murphy et al, 2018; Ling et al, 2019)

  • We investigate the effect of augmenting natural language processing (NLP) models with neurophysiological data in an extensive study while accounting for various dimensions. First, we compare a purely data-driven approach to feature extraction for machine learning, which uses full broadband EEG signals, with a more theoretically motivated approach, which splits the word-level EEG features into frequency bands

  • The performance of our models is evaluated by comparing the predicted labels (i.e., positive, neutral or negative sentiment for a sentence; or the relation type(s) in a sentence) with the true labels of the test set, which yields the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) across the classified samples
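
The evaluation described in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names `confusion_counts` and `precision_recall_f1`, the one-vs-rest treatment of each class, and the choice of precision, recall and F1 as derived metrics are all assumptions, since this excerpt only states that TP, TN, FP and FN are counted.

```python
def confusion_counts(y_true, y_pred, target):
    """Count TP, TN, FP, FN for one class, treating it one-vs-rest.

    Hypothetical helper: `target` is the label currently regarded as
    the "positive" class (e.g. "negative" sentiment).
    """
    tp = tn = fp = fn = 0
    for true_label, predicted_label in zip(y_true, y_pred):
        if predicted_label == target and true_label == target:
            tp += 1
        elif predicted_label == target:
            fp += 1
        elif true_label == target:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from the counts (assumed metrics)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Toy ternary sentiment labels, invented for illustration only.
    y_true = ["positive", "neutral", "negative", "positive"]
    y_pred = ["positive", "negative", "negative", "neutral"]
    for label in ("positive", "neutral", "negative"):
        tp, tn, fp, fn = confusion_counts(y_true, y_pred, label)
        print(label, (tp, tn, fp, fn), precision_recall_f1(tp, fp, fn))
```

For a multi-class task such as ternary sentiment classification, the counts are computed once per class and the resulting per-class scores can then be averaged; how the paper aggregates them is not stated in this excerpt.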


Summary

Introduction

Recordings of brain activity play an important role in furthering our understanding of how human language works (Murphy et al, 2018; Ling et al, 2019). The appeal and added value of using brain activity signals in linguistic research are intelligible (Stemmer and Connolly, 2012). Computational language processing models still struggle with basic linguistic phenomena that humans perform effortlessly (Ettinger, 2020). Numerous datasets of cognitive processing signals in naturalistic experiment paradigms with real-world language understanding tasks are becoming available (Alday, 2019; Kandylaki and Bornkessel-Schlesewsky, 2019). Linzen (2020) advocates for the grounding of NLP models in multi-modal settings to compare the generalization abilities of the models to human language learning. Developing models that learn from such multi-modal inputs
