ZuCo, a simultaneous EEG and eye-tracking resource for natural sentence reading

Nora Hollenstein, Andreas Pedroni, Jonathan Rotsztejn, Ce Zhang, Nicolas Langer, Marius Troendle

doi:10.1038/sdata.2018.291

Open Access

Journal: Scientific Data	Publication Date: Dec 1, 2018
Citations: 72	License type: open-access

Affiliation: ETH Zurich, University of Zurich

Abstract

We present the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset combining electroencephalography (EEG) and eye-tracking recordings from subjects reading natural sentences. ZuCo includes high-density EEG and eye-tracking data of 12 healthy adult native English speakers, each reading natural English text for 4–6 hours. The recordings span two normal reading tasks and one task-specific reading task, resulting in a dataset that encompasses EEG and eye-tracking data of 21,629 words in 1,107 sentences and 154,173 fixations. We believe that this dataset represents a valuable resource for natural language processing (NLP). The EEG and eye-tracking signals lend themselves to training improved machine-learning models for various tasks, in particular for information extraction tasks such as entity and relation extraction and sentiment analysis. Moreover, this dataset is useful for advancing research into the human reading and language understanding process at the level of brain activity and eye movement.

Highlights

Natural language processing (NLP), a fundamental aspect of artificial intelligence, aims at teaching computers to process features of natural language data, such as the sentiment of a sentence or relational information between text entities.
To train a sentiment analysis system, which predicts the sentiment of a sentence, thousands of annotated sentences are needed.
We aim to find and extract relevant aspects of text understanding and annotation directly from the source, i.e. eye-tracking and brain activity signals during reading.

Background & Summary

Natural language processing (NLP), a fundamental aspect of artificial intelligence, aims at teaching computers to process features of natural language data, such as the sentiment of a sentence or relational information between text entities. In this work we focused more on the number of sentences recorded than on the number of subjects. While this dataset has been created with machine learning and natural language processing as its primary application, the data can also be used to analyze the human reading process from a neuroscience perspective. It can be used for linguistic and (neuro-)psychological studies to generate new hypotheses (exploratory analyses), but these hypotheses should be tested on a larger number of subjects to account for the variability of reading strategies across subjects. The technical validation of this dataset, described further below, demonstrates the quality of the recordings.
