COCO-Search18 fixation dataset for predicting goal-directed attention control

Yupei Chen, Dimitris Samaras, Zhibo Yang, Gregory Zelinsky, Seoyoung Ahn, Minh Hoai

doi:10.1038/s41598-021-87715-9

Abstract

Attention control is a basic behavioral process that has been studied for decades. The currently best models of attention control are deep networks trained on free-viewing behavior to predict bottom-up attention control – saliency. We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models. We collected eye-movement behavior from 10 people searching for each of 18 target-object categories in 6202 natural-scene images, yielding approximately 300,000 search fixations. We thoroughly characterize COCO-Search18, and benchmark it using three machine-learning methods: a ResNet50 object detector, a ResNet50 trained on fixation-density maps, and an inverse-reinforcement-learning model trained on behavioral search scanpaths. Models were also trained/tested on images transformed to approximate a foveated retina, a fundamental biological constraint. These models, each having a different reliance on behavioral training, collectively comprise the new state-of-the-art in predicting goal-directed search fixations. Our expectation is that future work using COCO-Search18 will far surpass these initial efforts, finding applications in domains ranging from human-computer interactive systems that can anticipate a person’s intent and render assistance to the potentially early identification of attention-related clinical disorders (ADHD, PTSD, phobia) based on deviation from neurotypical fixation behavior.
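As a rough illustration of what a fixation-density map is, the sketch below places an isotropic Gaussian at each fixation location and normalizes the result so that it sums to 1. This is a minimal sketch only: the function name fixation_density_map, the grid size, the sigma value, and the toy fixation coordinates are all assumptions made for illustration, and the paper's actual map construction may differ.

```python
import math

def fixation_density_map(fixations, width, height, sigma=1.0):
    """Sum an isotropic Gaussian at each (x, y) fixation, then normalize to sum to 1.

    Hypothetical helper for illustration; sigma is in grid cells (an assumption).
    """
    grid = [[0.0] * width for _ in range(height)]
    for fx, fy in fixations:
        for y in range(height):
            for x in range(width):
                d2 = (x - fx) ** 2 + (y - fy) ** 2
                grid[y][x] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = sum(sum(row) for row in grid)
    if total == 0:
        return grid  # no fixations: return the all-zero map unchanged
    return [[v / total for v in row] for row in grid]

# Toy data (made up): two nearby fixations and one isolated fixation.
fixations = [(2, 1), (2, 2), (6, 4)]
fdm = fixation_density_map(fixations, width=8, height=6, sigma=1.0)
```

In this toy example the two neighboring fixations reinforce each other, so the cell at (2, 2) carries more density than the cell at the isolated fixation (6, 4).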

Highlights

Attention control is a basic behavioral process that has been studied for decades
Recent years have taught us the importance of large datasets for model prediction, and this importance extends to models of attention control
COCO-Search18 is currently the largest dataset of goal-directed search fixations, with enough fixations to be used as labels for training deep-network models

Summary

Introduction

Attention control is a basic behavioral process that has been studied for decades. The currently best models of attention control are deep networks trained on free-viewing behavior to predict bottom-up attention control – saliency. The prediction of fixations during free viewing, the task-less cousin of visual search, has become an extremely active research topic, complete with managed competitions and leaderboards for the most predictive models[22] (http://saliency.mit.edu/). The best of these saliency models are all deep networks. SALICON is a crowd-sourced dataset consisting of images that were annotated with mouse-based data approximating the attention shifts made during free viewing. This model of fixation prediction during free viewing was trained on a form of free-viewing behavior. COCO-Search18 was recently introduced at CVPR 2020[28], and our aim in this paper is to elaborate on the richness of this dataset so as to increase its usefulness to researchers interested in modeling top-down attention control.

Scientific Reports (2021) 11:8776

Objectives

Methods

Results

Conclusion