Abstract

Predicting human eye movements is a crucial task for understanding human behavior and has numerous applications in machine vision. Most current models for predicting eye movements are data-driven and require large datasets of recorded eye movements, which are expensive and time-consuming to collect. In this paper, we present a novel theory-based model for predicting eye movements in a foveated visual system that maximizes information gain at each fixation. Our model uses a region-proposal network and eccentricity-based max pooling to account for the loss of detail in peripheral vision. We apply our model to predict human fixations in a visual search task for objects in real-world scenes. Unlike data-driven models, our model does not require training on large eye movement datasets and can generalize to any set of natural images and targets. We evaluate its generalization capability on two publicly available visual search datasets, Ehinger and COCO-Search18, without any further training on those datasets. Our model outperforms or performs comparably to data-driven models that are trained directly on human eye movement datasets.
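The abstract names eccentricity-based max pooling as the mechanism for modeling peripheral loss of detail. The sketch below is one plausible illustration of the idea, not the paper's implementation: each pixel is pooled over a window whose radius grows linearly with its distance from the current fixation point, so resolution is preserved at the fovea and coarsened in the periphery. The function name, the linear radius rule, and the `scale` parameter are all assumptions for illustration.

```python
import numpy as np

def eccentricity_max_pool(image, fixation, scale=8.0):
    """Max-pool a 2D image with a window that grows with eccentricity.

    Illustrative sketch (not the paper's exact method): the pooling
    radius at each pixel is floor(eccentricity / scale), where
    eccentricity is the Euclidean distance to the fixation point.
    Radius 0 at the fovea leaves the image untouched there; larger
    radii in the periphery discard fine spatial detail.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    fy, fx = fixation
    ecc = np.hypot(ys - fy, xs - fx)        # per-pixel eccentricity
    radius = (ecc / scale).astype(int)      # pooling radius grows with ecc
    out = np.empty_like(image)
    for y in range(h):
        for x in range(w):
            r = radius[y, x]
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            out[y, x] = image[y0:y1, x0:x1].max()  # max over local window
    return out

# Example: a single bright pixel far from fixation "spreads" after
# pooling, while the foveal region is unchanged.
img = np.zeros((16, 16))
img[10, 10] = 1.0
pooled = eccentricity_max_pool(img, fixation=(0, 0), scale=4.0)
```

In practice a model like the one described would apply such pooling over feature maps rather than raw pixels, and efficient implementations would use strided or filtered operations instead of the explicit per-pixel loop shown here.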
