Abstract
Predicting human eye movements is a crucial task for understanding human behavior and has numerous applications in machine vision. Most current models for predicting eye movements are data-driven and require large datasets of recorded eye movements, which are expensive and time-consuming to collect. In this paper, we present a novel theory-based model for predicting eye movements in a foveated visual system that maximizes information gain at each fixation. Our model uses a region-proposal network and eccentricity-based max pooling to account for the loss of detail in peripheral vision. We apply our model to predict human fixations in a visual search task for objects in real-world scenes. Unlike data-driven models, our model does not require training on large eye movement datasets and can generalize to any set of natural images and targets. We evaluate its generalization capability on two publicly available visual search datasets, Ehinger and COCO-search18, without any further training on those datasets. Our model outperforms or performs comparably to data-driven models that are directly trained on human eye movement datasets.
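The eccentricity-based max pooling mentioned above can be illustrated with a minimal sketch: pool size grows with distance from the current fixation, so detail is preserved at the fovea and progressively lost in the periphery. The linear pool-size schedule and all parameter names here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def eccentricity_max_pool(image, fixation, base_pool=0, scale=0.1):
    """Max-pool an image with a window that grows with eccentricity.

    Approximates foveated vision: full detail at the fixation point,
    coarser detail farther away. The linear schedule k = base_pool +
    scale * eccentricity is an illustrative assumption.
    """
    h, w = image.shape
    fy, fx = fixation
    out = np.empty_like(image, dtype=float)
    for y in range(h):
        for x in range(w):
            ecc = np.hypot(y - fy, x - fx)       # eccentricity in pixels
            k = base_pool + int(scale * ecc)     # pool radius grows with ecc
            y0, y1 = max(0, y - k), min(h, y + k + 1)
            x0, x1 = max(0, x - k), min(w, x + k + 1)
            out[y, x] = image[y0:y1, x0:x1].max()  # local max over the window
    return out
```

For example, a single bright pixel in the periphery "smears" over a neighborhood whose size reflects its eccentricity, while the region at fixation stays unchanged.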