Abstract
Processing visual stimuli in a scene is essential for the human brain to make situation-aware decisions. These stimuli, which are prevalent subjects of diagnostic eye tracking studies, are commonly encoded as rectangular areas of interest (AOIs) per frame. Because annotating AOIs frame by frame is a tedious manual task, the automatic detection and annotation of visual attention to AOIs can accelerate and objectify eye tracking research, in particular for mobile eye tracking with egocentric video feeds. In this work, we implement two methods to automatically detect visual attention to AOIs using pre-trained deep learning models for image classification and object detection. Furthermore, we develop an evaluation framework based on the VISUS dataset and well-known performance metrics from the field of activity recognition. We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods.
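The abstract describes detecting visual attention with a pre-trained object detector; the snippet below is a minimal sketch of that idea, assuming a COCO-pretrained Faster R-CNN from torchvision (version 0.13 or newer) and gaze samples given in pixel coordinates of the egocentric frame. The model choice, score threshold, file name, and function name are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: detect candidate AOIs in one egocentric frame with a
# pre-trained object detector and check whether a gaze point falls inside
# one of the detected boxes.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# "DEFAULT" weights require torchvision >= 0.13; older versions use pretrained=True.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def gaze_to_detected_aoi(frame: Image.Image, gaze_xy, score_threshold=0.5):
    """Return the COCO class id of the detected object hit by the gaze point, or None."""
    with torch.no_grad():
        detections = model([to_tensor(frame)])[0]
    gx, gy = gaze_xy
    for box, label, score in zip(detections["boxes"],
                                 detections["labels"],
                                 detections["scores"]):
        if score < score_threshold:
            continue
        x1, y1, x2, y2 = box.tolist()
        if x1 <= gx <= x2 and y1 <= gy <= y2:
            return int(label)   # attended AOI, encoded as a COCO class id
    return None                 # gaze not on any detected AOI

# Example usage (hypothetical file name and gaze sample):
# frame = Image.open("frame_000123.png")
# attended = gaze_to_detected_aoi(frame, gaze_xy=(412.0, 288.5))
```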
Highlights
Eye tracking studies in many fields use Areas of Interest (AOIs) and visual attention to these AOIs as a common analytical helper tool
We systematically evaluate our methods within this framework, discuss potentials and limitations, and propose ways to improve the performance of future automatic visual attention detection methods
We reviewed the metrics proposed in closely related works, but none of them was fully satisfactory: Panetta et al. [12] compared their system to manual ground truth annotations by calculating the distance between two histograms that aggregate the duration of fixations from predicted and ground truth AOI regions, respectively (a minimal sketch of such a histogram comparison follows below)
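To make the histogram comparison mentioned in the last highlight concrete, the sketch below aggregates fixation durations per AOI for the predicted and the ground-truth annotations, normalizes both, and computes a distance. The Euclidean distance and the tuple-based input format are assumptions for illustration; Panetta et al. [12] may use a different distance measure and data layout.

```python
# Hedged sketch of a histogram-based comparison between predicted and
# ground-truth AOI annotations, based on per-AOI fixation durations.
import numpy as np

def duration_histogram(fixations, aoi_labels):
    """Total fixation duration per AOI, normalized to a probability histogram.

    fixations: list of (aoi_label, duration_in_ms) tuples.
    aoi_labels: ordered list of all AOI labels, so both histograms align.
    """
    totals = np.zeros(len(aoi_labels))
    index = {label: i for i, label in enumerate(aoi_labels)}
    for label, duration in fixations:
        if label in index:
            totals[index[label]] += duration
    return totals / totals.sum() if totals.sum() > 0 else totals

def histogram_distance(predicted, ground_truth, aoi_labels):
    """Euclidean distance between the two normalized duration histograms."""
    h_pred = duration_histogram(predicted, aoi_labels)
    h_true = duration_histogram(ground_truth, aoi_labels)
    return float(np.linalg.norm(h_pred - h_true))

# Example with two hypothetical AOIs; durations are in milliseconds.
# predicted    = [("door", 240), ("sign", 120), ("door", 300)]
# ground_truth = [("door", 260), ("sign", 180), ("door", 280)]
# d = histogram_distance(predicted, ground_truth, ["door", "sign"])
```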
Summary
Eye tracking studies in many fields use Areas of Interest (AOIs) and visual attention to these AOIs as a common analytical helper tool. AOIs are very important, but incorrect placement of AOIs and inaccurate or imprecise mapping of gaze events to AOIs can heavily undermine the validity of a research study [1]. This adds the requirement of high robustness, accuracy, and precision for gaze estimation and gaze-to-AOI mapping methods. The complexity increases if the stimulus is a video with dynamic AOIs, for example if they are linked to an object that moves through the video. In this case, an AOI must be annotated for each video frame. Annotations can be reused if the video is the same for all participants: the participants’ individual gaze or fixation points can then be mapped to these AOI regions automatically, as sketched below. We aim to circumvent the requirement to instrument the environment with obtrusive markers.
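The per-frame mapping described above can be illustrated with a small sketch: given rectangular AOI annotations for each video frame, look up which AOI (if any) contains a participant's gaze sample. The data structures and names are illustrative assumptions, not the format of the VISUS dataset or the authors' tooling.

```python
# Hedged sketch of mapping gaze samples to per-frame rectangular AOIs.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class RectAOI:
    name: str
    x: float        # left edge in pixels
    y: float        # top edge in pixels
    width: float
    height: float

    def contains(self, gx: float, gy: float) -> bool:
        return (self.x <= gx <= self.x + self.width
                and self.y <= gy <= self.y + self.height)

def map_gaze_to_aoi(frame_index: int,
                    gaze_xy,
                    aois_per_frame: Dict[int, List[RectAOI]]) -> Optional[str]:
    """Return the name of the AOI hit by this gaze sample, or None."""
    gx, gy = gaze_xy
    for aoi in aois_per_frame.get(frame_index, []):
        if aoi.contains(gx, gy):
            return aoi.name
    return None

# Example: the same per-frame AOI annotations can be reused for every
# participant; only the gaze samples differ.
# aois = {17: [RectAOI("poster", 100, 50, 200, 150)]}
# hit = map_gaze_to_aoi(17, (180.0, 120.0), aois)   # -> "poster"
```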