This paper presents and evaluates a system and method that record spatiotemporal scene information and location of the center of visual attention, i.e., spatiotemporal point of regard (PoR) in ecological environments. A primary research application of the proposed system and method is for enhancing current 2D visual attention models. Current eye-tracking approaches collapse a scene's depth structures to a 2D image, omitting visual cues that trigger important functions of the human visual system (e.g., accommodation and vergence). We combined head-mounted eye-tracking with a miniature time-of-flight camera to produce a system that could be used to estimate the spatiotemporal location of the PoR-the point of highest visual attention-within 3D scene layouts. Maintaining calibration accuracy is a primary challenge for gaze mapping; hence, we measured accuracy repeatedly by matching the PoR to fixated targets arranged within a range of working distances in depth. Accuracy was estimated as the deviation from estimated PoR relative to known locations of scene targets. We found that estimates of 3D PoR had an overall accuracy of approximately 2° omnidirectional mean average error (OMAE) with variation over a 1 h recording maintained within 3.6° OMAE. This method can be used to determine accommodation and vergence cues of the human visual system continuously within habitual environments, including everyday applications (e.g., use of hand-held devices).
Read full abstract